When testing iDEP, I and other users noticed that sometimes when using limma to identify differentially expressed genes, up-regulated genes are labelled as down-regulated, and vice versa. I digged into this a bit more and found an issue related to design matrices for limma. The limma documentation is not very clear on this, so I will document it here.

Reading the data

We will use part of the demo data for iDEP, available on GitHub.

library(edgeR)

## Loading required package: limma

library(limma)
library(knitr)
fileURL <- "https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/BcellGSE71176_p53.csv" 

x <- read.csv(fileURL, row.names = 1)
x <- x[ , 1:8]
colnames(x) <- gsub("p53_","",colnames(x))
kable( head( x ) )

	mock_1	mock_2	mock_3	mock_4	IR_1	IR_2	IR_3	IR_4
ENSMUSG00000028995	3210	2553	2096	3111	1054	663	1012	1027
ENSMUSG00000045007	122	97	53	78	22	10	15	22
ENSMUSG00000062785	183	182	143	232	530	384	590	655
ENSMUSG00000038268	1117	843	669	783	526	369	654	765
ENSMUSG00000054589	85	46	36	45	693	394	865	1120
ENSMUSG00000102752	52	59	45	75	21	16	20	11
# Correct orientation

groups = gsub("_.*", "", colnames( x ))
groups <- factor(groups, levels = c("mock","IR") )
design <- model.matrix( ~ 0 + groups )
colnames(design) <- c("mock","IR")
design

##   mock IR
## 1    1  0
## 2    1  0
## 3    1  0
## 4    1  0
## 5    0  1
## 6    0  1
## 7    0  1
## 8    0  1
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$groups
## [1] "contr.treatment"

dge <- DGEList(counts=x);
dge <- calcNormFactors(dge, method = "TMM")  # normalization
v <- voom(dge, design)
fit <- lmFit(v, design) 

cont.matrix <- makeContrasts(contrasts = "IR-mock", levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, trend=FALSE)

# calls differential gene expression 1 for up, -1 for down
results <- decideTests(fit2, p.value = .1, lfc= log2(2) )
topGenes =topTable(fit2, number = 1e12,sort.by="M" )
head(topGenes)

##                       logFC    AveExpr         t      P.Value    adj.P.Val
## ENSMUSG00000026831 9.892108 0.68261871  6.389315 9.581983e-05 5.409615e-04
## ENSMUSG00000000308 8.584652 0.03622513  8.150802 1.296283e-05 1.148136e-04
## ENSMUSG00000041801 8.556836 4.89825205 14.668745 6.909906e-08 3.507048e-06
## ENSMUSG00000026700 8.436601 3.83021718 11.872409 4.763704e-07 1.127824e-05
## ENSMUSG00000037188 7.932190 1.99753040 17.198558 1.581832e-08 1.522821e-06
## ENSMUSG00000009628 7.877015 4.17154790 11.394278 6.898578e-07 1.447440e-05
##                           B
## ENSMUSG00000026831 1.239444
## ENSMUSG00000000308 2.533939
## ENSMUSG00000041801 8.619459
## ENSMUSG00000026700 6.654007
## ENSMUSG00000037188 7.715722
## ENSMUSG00000009628 6.501137

This is correct, as we can see from the most upregulated genes.

ix <- match(row.names(topGenes)[1], row.names(x) )
top1 <- ( as.vector( t( x[ix,]) ) )
names( top1 ) = colnames(x)
barplot( top1 , las=3)

Wrong orientation

If the levels in the factor does not align correctly with the column names of the design matrix, then we will get a incorrect design matrix which could lead to up-regulated genes are makred as down-regulated ones.

    groups = gsub("_.*", "", colnames( x ))
    groups <- factor(groups, levels = c("IR","mock") )
    design <- model.matrix( ~ 0 + groups )
    colnames(design) <- c("mock","IR")
  design

##   mock IR
## 1    0  1
## 2    0  1
## 3    0  1
## 4    0  1
## 5    1  0
## 6    1  0
## 7    1  0
## 8    1  0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$groups
## [1] "contr.treatment"

dge <- DGEList(counts=x);
dge <- calcNormFactors(dge, method = "TMM")  # normalization
v <- voom(dge, design)
fit <- lmFit(v, design) 

cont.matrix <- makeContrasts(contrasts = "IR-mock", levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, trend=FALSE)

# calls differential gene expression 1 for up, -1 for down
results <- decideTests(fit2, p.value = .1, lfc= log2(2) )
topGenes =topTable(fit2, number = 1e12,sort.by="M" )
head(topGenes)

##                        logFC    AveExpr          t      P.Value
## ENSMUSG00000026831 -9.892108 0.68261871  -6.389315 9.581983e-05
## ENSMUSG00000000308 -8.584652 0.03622513  -8.150802 1.296283e-05
## ENSMUSG00000041801 -8.556836 4.89825205 -14.668745 6.909906e-08
## ENSMUSG00000026700 -8.436601 3.83021718 -11.872409 4.763704e-07
## ENSMUSG00000037188 -7.932190 1.99753040 -17.198558 1.581832e-08
## ENSMUSG00000009628 -7.877015 4.17154790 -11.394278 6.898578e-07
##                       adj.P.Val        B
## ENSMUSG00000026831 5.409615e-04 1.239444
## ENSMUSG00000000308 1.148136e-04 2.533939
## ENSMUSG00000041801 3.507048e-06 8.619459
## ENSMUSG00000026700 1.127824e-05 6.654007
## ENSMUSG00000037188 1.522821e-06 7.715722
## ENSMUSG00000009628 1.447440e-05 6.501137

This also works

The “levels =” parameters sets the reference levels and the order of the levels. This is used in creating the design matrix. So as long as the orders are the same, we will get correct answers. Notice now that it is “IR” is the reference level for groups, and the column names are reversed accordingly. This will give us correct results.

    groups = gsub("_.*", "", colnames( x ))
    groups <- factor(groups, levels = c("IR","mock") )
    design <- model.matrix( ~ 0 + groups )
    colnames(design) <- c("IR","mock")
  design

##   IR mock
## 1  0    1
## 2  0    1
## 3  0    1
## 4  0    1
## 5  1    0
## 6  1    0
## 7  1    0
## 8  1    0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$groups
## [1] "contr.treatment"

A more straightforward method

Perhaps a more straightforward way is not setting the levels and column names directly, but use whatever was generated.

    groups = gsub("_.*", "", colnames( x ))
    groups <- factor(groups )
    design <- model.matrix( ~ 0 + groups )
    design

##   groupsIR groupsmock
## 1        0          1
## 2        0          1
## 3        0          1
## 4        0          1
## 5        1          0
## 6        1          0
## 7        1          0
## 8        1          0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$groups
## [1] "contr.treatment"

Then, the only thing we need to do is to remove the “groups” from the column names.

    colnames(design) <- gsub("^groups","",colnames(design))
    design

##   IR mock
## 1  0    1
## 2  0    1
## 3  0    1
## 4  0    1
## 5  1    0
## 6  1    0
## 7  1    0
## 8  1    0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$groups
## [1] "contr.treatment"

This works because both the levels and the colnames are auto-generated. We just have to remove the name of the factor.

The new iDEP v 0.82, this issue has been resolved. If you used iDEP, please check your results carefully if you used limma to analyze two groups (control vs. experiment). As the factor levels are assigned as reference level based on alphbetical order, there is a 50% chance that the up- and down-regulated genes are reversed. If you have multiple groups, it is does not affect your results. If you used DESeq2, it should be fine.

Creating design matrices for differential gene expression using limma

Steven Ge http://ge-lab.org

March 5, 2019

Reading the data

Wrong orientation

This also works

A more straightforward method