In the original DEGage publication (1), DEGage was compared against multiple popular differential expression analysis tools and tests. This vignette details the general workflow used for these comparisions with a dummy example on simulated data.
First, all of the appropriate libraries must be loaded in:
library(DEGage)
library(DESeq2)
library(edgeR)
library(DEsingle)
library(scDD)
library(monocle3)
library(MASS)
library(dplyr)
library(SingleCellExperiment)
library(ggplot2)
library(ggpubr)
For the purposes of this vignette, the packages will be compared on a
small data set generated with DEGage_Simulation(). This
function simulates counts along a negative binomial simulation and
simulates differential expression by introducing a log2 fold change
difference between the means of two sets of counts. The means,
dispersions, and dropout proportions can be generated automatically by
the simulation framework; however, in this case, we will control the
dispersions, dropout proportions, and log2 fold change differences.
ngenes = 800
ndegs = 200
ngroup1 = 20
ngroup2 = 20
lfcs = runif(ndegs, min = 2.5, max = 7.5)
dispersions = runif(ndegs + ngenes, min = 0.1, max = 10)
means = abs(rnorm(ndegs + ngenes, mean = 50, sd = 20))
counts <- DEGage_Simulation(ngenes = ngenes,
ndegs = ndegs,
ngroup1 = ngroup1,
ngroup2 = ngroup2,
lfc = lfcs,
dispersions = dispersions,
means = means ,
max.prop.zeros = 0.4,
min.prop.zeros = 0.2)
head(counts)
#> Cell1.Group1 Cell2.Group1 Cell3.Group1 Cell4.Group1 Cell5.Group1
#> DEG1 73 0 22 15 8
#> DEG2 0 86 22 0 145
#> DEG3 25 112 26 63 75
#> DEG4 0 27 7 0 18
#> DEG5 0 22 0 193 17
#> DEG6 16 9 21 4 17
#> Cell6.Group1 Cell7.Group1 Cell8.Group1 Cell9.Group1 Cell10.Group1
#> DEG1 0 121 79 0 0
#> DEG2 125 111 38 142 0
#> DEG3 59 79 34 0 30
#> DEG4 5 24 0 0 34
#> DEG5 17 65 0 1 0
#> DEG6 0 0 10 0 29
#> Cell11.Group1 Cell12.Group1 Cell13.Group1 Cell14.Group1 Cell15.Group1
#> DEG1 5 0 41 19 3
#> DEG2 0 0 24 0 168
#> DEG3 31 32 0 0 89
#> DEG4 8 11 31 22 29
#> DEG5 16 45 0 0 25
#> DEG6 16 0 0 13 26
#> Cell16.Group1 Cell17.Group1 Cell18.Group1 Cell19.Group1 Cell20.Group1
#> DEG1 52 27 61 0 46
#> DEG2 7 15 125 0 12
#> DEG3 0 51 0 55 0
#> DEG4 15 33 17 17 23
#> DEG5 0 0 6 33 0
#> DEG6 7 0 21 24 8
#> Cell21.Group2 Cell22.Group2 Cell23.Group2 Cell24.Group2 Cell25.Group2
#> DEG1 6765 0 339 0 887
#> DEG2 0 0 534 609 103
#> DEG3 944 0 616 2097 0
#> DEG4 1157 1431 0 0 1942
#> DEG5 7664 137 0 6 4
#> DEG6 351 0 0 662 414
#> Cell26.Group2 Cell27.Group2 Cell28.Group2 Cell29.Group2 Cell30.Group2
#> DEG1 0 0 12383 3587 1836
#> DEG2 234 252 1675 102 974
#> DEG3 2070 3185 1210 0 2187
#> DEG4 1565 1279 1266 1624 0
#> DEG5 478 0 4611 0 0
#> DEG6 506 715 0 787 778
#> Cell31.Group2 Cell32.Group2 Cell33.Group2 Cell34.Group2 Cell35.Group2
#> DEG1 0 3631 1988 2865 5004
#> DEG2 265 0 305 0 1294
#> DEG3 1463 536 0 1318 2597
#> DEG4 0 1278 2004 0 0
#> DEG5 90 0 121 1 24
#> DEG6 653 466 937 0 0
#> Cell36.Group2 Cell37.Group2 Cell38.Group2 Cell39.Group2 Cell40.Group2
#> DEG1 0 5279 0 12512 10388
#> DEG2 0 382 0 0 493
#> DEG3 1988 1302 3000 1584 6359
#> DEG4 5490 0 1337 2391 1642
#> DEG5 0 4047 2068 0 7
#> DEG6 578 949 714 0 0
In this section, ## DEGage DEGage detects pairwise differential
expression by using the difference of two negative binomial (DOTNB)
distribution. It uses a generalized linear model (glm) to estimate the r
and p parameters of the count distribution for two conditions
independently. Then, it tests for differential expression using the
closed form of the DOTNB cumulative density function. DEGage only has
two required inputs. The first is a data.frame containing
counts, where columns represent cells and rows represent genes. The
second is is a factor containing a vector of counts that
indicates which cell belongs to which group. The entire DEGage pipeline
is encapsulated within a single command, DEGage().
groups <- factor(c(rep(1, ngroup1), rep(2, ngroup2)))
start.time <- Sys.time()
degage.results <- DEGage(counts, groups, perm.pval = 0.05)
degage.runtime <- difftime(Sys.time(), start.time, units = "secs")
DEGage outputs the estimated parameters for both gene wise distributions, pvalues,and FDRs for each gene:
head(degage.results)
#> r1 p1 mu1 r2 p2 mu2 base.mean
#> DEG1 3.371625e-01 0.01165154 28.60 1.257453e+07 0.9997318154 3373.2 1700.900
#> DEG2 4.315575e+05 0.99988184 51.00 2.095497e+06 0.9998277078 361.1 206.050
#> DEG3 5.820398e+05 0.99993463 38.05 7.479521e+06 0.9997830813 1622.8 830.425
#> DEG4 2.246844e+05 0.99992857 16.05 6.385293e+06 0.9998089255 1220.3 618.175
#> DEG5 1.846777e-01 0.00832456 22.00 1.137744e-01 0.0001181441 962.9 492.450
#> DEG6 1.709727e+05 0.99993537 11.05 2.712526e+06 0.9998431597 425.5 218.275
#> z.r1 z.p1 z.r2 z.p2 k permPvals pval FDR
#> DEG1 1.2702313 0.04252499 1.4123380 0.0004185186 3344.60 0.0000 0 0
#> DEG2 1.2422584 0.02377880 1.6599843 0.0045759851 310.10 0.0000 0 0
#> DEG3 4.9968721 0.11607979 2.9068931 0.0017880795 1584.75 0.0000 0 0
#> DEG4 5.1942560 0.24450167 5.1525280 0.0042045921 1204.25 0.0000 0 0
#> DEG5 0.7245848 0.03188550 0.1138352 0.0001182073 940.90 0.0175 0 0
#> DEG6 5.5352746 0.33374633 12.6289489 0.0288247305 414.45 0.0000 0 0
DESeq2 (2) fits negative binomial parameters for each gene using a glm, then, it uses empirical Bayes shrinkage to moderate gene wise dispersions towards a curve fit between the means and dispersions of all genes in a data set. Then, it uses a Wald test to determine significant differential expression. For a more detailed account of DESeq2 usage, please see this Bioconductor tutorial
dim.matrix <- matrix(c(rep("Group1", ngroup1), rep("Group2", ngroup2)),
nrow = ngroup1+ngroup2,
dimnames = list(colnames(counts), 'Group'))
start.time <- Sys.time()
DESeqobj <- DESeqDataSetFromMatrix(countData = counts+1,
colData = dim.matrix,
design = ~Group)
DESeq2.output <-DESeq(DESeqobj)
DESeq2.runtime <- difftime(Sys.time(), start.time, units = "secs")
DESeq2.results <- as.data.frame(DESeq2::results(DESeq2.output))
DESeq2 outputs shrunken log2 fold changes, alogside Wald Test p values and adjusted p values:
head(DESeq2.results)
#> baseMean log2FoldChange lfcSE stat pvalue padj
#> DEG1 708.44299 6.234256 2.6244037 2.375494 1.752546e-02 3.891683e-02
#> DEG2 90.16945 2.233713 1.2044864 1.854494 6.366853e-02 1.124438e-01
#> DEG3 348.22252 4.772169 1.2049549 3.960454 7.480727e-05 4.107599e-04
#> DEG4 261.73097 5.568398 1.2311384 4.522966 6.097898e-06 5.991403e-05
#> DEG5 37.25168 2.914754 0.8342316 3.493939 4.759496e-04 1.819453e-03
#> DEG6 93.06515 4.553237 1.1951291 3.809829 1.390629e-04 6.666191e-04
edgeR (3) uses maxiumum likelihood to fit negative binomial parameters, then uses a similar empirical Bayes procedure to DESeq2 to shrink dispersions towards a consensus estimate. It then uses a test analagous to a Fishers Exact test to determine differential expression. For extended edgeR documentation, see the edgeR user guide
dim.matrix <- matrix(c(rep("Group1", ngroup1), rep("Group2", ngroup2)),
nrow = ngroup1+ngroup2,
dimnames = list(colnames(counts), 'Group'))
dim.matrix <- factor(dim.matrix)
list <-DGEList(counts)
design <- model.matrix(~0+dim.matrix)
colnames(design) <- levels (dim.matrix)
start.time <- Sys.time()
AveLogCPM <-aveLogCPM(list)
list <- calcNormFactors(list)
list <- estimateDisp(list, design, Robust = TRUE)
fit <-glmQLFit(list, design, robust = TRUE)
onev.two <-makeContrasts(Group1-Group2, levels = design)
res <- glmQLFTest(fit, contrast = onev.two)
edgeR.runtime <- difftime(Sys.time(), start.time, units = "secs")
edgeR.results <- as.data.frame(topTags(res, n =nrow(counts) ,adjust.method = "fdr",p.value = 0.05))
Similarly to DESeq2, edgeR provides shrunken log2 fold change estimates, an F statistic, a p value, and an adjusted p value for each gene:
head(edgeR.results)
#> logFC logCPM F PValue FDR
#> DEG60 -8.093202 15.287828 34.08219 5.326128e-09 5.326128e-06
#> DEG93 -7.392986 14.412392 30.90241 2.731528e-08 8.810957e-06
#> DEG103 -5.444523 7.949437 30.78969 2.894754e-08 8.810957e-06
#> DEG79 -7.296564 14.782233 30.07230 4.188814e-08 8.810957e-06
#> DEG102 -7.217054 14.327245 29.97442 4.405479e-08 8.810957e-06
#> DEG10 -7.066591 14.465988 29.02354 7.192821e-08 9.374508e-06
DESingle (4) uses maximum likelihood estimation to fit gene wise parameters along a zero inflated negative binomial (ZINB) distribution. Then, it uses a likelihood ratio test to determine differential expression. By using information provided by the third dispersion parameter for the ZINB distrbution, DESingle classifies DE genes into three categories: differential expression (DE), differential expression abundance (DEa), and general differential expression (DEg). For more information on DESingle, see its documentation.
groups <- factor(c(rep(1, ngroup1), rep(2, ngroup2)))
start.time <- Sys.time()
DEsingle.output <- DEsingle(counts, groups)
DESingle.runtime <- difftime(Sys.time(), start.time, units = "secs")
DEsingle.results <- DEtype(DEsingle.output, threshold = 0.05)
DEsingle returns gene parameters, p values, adjusted pvalues, DE classifications, and information as to whether a gene was up or down regulated:
head(DEsingle.results)
#> theta_1 theta_2 mu_1 mu_2 size_1 size_2 prob_1
#> DEG17 0.05000003 0.3500000 41.52632 5369.922 12.460236 12.196720 0.23080259
#> DEG198 0.20000033 0.2000001 100.43752 8660.375 6.896802 14.102844 0.06425533
#> DEG131 0.34999951 0.2499999 64.15386 6287.000 10.589687 25.960261 0.14168027
#> DEG122 0.19999987 0.1999996 60.99997 5263.501 12.274866 7.484484 0.16751816
#> DEG79 0.40000000 0.2000000 64.50000 7213.812 44.870512 7.039638 0.41026152
#> DEG107 0.30000002 0.0999999 47.71428 2176.444 8.457842 12.442159 0.15057010
#> prob_2 total_mean_1 total_mean_2 foldChange norm_total_mean_1
#> DEG17 0.0022661559 35.25 4230.95 0.008331462 39.45
#> DEG198 0.0016257858 72.30 8339.70 0.008669377 80.35
#> DEG131 0.0041122168 37.70 5691.45 0.006623971 41.70
#> DEG122 0.0014199403 43.75 5085.70 0.008602552 48.80
#> DEG79 0.0009749041 34.75 7034.80 0.004939728 38.70
#> DEG107 0.0056842423 29.95 2374.95 0.012610792 33.40
#> norm_total_mean_2 norm_foldChange chi2LR1 pvalue_LR2 pvalue_LR3
#> DEG17 3490.45 0.011302268 135.8347 8.881784e-16 0
#> DEG198 6928.30 0.011597362 119.2508 5.413447e-13 0
#> DEG131 4715.25 0.008843646 118.1815 2.675637e-14 0
#> DEG122 4210.80 0.011589247 115.9770 1.891598e-12 0
#> DEG79 5771.05 0.006705885 112.2888 3.302913e-13 0
#> DEG107 1958.80 0.017051256 110.5063 4.674039e-13 0
#> FDR_LR2 FDR_LR3 pvalue pvalue.adj.FDR Remark Type State
#> DEG17 5.861978e-14 0 0 0 NA DEg down
#> DEG198 1.128276e-11 0 0 0 NA DEg down
#> DEG131 1.059552e-12 0 0 0 NA DEg down
#> DEG122 2.496909e-11 0 0 0 NA DEg down
#> DEG79 8.174711e-12 0 0 0 NA DEg down
#> DEG107 1.028289e-11 0 0 0 NA DEg down
This monocle3 (5) workflow fits a generalized
linear model for each gene, and then it uses a Wald Test to determine
differential expression. monocle3 provides multiple distribution
families to fit the counts to; however, a negative binomial glm was
chosen in this case, given that DEGage_Simulation()
simulates counts along an NB distribution. See the monocle3
differential expression analysis tutorial for more information.
cell_metadata = data.frame(Group = c(rep(1,ngroup1), rep(2,ngroup2)),
row.names = colnames(counts))
gene_metadata = data.frame(gene_short_name = rownames(counts),
row.names = rownames(counts))
start.time <- Sys.time()
cds <- new_cell_data_set(data.matrix(counts),
cell_metadata = cell_metadata,
gene_metadata = gene_metadata)
gene_fits <- fit_models(cds,
model_formula_str = "~Group",
expression_family = "negbinomial")
fit_coefs <- coefficient_table(gene_fits)
monocle3.runtime <- difftime(Sys.time(), start.time, units = "secs")
fit_coefs
#> # A tibble: 2,998 × 14
#> gene_short_name num_cells_expressed gene_id model model_summary status
#> <chr> <int> <chr> <named list> <named list> <chr>
#> 1 DEG1 27 DEG1 <negbin> <smmry.ng> OK
#> 2 DEG1 27 DEG1 <negbin> <smmry.ng> OK
#> 3 DEG1 27 DEG1 <negbin> <smmry.ng> OK
#> 4 DEG2 26 DEG2 <negbin> <smmry.ng> OK
#> 5 DEG2 26 DEG2 <negbin> <smmry.ng> OK
#> 6 DEG2 26 DEG2 <negbin> <smmry.ng> OK
#> 7 DEG3 30 DEG3 <negbin> <smmry.ng> OK
#> 8 DEG3 30 DEG3 <negbin> <smmry.ng> OK
#> 9 DEG3 30 DEG3 <negbin> <smmry.ng> OK
#> 10 DEG4 29 DEG4 <negbin> <smmry.ng> OK
#> # ℹ 2,988 more rows
#> # ℹ 8 more variables: term <chr>, estimate <dbl>, std_err <dbl>,
#> # test_val <dbl>, p_value <dbl>, normalized_effect <dbl>,
#> # model_component <chr>, q_value <dbl>
fit_coefs contains parameters and p values for each gene. For simplicity, we will select the gene names and the p values from fit_coefs:
intermediate <- fit_coefs %>% filter(term == "Group")
monocle.results <-intermediate %>%
filter (p_value < 0.05) %>%
select(gene_short_name,p_value)
monocle.results <- monocle.results %>%
distinct(monocle.results$gene_short_name, .keep_all = TRUE)
monocle.results <- as.data.frame(monocle.results)
rownames(monocle.results) <- monocle.results$gene_short_name
monocle.results <- monocle.results[,c(1,2)]
monocle.results$p.adj <- p.adjust(monocle.results$p_value, method = "fdr")
head(monocle.results)
#> gene_short_name p_value p.adj
#> DEG1 DEG1 1.450275e-274 1.823203e-273
#> DEG2 DEG2 0.000000e+00 0.000000e+00
#> DEG3 DEG3 4.715311e-223 5.007986e-222
#> DEG4 DEG4 3.000227e-39 1.400106e-38
#> DEG6 DEG6 1.502381e-57 8.263096e-57
#> DEG7 DEG7 5.392723e-136 4.678757e-135
scDD (6) models genes using a Dirichlet process mixture. It classifies genes into four categories, DE (differential expression), DP (differential proportions), DM (differential modalities), and DB (combined DP + DM). A vignette detailing scDD usage is available here.
sce <- SingleCellExperiment(assays = list(counts = counts))
colData(sce)$condition = c(rep("Group1", ngroup1), rep("Group2", ngroup2))
start.time <- Sys.time()
sce <- preprocess(sce, scran_norm = TRUE) #counts must be normalized
scDD.output <- scDD(sce)
scDD.runtime <- difftime(Sys.time(), start.time, units = "secs")
scDD.results <- as.data.frame(scDD::results(scDD.output))
scDD returns a categorization and p values for each gene:
head(scDD.results)
#> gene DDcategory Clusters.combined Clusters.c1 Clusters.c2 nonzero.pvalue
#> DEG1 DEG1 DE 2 1 1 2.793168e-06
#> DEG2 DEG2 NS 1 2 2 1.814473e-04
#> DEG3 DEG3 DB 2 2 1 6.539819e-07
#> DEG4 DEG4 DE 2 1 1 1.178014e-06
#> DEG5 DEG5 NS 2 1 1 7.540524e-02
#> DEG6 DEG6 DB 2 1 2 2.793168e-06
#> nonzero.pvalue.adj zero.pvalue zero.pvalue.adj combined.pvalue
#> DEG1 2.513851e-05 0.7529235 0.9059148 2.959426e-05
#> DEG2 1.047780e-03 0.9269833 0.9997521 1.629906e-03
#> DEG3 1.569115e-05 0.5009034 0.8971508 5.218878e-06
#> DEG4 1.569115e-05 0.3184375 0.8777657 5.925461e-06
#> DEG5 1.878550e-01 0.5487415 0.8978808 1.731671e-01
#> DEG6 2.513851e-05 0.7546050 0.9059148 2.965565e-05
#> combined.pvalue.adj
#> DEG1 2.314749e-04
#> DEG2 8.955529e-03
#> DEG3 9.759125e-05
#> DEG4 9.759125e-05
#> DEG5 3.675615e-01
#> DEG6 2.314749e-04
Although it’s not a specific scRNA-seq analysis framework, Wilcoxon tests are frequently used to test for differential expression, and are the default test used in large frameworks like Scanpy. The Wilcoxon rank sum test and non parametric and operates under the null hypothesis that two sets of counts follow a continuous distribution.
groups <- c(rep(1, ngroup1), rep(2, ngroup2))
calc.wilcox <- function(genewise.counts, groups){
g1 <- as.numeric(genewise.counts[groups == 1])
g2 <- as.numeric(genewise.counts[groups == 2])
t <-wilcox.test(g1, g2)
return(t$p.value)
}
start.time <- Sys.time()
wilcox.output <- apply(X = counts, FUN = calc.wilcox, MARGIN = 1,groups = groups)
wilcox.runtime <- difftime(Sys.time(), start.time, units = "secs")
wilcox.results <- data.frame(
pval = as.numeric(wilcox.output),
p.adj = p.adjust(as.numeric(wilcox.output), method = "fdr"),
row.names = names(wilcox.output)
)
head(wilcox.results)
#> pval p.adj
#> DEG1 0.0267127692 0.17808513
#> DEG2 0.0465195484 0.25403024
#> DEG3 0.0003371563 0.01053614
#> DEG4 0.0445247452 0.24874159
#> DEG5 0.2185964834 0.59563075
#> DEG6 0.0266980222 0.17808513
In addition to using a wilcoxon test, Petrany et. al also tested the
performance of the glm.nb function from the
MASS package. Some of the most popular DE analysis tools
use a glm to estimate NB parameters, and then perform some form of
post-processing on the parameters, such as the Bayes shrinkage strategy
used by DESeq2. The following code runs a NB glm, and then performs a
likelihood ratio test to determine differential expression. This tests
the capabilites of glm’s in DE testing alone, without the bells and
whistles of other DE analysis tools.
groups <- c(rep(1, ngroup1), rep(2, ngroup2))
calc.glm <- function(genewise.counts, groups){
design <- data.frame(counts = genewise.counts,
group = groups)
res <- tryCatch({res<-glm.nb(counts~group, design)},
error = function(e){return(NA)})
if(length(res) == 1){
p <- NA
}else{
lrt.res <- anova(res)
p <- lrt.res$`Pr(>Chi)`[2]
}
return(p)
}
start.time <- Sys.time()
glm.output <- apply(X = counts,
FUN = calc.glm,
MARGIN = 1,
groups = groups)
glm.runtime <- difftime(Sys.time(), start.time, units = "secs")
glm.results <- data.frame(
pval = as.numeric(glm.output),
p.adj = p.adjust(as.numeric(glm.output), method = "fdr"),
row.names = names(glm.output)
)
head(glm.results)
#> pval p.adj
#> DEG1 0.0000000000 0.0000000000
#> DEG2 0.0000000000 0.0000000000
#> DEG3 0.0000000000 0.0000000000
#> DEG4 0.0000000000 0.0000000000
#> DEG5 0.0002676476 0.0005217301
#> DEG6 0.0000000000 0.0000000000
Next, we will perform a brief comparison of all the methods detailed
above. First, we will ensure all of the dataframes containing the
results of each package have a column called p.adj, if they
don’t already.
colnames(degage.results)[15] <- "p.adj"
colnames(DESeq2.results)[6] <- "p.adj"
colnames(edgeR.results)[5] <- "p.adj"
colnames(DEsingle.results)[21] <- "p.adj"
colnames(scDD.results)[11] <- "p.adj"
Next, we will obtain basic performace statistics for each framework:
get_performance <- function(results, counts, packagename){
results<- results[!is.na(results$p.adj),]
expDEs<- rownames(results[results$p.adj <= 0.05,])
trueDEs <- rownames(counts[1:ndegs,])
trueEEs <- rownames(counts[ndegs + 1:nrow(counts),])
tp <- sum(trueDEs %in% expDEs)
fp <- sum(trueEEs %in% expDEs)
tn <- sum(!(trueEEs %in% expDEs))
fn <- sum(!(trueDEs %in% expDEs))
sensitivity <- tp/(tp+fn)
specificity <- tn/(tn+fp)
f1 <- (2*tp)/(2*tp+fp+fn)
performance <- data.frame("Package" = packagename,
"ndegs"= length(expDEs),
"Sensitivity" = sensitivity,
"Specificity" = specificity,
"F1" = f1)
return(performance)
}
performance <- rbind(get_performance(degage.results, counts, "DEGage"),
get_performance(DESeq2.results, counts, "DESeq2"),
get_performance(edgeR.results, counts, "edgeR"),
get_performance(DEsingle.results, counts, "DEsingle"),
get_performance(monocle.results, counts, "monocle3"),
get_performance(scDD.results, counts, "scDD"),
get_performance(wilcox.results, counts, "Wilcoxon"),
get_performance(glm.results, counts, "glm.nb"))
performance$Runtime <- as.numeric(c(degage.runtime,
DESeq2.runtime,
edgeR.runtime,
DESingle.runtime,
monocle3.runtime,
scDD.runtime,
wilcox.runtime,
glm.runtime))
performance
#> Package ndegs Sensitivity Specificity F1 Runtime
#> 1 DEGage 219 0.975 0.976 0.9307876 4.4603353
#> 2 DESeq2 143 0.705 0.998 0.8221574 4.5341818
#> 3 edgeR 179 0.875 0.996 0.9234828 0.2377856
#> 4 DEsingle 310 0.970 0.884 0.7607843 81.9927435
#> 5 monocle3 616 0.850 0.554 0.4166667 4.0095725
#> 6 scDD 225 0.920 0.959 0.8658824 83.0661187
#> 7 Wilcoxon 92 0.445 0.997 0.6095890 0.2798769
#> 8 glm.nb 640 0.975 0.555 0.4642857 4.6806390
As is seen above, DEGage provides one of the best trade offs between performance and run time. Note that this is not meant to be a representative comparative analysis of these methods, and is purely for illustrative purposes. For a more robust comparison, see (1)
(1) Petrany A., Chen R., Zhang S. and Chen, Y.
“Theoretical Framework for the Difference of Two Negative Binomial
Distributions and its Application in Comparative Analysis of Sequencing
Data”. Genome Research, Under Review.
(2) Love, M.I., Huber, W. & Anders, S.
Moderated estimation of fold change and dispersion for RNA-seq data with
DESeq2. Genome Biol 15, 550 (2014). https://doi.org/10.1186/s13059-014-0550-8
(3) Robinson MD, McCarthy DJ, Smyth GK. edgeR: a
Bioconductor package for differential expression analysis of digital
gene expression data. Bioinformatics. 2010 Jan 1;26(1):139-40. doi:
10.1093/bioinformatics/btp616. Epub 2009 Nov 11. PMID: 19910308; PMCID:
PMC2796818.
(4) Zhun Miao, Ke Deng, Xiaowo Wang, Xuegong
Zhang, DEsingle for detecting three types of differential expression in
single-cell RNA-seq data, Bioinformatics, Volume 34, Issue 18, September
2018, Pages 3223–3224, https://doi.org/10.1093/bioinformatics/bty332
(5) Trapnell C, Cacchiarelli D, Grimsby J,
Pokharel P, Li S, Morse M, Lennon NJ, Livak KJ, Mikkelsen TS, Rinn JL.
The dynamics and regulators of cell fate decisions are revealed by
pseudotemporal ordering of single cells. Nat Biotechnol. 2014
Apr;32(4):381-386. doi: 10.1038/nbt.2859. Epub 2014 Mar 23. PMID:
24658644; PMCID: PMC4122333.
(6) Korthauer, K.D., Chu, LF., Newton, M.A. et
al. A statistical approach for identifying differential distributions in
single-cell RNA-seq experiments. Genome Biol 17, 222 (2016). https://doi.org/10.1186/s13059-016-1077-y