1 Introduction


In the original DEGage publication (1), DEGage was compared against multiple popular differential expression analysis tools and tests. This vignette details the general workflow used for these comparisions with a dummy example on simulated data.

First, all of the appropriate libraries must be loaded in:

library(DEGage)
library(DESeq2)
library(edgeR)
library(DEsingle)
library(scDD)
library(monocle3)
library(MASS)
library(dplyr)
library(SingleCellExperiment)
library(ggplot2)
library(ggpubr)

2 Count Simulation


For the purposes of this vignette, the packages will be compared on a small data set generated with DEGage_Simulation(). This function simulates counts along a negative binomial simulation and simulates differential expression by introducing a log2 fold change difference between the means of two sets of counts. The means, dispersions, and dropout proportions can be generated automatically by the simulation framework; however, in this case, we will control the dispersions, dropout proportions, and log2 fold change differences.

ngenes = 800
ndegs = 200
ngroup1 = 20
ngroup2 = 20
lfcs = runif(ndegs, min = 2.5, max = 7.5)
dispersions = runif(ndegs + ngenes, min = 0.1, max = 10)
means = abs(rnorm(ndegs + ngenes, mean = 50, sd = 20))

counts <- DEGage_Simulation(ngenes = ngenes,
                            ndegs = ndegs,
                            ngroup1 = ngroup1,
                            ngroup2 = ngroup2,
                            lfc = lfcs,
                            dispersions = dispersions,
                            means = means ,
                            max.prop.zeros = 0.4,
                            min.prop.zeros = 0.2)
head(counts)
#>      Cell1.Group1 Cell2.Group1 Cell3.Group1 Cell4.Group1 Cell5.Group1
#> DEG1           73            0           22           15            8
#> DEG2            0           86           22            0          145
#> DEG3           25          112           26           63           75
#> DEG4            0           27            7            0           18
#> DEG5            0           22            0          193           17
#> DEG6           16            9           21            4           17
#>      Cell6.Group1 Cell7.Group1 Cell8.Group1 Cell9.Group1 Cell10.Group1
#> DEG1            0          121           79            0             0
#> DEG2          125          111           38          142             0
#> DEG3           59           79           34            0            30
#> DEG4            5           24            0            0            34
#> DEG5           17           65            0            1             0
#> DEG6            0            0           10            0            29
#>      Cell11.Group1 Cell12.Group1 Cell13.Group1 Cell14.Group1 Cell15.Group1
#> DEG1             5             0            41            19             3
#> DEG2             0             0            24             0           168
#> DEG3            31            32             0             0            89
#> DEG4             8            11            31            22            29
#> DEG5            16            45             0             0            25
#> DEG6            16             0             0            13            26
#>      Cell16.Group1 Cell17.Group1 Cell18.Group1 Cell19.Group1 Cell20.Group1
#> DEG1            52            27            61             0            46
#> DEG2             7            15           125             0            12
#> DEG3             0            51             0            55             0
#> DEG4            15            33            17            17            23
#> DEG5             0             0             6            33             0
#> DEG6             7             0            21            24             8
#>      Cell21.Group2 Cell22.Group2 Cell23.Group2 Cell24.Group2 Cell25.Group2
#> DEG1          6765             0           339             0           887
#> DEG2             0             0           534           609           103
#> DEG3           944             0           616          2097             0
#> DEG4          1157          1431             0             0          1942
#> DEG5          7664           137             0             6             4
#> DEG6           351             0             0           662           414
#>      Cell26.Group2 Cell27.Group2 Cell28.Group2 Cell29.Group2 Cell30.Group2
#> DEG1             0             0         12383          3587          1836
#> DEG2           234           252          1675           102           974
#> DEG3          2070          3185          1210             0          2187
#> DEG4          1565          1279          1266          1624             0
#> DEG5           478             0          4611             0             0
#> DEG6           506           715             0           787           778
#>      Cell31.Group2 Cell32.Group2 Cell33.Group2 Cell34.Group2 Cell35.Group2
#> DEG1             0          3631          1988          2865          5004
#> DEG2           265             0           305             0          1294
#> DEG3          1463           536             0          1318          2597
#> DEG4             0          1278          2004             0             0
#> DEG5            90             0           121             1            24
#> DEG6           653           466           937             0             0
#>      Cell36.Group2 Cell37.Group2 Cell38.Group2 Cell39.Group2 Cell40.Group2
#> DEG1             0          5279             0         12512         10388
#> DEG2             0           382             0             0           493
#> DEG3          1988          1302          3000          1584          6359
#> DEG4          5490             0          1337          2391          1642
#> DEG5             0          4047          2068             0             7
#> DEG6           578           949           714             0             0

3 Package Usage


In this section, ## DEGage DEGage detects pairwise differential expression by using the difference of two negative binomial (DOTNB) distribution. It uses a generalized linear model (glm) to estimate the r and p parameters of the count distribution for two conditions independently. Then, it tests for differential expression using the closed form of the DOTNB cumulative density function. DEGage only has two required inputs. The first is a data.frame containing counts, where columns represent cells and rows represent genes. The second is is a factor containing a vector of counts that indicates which cell belongs to which group. The entire DEGage pipeline is encapsulated within a single command, DEGage().

groups <- factor(c(rep(1, ngroup1), rep(2, ngroup2)))
start.time <- Sys.time()
degage.results <- DEGage(counts, groups, perm.pval = 0.05)
degage.runtime <- difftime(Sys.time(), start.time, units = "secs")

DEGage outputs the estimated parameters for both gene wise distributions, pvalues,and FDRs for each gene:

head(degage.results)
#>                r1         p1   mu1           r2           p2    mu2 base.mean
#> DEG1 3.371625e-01 0.01165154 28.60 1.257453e+07 0.9997318154 3373.2  1700.900
#> DEG2 4.315575e+05 0.99988184 51.00 2.095497e+06 0.9998277078  361.1   206.050
#> DEG3 5.820398e+05 0.99993463 38.05 7.479521e+06 0.9997830813 1622.8   830.425
#> DEG4 2.246844e+05 0.99992857 16.05 6.385293e+06 0.9998089255 1220.3   618.175
#> DEG5 1.846777e-01 0.00832456 22.00 1.137744e-01 0.0001181441  962.9   492.450
#> DEG6 1.709727e+05 0.99993537 11.05 2.712526e+06 0.9998431597  425.5   218.275
#>           z.r1       z.p1       z.r2         z.p2       k permPvals pval FDR
#> DEG1 1.2702313 0.04252499  1.4123380 0.0004185186 3344.60    0.0000    0   0
#> DEG2 1.2422584 0.02377880  1.6599843 0.0045759851  310.10    0.0000    0   0
#> DEG3 4.9968721 0.11607979  2.9068931 0.0017880795 1584.75    0.0000    0   0
#> DEG4 5.1942560 0.24450167  5.1525280 0.0042045921 1204.25    0.0000    0   0
#> DEG5 0.7245848 0.03188550  0.1138352 0.0001182073  940.90    0.0175    0   0
#> DEG6 5.5352746 0.33374633 12.6289489 0.0288247305  414.45    0.0000    0   0

3.1 DESeq2

DESeq2 (2) fits negative binomial parameters for each gene using a glm, then, it uses empirical Bayes shrinkage to moderate gene wise dispersions towards a curve fit between the means and dispersions of all genes in a data set. Then, it uses a Wald test to determine significant differential expression. For a more detailed account of DESeq2 usage, please see this Bioconductor tutorial

dim.matrix <- matrix(c(rep("Group1", ngroup1), rep("Group2", ngroup2)),
                     nrow = ngroup1+ngroup2,
                     dimnames = list(colnames(counts), 'Group'))
start.time <- Sys.time()
DESeqobj <- DESeqDataSetFromMatrix(countData = counts+1,
                                   colData = dim.matrix,
                                   design = ~Group)
DESeq2.output <-DESeq(DESeqobj)
DESeq2.runtime <- difftime(Sys.time(), start.time, units = "secs")
DESeq2.results <- as.data.frame(DESeq2::results(DESeq2.output))

DESeq2 outputs shrunken log2 fold changes, alogside Wald Test p values and adjusted p values:

head(DESeq2.results)
#>       baseMean log2FoldChange     lfcSE     stat       pvalue         padj
#> DEG1 708.44299       6.234256 2.6244037 2.375494 1.752546e-02 3.891683e-02
#> DEG2  90.16945       2.233713 1.2044864 1.854494 6.366853e-02 1.124438e-01
#> DEG3 348.22252       4.772169 1.2049549 3.960454 7.480727e-05 4.107599e-04
#> DEG4 261.73097       5.568398 1.2311384 4.522966 6.097898e-06 5.991403e-05
#> DEG5  37.25168       2.914754 0.8342316 3.493939 4.759496e-04 1.819453e-03
#> DEG6  93.06515       4.553237 1.1951291 3.809829 1.390629e-04 6.666191e-04

3.2 edgeR

edgeR (3) uses maxiumum likelihood to fit negative binomial parameters, then uses a similar empirical Bayes procedure to DESeq2 to shrink dispersions towards a consensus estimate. It then uses a test analagous to a Fishers Exact test to determine differential expression. For extended edgeR documentation, see the edgeR user guide

dim.matrix <- matrix(c(rep("Group1", ngroup1), rep("Group2", ngroup2)),
                     nrow = ngroup1+ngroup2,
                     dimnames = list(colnames(counts), 'Group'))
dim.matrix <- factor(dim.matrix)

list <-DGEList(counts)

design <- model.matrix(~0+dim.matrix)
colnames(design) <- levels (dim.matrix)

start.time <- Sys.time()
AveLogCPM <-aveLogCPM(list)
list <- calcNormFactors(list)
list <- estimateDisp(list, design, Robust = TRUE)
fit <-glmQLFit(list, design, robust = TRUE)
onev.two <-makeContrasts(Group1-Group2, levels = design)
res <- glmQLFTest(fit, contrast = onev.two)
edgeR.runtime <- difftime(Sys.time(), start.time, units = "secs")

edgeR.results <- as.data.frame(topTags(res, n =nrow(counts) ,adjust.method = "fdr",p.value = 0.05))

Similarly to DESeq2, edgeR provides shrunken log2 fold change estimates, an F statistic, a p value, and an adjusted p value for each gene:

head(edgeR.results)
#>            logFC    logCPM        F       PValue          FDR
#> DEG60  -8.093202 15.287828 34.08219 5.326128e-09 5.326128e-06
#> DEG93  -7.392986 14.412392 30.90241 2.731528e-08 8.810957e-06
#> DEG103 -5.444523  7.949437 30.78969 2.894754e-08 8.810957e-06
#> DEG79  -7.296564 14.782233 30.07230 4.188814e-08 8.810957e-06
#> DEG102 -7.217054 14.327245 29.97442 4.405479e-08 8.810957e-06
#> DEG10  -7.066591 14.465988 29.02354 7.192821e-08 9.374508e-06

3.3 DESingle

DESingle (4) uses maximum likelihood estimation to fit gene wise parameters along a zero inflated negative binomial (ZINB) distribution. Then, it uses a likelihood ratio test to determine differential expression. By using information provided by the third dispersion parameter for the ZINB distrbution, DESingle classifies DE genes into three categories: differential expression (DE), differential expression abundance (DEa), and general differential expression (DEg). For more information on DESingle, see its documentation.

groups <- factor(c(rep(1, ngroup1), rep(2, ngroup2)))
start.time <- Sys.time()
DEsingle.output <- DEsingle(counts, groups)
DESingle.runtime <- difftime(Sys.time(), start.time, units = "secs")
DEsingle.results <- DEtype(DEsingle.output, threshold = 0.05)

DEsingle returns gene parameters, p values, adjusted pvalues, DE classifications, and information as to whether a gene was up or down regulated:

head(DEsingle.results)
#>           theta_1   theta_2      mu_1     mu_2    size_1    size_2     prob_1
#> DEG17  0.05000003 0.3500000  41.52632 5369.922 12.460236 12.196720 0.23080259
#> DEG198 0.20000033 0.2000001 100.43752 8660.375  6.896802 14.102844 0.06425533
#> DEG131 0.34999951 0.2499999  64.15386 6287.000 10.589687 25.960261 0.14168027
#> DEG122 0.19999987 0.1999996  60.99997 5263.501 12.274866  7.484484 0.16751816
#> DEG79  0.40000000 0.2000000  64.50000 7213.812 44.870512  7.039638 0.41026152
#> DEG107 0.30000002 0.0999999  47.71428 2176.444  8.457842 12.442159 0.15057010
#>              prob_2 total_mean_1 total_mean_2  foldChange norm_total_mean_1
#> DEG17  0.0022661559        35.25      4230.95 0.008331462             39.45
#> DEG198 0.0016257858        72.30      8339.70 0.008669377             80.35
#> DEG131 0.0041122168        37.70      5691.45 0.006623971             41.70
#> DEG122 0.0014199403        43.75      5085.70 0.008602552             48.80
#> DEG79  0.0009749041        34.75      7034.80 0.004939728             38.70
#> DEG107 0.0056842423        29.95      2374.95 0.012610792             33.40
#>        norm_total_mean_2 norm_foldChange  chi2LR1   pvalue_LR2 pvalue_LR3
#> DEG17            3490.45     0.011302268 135.8347 8.881784e-16          0
#> DEG198           6928.30     0.011597362 119.2508 5.413447e-13          0
#> DEG131           4715.25     0.008843646 118.1815 2.675637e-14          0
#> DEG122           4210.80     0.011589247 115.9770 1.891598e-12          0
#> DEG79            5771.05     0.006705885 112.2888 3.302913e-13          0
#> DEG107           1958.80     0.017051256 110.5063 4.674039e-13          0
#>             FDR_LR2 FDR_LR3 pvalue pvalue.adj.FDR Remark Type State
#> DEG17  5.861978e-14       0      0              0     NA  DEg  down
#> DEG198 1.128276e-11       0      0              0     NA  DEg  down
#> DEG131 1.059552e-12       0      0              0     NA  DEg  down
#> DEG122 2.496909e-11       0      0              0     NA  DEg  down
#> DEG79  8.174711e-12       0      0              0     NA  DEg  down
#> DEG107 1.028289e-11       0      0              0     NA  DEg  down

3.4 monocle3

This monocle3 (5) workflow fits a generalized linear model for each gene, and then it uses a Wald Test to determine differential expression. monocle3 provides multiple distribution families to fit the counts to; however, a negative binomial glm was chosen in this case, given that DEGage_Simulation() simulates counts along an NB distribution. See the monocle3 differential expression analysis tutorial for more information.

cell_metadata = data.frame(Group = c(rep(1,ngroup1), rep(2,ngroup2)),
                           row.names = colnames(counts))
gene_metadata = data.frame(gene_short_name = rownames(counts),
                           row.names = rownames(counts))

start.time <- Sys.time()
cds <- new_cell_data_set(data.matrix(counts),
                         cell_metadata = cell_metadata,
                         gene_metadata = gene_metadata)
gene_fits <- fit_models(cds,
                        model_formula_str = "~Group",
                        expression_family = "negbinomial")
fit_coefs <- coefficient_table(gene_fits)
monocle3.runtime <- difftime(Sys.time(), start.time, units = "secs")

fit_coefs
#> # A tibble: 2,998 × 14
#>    gene_short_name num_cells_expressed gene_id model        model_summary status
#>    <chr>                         <int> <chr>   <named list> <named list>  <chr> 
#>  1 DEG1                             27 DEG1    <negbin>     <smmry.ng>    OK    
#>  2 DEG1                             27 DEG1    <negbin>     <smmry.ng>    OK    
#>  3 DEG1                             27 DEG1    <negbin>     <smmry.ng>    OK    
#>  4 DEG2                             26 DEG2    <negbin>     <smmry.ng>    OK    
#>  5 DEG2                             26 DEG2    <negbin>     <smmry.ng>    OK    
#>  6 DEG2                             26 DEG2    <negbin>     <smmry.ng>    OK    
#>  7 DEG3                             30 DEG3    <negbin>     <smmry.ng>    OK    
#>  8 DEG3                             30 DEG3    <negbin>     <smmry.ng>    OK    
#>  9 DEG3                             30 DEG3    <negbin>     <smmry.ng>    OK    
#> 10 DEG4                             29 DEG4    <negbin>     <smmry.ng>    OK    
#> # ℹ 2,988 more rows
#> # ℹ 8 more variables: term <chr>, estimate <dbl>, std_err <dbl>,
#> #   test_val <dbl>, p_value <dbl>, normalized_effect <dbl>,
#> #   model_component <chr>, q_value <dbl>

fit_coefs contains parameters and p values for each gene. For simplicity, we will select the gene names and the p values from fit_coefs:

intermediate <- fit_coefs %>% filter(term == "Group")

monocle.results <-intermediate %>%
  filter (p_value < 0.05) %>%
  select(gene_short_name,p_value)
monocle.results <- monocle.results %>%
  distinct(monocle.results$gene_short_name, .keep_all = TRUE)

monocle.results <- as.data.frame(monocle.results)
rownames(monocle.results) <- monocle.results$gene_short_name
monocle.results <- monocle.results[,c(1,2)]
monocle.results$p.adj <- p.adjust(monocle.results$p_value, method = "fdr")

head(monocle.results)
#>      gene_short_name       p_value         p.adj
#> DEG1            DEG1 1.450275e-274 1.823203e-273
#> DEG2            DEG2  0.000000e+00  0.000000e+00
#> DEG3            DEG3 4.715311e-223 5.007986e-222
#> DEG4            DEG4  3.000227e-39  1.400106e-38
#> DEG6            DEG6  1.502381e-57  8.263096e-57
#> DEG7            DEG7 5.392723e-136 4.678757e-135

3.5 scDD

scDD (6) models genes using a Dirichlet process mixture. It classifies genes into four categories, DE (differential expression), DP (differential proportions), DM (differential modalities), and DB (combined DP + DM). A vignette detailing scDD usage is available here.

sce <- SingleCellExperiment(assays = list(counts = counts))
colData(sce)$condition = c(rep("Group1", ngroup1), rep("Group2", ngroup2))
start.time <- Sys.time()
sce <- preprocess(sce, scran_norm = TRUE) #counts must be normalized
scDD.output <- scDD(sce)
scDD.runtime <- difftime(Sys.time(), start.time, units = "secs")
scDD.results <- as.data.frame(scDD::results(scDD.output))

scDD returns a categorization and p values for each gene:

head(scDD.results)
#>      gene DDcategory Clusters.combined Clusters.c1 Clusters.c2 nonzero.pvalue
#> DEG1 DEG1         DE                 2           1           1   2.793168e-06
#> DEG2 DEG2         NS                 1           2           2   1.814473e-04
#> DEG3 DEG3         DB                 2           2           1   6.539819e-07
#> DEG4 DEG4         DE                 2           1           1   1.178014e-06
#> DEG5 DEG5         NS                 2           1           1   7.540524e-02
#> DEG6 DEG6         DB                 2           1           2   2.793168e-06
#>      nonzero.pvalue.adj zero.pvalue zero.pvalue.adj combined.pvalue
#> DEG1       2.513851e-05   0.7529235       0.9059148    2.959426e-05
#> DEG2       1.047780e-03   0.9269833       0.9997521    1.629906e-03
#> DEG3       1.569115e-05   0.5009034       0.8971508    5.218878e-06
#> DEG4       1.569115e-05   0.3184375       0.8777657    5.925461e-06
#> DEG5       1.878550e-01   0.5487415       0.8978808    1.731671e-01
#> DEG6       2.513851e-05   0.7546050       0.9059148    2.965565e-05
#>      combined.pvalue.adj
#> DEG1        2.314749e-04
#> DEG2        8.955529e-03
#> DEG3        9.759125e-05
#> DEG4        9.759125e-05
#> DEG5        3.675615e-01
#> DEG6        2.314749e-04

3.6 Wilcoxon

Although it’s not a specific scRNA-seq analysis framework, Wilcoxon tests are frequently used to test for differential expression, and are the default test used in large frameworks like Scanpy. The Wilcoxon rank sum test and non parametric and operates under the null hypothesis that two sets of counts follow a continuous distribution.

groups <- c(rep(1, ngroup1), rep(2, ngroup2))

calc.wilcox <- function(genewise.counts, groups){
  g1 <- as.numeric(genewise.counts[groups == 1])
  g2 <- as.numeric(genewise.counts[groups == 2])
  t <-wilcox.test(g1, g2)
  return(t$p.value)
}

start.time <- Sys.time()
wilcox.output <- apply(X = counts, FUN = calc.wilcox, MARGIN = 1,groups = groups)
wilcox.runtime <- difftime(Sys.time(), start.time, units = "secs")
wilcox.results <- data.frame(
                    pval = as.numeric(wilcox.output),
                    p.adj = p.adjust(as.numeric(wilcox.output), method = "fdr"),
                    row.names = names(wilcox.output)
                  )
head(wilcox.results)
#>              pval      p.adj
#> DEG1 0.0267127692 0.17808513
#> DEG2 0.0465195484 0.25403024
#> DEG3 0.0003371563 0.01053614
#> DEG4 0.0445247452 0.24874159
#> DEG5 0.2185964834 0.59563075
#> DEG6 0.0266980222 0.17808513

3.7 glm.nb

In addition to using a wilcoxon test, Petrany et. al also tested the performance of the glm.nb function from the MASS package. Some of the most popular DE analysis tools use a glm to estimate NB parameters, and then perform some form of post-processing on the parameters, such as the Bayes shrinkage strategy used by DESeq2. The following code runs a NB glm, and then performs a likelihood ratio test to determine differential expression. This tests the capabilites of glm’s in DE testing alone, without the bells and whistles of other DE analysis tools.

groups <- c(rep(1, ngroup1), rep(2, ngroup2))

calc.glm <- function(genewise.counts, groups){
  design <- data.frame(counts = genewise.counts,
                         group = groups)
    res <- tryCatch({res<-glm.nb(counts~group, design)},
                    error = function(e){return(NA)})
    if(length(res) == 1){
      p <- NA
    }else{
      lrt.res <- anova(res)
      p <- lrt.res$`Pr(>Chi)`[2]
    }
    return(p)
}

start.time <- Sys.time()
glm.output <- apply(X = counts, 
                    FUN = calc.glm, 
                    MARGIN = 1, 
                    groups = groups)
glm.runtime <- difftime(Sys.time(), start.time, units = "secs")
glm.results <- data.frame(
                pval = as.numeric(glm.output),
                p.adj = p.adjust(as.numeric(glm.output), method = "fdr"),
                row.names = names(glm.output)
              )
head(glm.results)
#>              pval        p.adj
#> DEG1 0.0000000000 0.0000000000
#> DEG2 0.0000000000 0.0000000000
#> DEG3 0.0000000000 0.0000000000
#> DEG4 0.0000000000 0.0000000000
#> DEG5 0.0002676476 0.0005217301
#> DEG6 0.0000000000 0.0000000000

4 Method Comparison


Next, we will perform a brief comparison of all the methods detailed above. First, we will ensure all of the dataframes containing the results of each package have a column called p.adj, if they don’t already.

colnames(degage.results)[15] <- "p.adj"
colnames(DESeq2.results)[6] <- "p.adj"
colnames(edgeR.results)[5] <- "p.adj"
colnames(DEsingle.results)[21] <- "p.adj"
colnames(scDD.results)[11] <- "p.adj"

Next, we will obtain basic performace statistics for each framework:

get_performance <- function(results, counts, packagename){
  
  results<- results[!is.na(results$p.adj),]
  expDEs<- rownames(results[results$p.adj <= 0.05,])
  
  trueDEs <- rownames(counts[1:ndegs,])
  
  trueEEs <- rownames(counts[ndegs + 1:nrow(counts),])
  
  tp <- sum(trueDEs %in% expDEs)
  fp <- sum(trueEEs %in% expDEs)
  tn <- sum(!(trueEEs %in% expDEs))
  fn <- sum(!(trueDEs %in% expDEs))
  
  sensitivity <- tp/(tp+fn)
  specificity <- tn/(tn+fp)
  f1 <- (2*tp)/(2*tp+fp+fn)
  
  performance <- data.frame("Package" = packagename, 
                            "ndegs"= length(expDEs), 
                            "Sensitivity" = sensitivity, 
                            "Specificity" = specificity, 
                            "F1" = f1)
  return(performance)
}


performance <- rbind(get_performance(degage.results, counts, "DEGage"), 
                      get_performance(DESeq2.results, counts, "DESeq2"),
                      get_performance(edgeR.results, counts, "edgeR"), 
                      get_performance(DEsingle.results, counts, "DEsingle"), 
                      get_performance(monocle.results, counts, "monocle3"), 
                      get_performance(scDD.results, counts, "scDD"), 
                      get_performance(wilcox.results, counts, "Wilcoxon"), 
                      get_performance(glm.results, counts, "glm.nb"))

performance$Runtime <- as.numeric(c(degage.runtime, 
                         DESeq2.runtime, 
                         edgeR.runtime, 
                         DESingle.runtime, 
                         monocle3.runtime, 
                         scDD.runtime, 
                         wilcox.runtime, 
                         glm.runtime))
performance
#>    Package ndegs Sensitivity Specificity        F1    Runtime
#> 1   DEGage   219       0.975       0.976 0.9307876  4.4603353
#> 2   DESeq2   143       0.705       0.998 0.8221574  4.5341818
#> 3    edgeR   179       0.875       0.996 0.9234828  0.2377856
#> 4 DEsingle   310       0.970       0.884 0.7607843 81.9927435
#> 5 monocle3   616       0.850       0.554 0.4166667  4.0095725
#> 6     scDD   225       0.920       0.959 0.8658824 83.0661187
#> 7 Wilcoxon    92       0.445       0.997 0.6095890  0.2798769
#> 8   glm.nb   640       0.975       0.555 0.4642857  4.6806390


As is seen above, DEGage provides one of the best trade offs between performance and run time. Note that this is not meant to be a representative comparative analysis of these methods, and is purely for illustrative purposes. For a more robust comparison, see (1)

5 References


(1) Petrany A., Chen R., Zhang S. and Chen, Y. “Theoretical Framework for the Difference of Two Negative Binomial Distributions and its Application in Comparative Analysis of Sequencing Data”. Genome Research, Under Review.
(2) Love, M.I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, 550 (2014). https://doi.org/10.1186/s13059-014-0550-8
(3) Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010 Jan 1;26(1):139-40. doi: 10.1093/bioinformatics/btp616. Epub 2009 Nov 11. PMID: 19910308; PMCID: PMC2796818.
(4) Zhun Miao, Ke Deng, Xiaowo Wang, Xuegong Zhang, DEsingle for detecting three types of differential expression in single-cell RNA-seq data, Bioinformatics, Volume 34, Issue 18, September 2018, Pages 3223–3224, https://doi.org/10.1093/bioinformatics/bty332
(5) Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, Lennon NJ, Livak KJ, Mikkelsen TS, Rinn JL. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014 Apr;32(4):381-386. doi: 10.1038/nbt.2859. Epub 2014 Mar 23. PMID: 24658644; PMCID: PMC4122333.
(6) Korthauer, K.D., Chu, LF., Newton, M.A. et al. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biol 17, 222 (2016). https://doi.org/10.1186/s13059-016-1077-y