Differential Expression

set-up

Use of pseudo-bulk samples

We perform the DE analysis separately for each label to identify cell type-specific transcriptional effects of KO condition. The actual DE testing is performed on “pseudo-bulk” expression profiles (Tung et al. 2017), generated by summing counts together for all cells with the same combination of label and sample. This leverages the resolution offered by single-cell technologies to define the labels, and combines it with the statistical rigor of existing methods for DE analyses involving a small number of samples.

Perform DE

We first filter all the combinations sample-label that resulted in less than 10 cells. These are saved in /outs/old/DE_k20/fail_min_cells_DE.csv

We then compute the DE with edgeR, one of the best methods according to (Soneson et al. 2018). The results are saved in /outs/old/DE_k20/de_results

Output:

LogFC is the log fold-change, which is the log difference between both groups

LogCPM are the log counts per million, which can be understood as measuring expression level.

F is the F-statistic from the quasi-likelihood F-test.

PValue is the nominal p-value derived from F without any multiple testing correction

FDR (False discovery rate) is the PValue after Benjamini-Hochberg correction. For example, the set of genes with adjusted p value less than 0.1 should contain no more than 10% false positives.

If the comparison was not possible for a particular cluster, most commonly due to lack of residual degrees of freedom ( e.g. an absence of enough samples from both conditions) it is listed in /outs/old/DE_k20/fail_min_cells_DE.csv

Filter the results

For each one of the labels a list is produced with the DE genes at a FDR of 5%. Summaries of these results are saved in /outs/old/DE_k20/

Values of 1, -1 and 0 indicate that the gene is significantly upregulated, downregulated or not significant, respectively. Genes listed as NA were either filtered out as low-abundance genes for a given label’s analysis, or the comparison of interest was not possible for that particular label.

de_filtered_genecounts.csv contains a small summary with the number of genes DE for each one of the clusters.

de_filtered_genes.csv contains the list of genes DE.

Plot the genes DE in the most clusters

Plot the results

Saved in /outs/old/DE_k20/de_filtered_plots