We perform the DE analysis separately for each label to identify cell type-specific transcriptional effects of the KO condition. The actual DE testing is performed on “pseudo-bulk” expression profiles (Tung et al. 2017), generated by summing the counts of all cells that share the same combination of label and sample. This uses the resolution offered by single-cell technologies to define the labels, and combines it with the statistical rigor of existing methods for DE analyses involving a small number of samples.
We first filter out every sample-label combination that contains fewer than 10 cells. The discarded combinations are saved in /outs/old/DE_k20/fail_min_cells_DE.csv.
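The aggregation and filtering steps above can be sketched as follows. This is a minimal illustration in plain Python, not the code used to produce the results: the input layout (one tuple of label, sample and per-gene counts per cell), the function name pseudobulk and the toy labels are assumptions made for the example; only the threshold of 10 cells comes from the text.

```python
from collections import defaultdict

MIN_CELLS = 10  # minimum cells per sample-label combination, as in the text


def pseudobulk(cells, min_cells=MIN_CELLS):
    """Sum counts over all cells sharing a (label, sample) combination.

    `cells` is an iterable of (label, sample, counts), where `counts`
    maps gene name -> integer count for one cell (hypothetical layout).
    Returns (kept_profiles, failed_combinations).
    """
    sums = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for label, sample, counts in cells:
        key = (label, sample)
        n_cells[key] += 1
        profile = sums[key]
        for gene, count in counts.items():
            profile[gene] += count
    # Keep only combinations with enough cells; report the rest.
    kept = {k: dict(v) for k, v in sums.items() if n_cells[k] >= min_cells}
    failed = sorted(k for k, n in n_cells.items() if n < min_cells)
    return kept, failed


# Toy data: 12 cells in one combination, 3 cells in another.
toy_cells = (
    [("label_1", "KO_1", {"GeneA": 2, "GeneB": 1})] * 12
    + [("label_2", "WT_1", {"GeneA": 5})] * 3
)
profiles, failed = pseudobulk(toy_cells)
print(profiles)  # {('label_1', 'KO_1'): {'GeneA': 24, 'GeneB': 12}}
print(failed)    # [('label_2', 'WT_1')]
```

The combination with only 3 cells is dropped from the pseudo-bulk profiles and reported separately, which mirrors the role of the fail_min_cells_DE.csv file.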
We then test for DE with edgeR, one of the best-performing methods according to Soneson et al. (2018). The results are saved in /outs/old/DE_k20/de_results.
Output:
LogFC is the log fold-change (reported by edgeR on the log2 scale), i.e. the difference in log expression between the two groups.
LogCPM are the log counts per million, which can be understood as measuring expression level.
F is the F-statistic from the quasi-likelihood F-test.
PValue is the nominal p-value derived from F, without any multiple testing correction.
FDR (false discovery rate) is the PValue after Benjamini-Hochberg correction. For example, the set of genes with an adjusted p-value below 0.1 is expected to contain no more than 10% false positives.
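To make the relationship between PValue and FDR concrete, the Benjamini-Hochberg adjustment can be sketched in a few lines. This is an illustrative plain-Python version with made-up p-values, not the edgeR implementation; the function name benjamini_hochberg is chosen for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


toy_pvalues = [0.01, 0.04, 0.03, 0.20]  # made-up nominal p-values
toy_fdr = benjamini_hochberg(toy_pvalues)
print([round(q, 4) for q in toy_fdr])  # [0.04, 0.0533, 0.0533, 0.2]
```

In this toy example only the first gene passes a 5% FDR cut-off, although three of the four nominal p-values are below 0.05.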
If the comparison was not possible for a particular cluster, most commonly because of a lack of residual degrees of freedom (e.g. not enough samples from both conditions), the cluster is listed in /outs/old/DE_k20/fail_min_cells_DE.csv.
For each label, a list of the DE genes at an FDR of 5% is produced. Summaries of these results are saved in /outs/old/DE_k20/.
Values of 1, -1 and 0 indicate that the gene is significantly upregulated, downregulated or not significant, respectively. Genes listed as NA were either filtered out as low-abundance genes for a given label’s analysis, or the comparison of interest was not possible for that particular label.
de_filtered_genecounts.csv contains a short summary with the number of DE genes for each cluster.
de_filtered_genes.csv contains the list of DE genes.
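The 1 / -1 / 0 / NA coding and the per-cluster gene counts can be illustrated with a small sketch. The data structure, the function name call_gene and the toy values are assumptions made for the example; only the 5% FDR threshold and the meaning of the codes come from the text. Whether the comparison with the threshold is strict is also an assumption here.

```python
FDR_THRESHOLD = 0.05  # 5% FDR, as in the text


def call_gene(log_fc, fdr, threshold=FDR_THRESHOLD):
    """Code one gene as 1 (up), -1 (down), 0 (not significant) or None (NA)."""
    if log_fc is None or fdr is None:
        return None  # gene filtered out, or comparison not possible
    if fdr >= threshold:
        return 0
    return 1 if log_fc > 0 else -1


# Made-up (logFC, FDR) pairs per label; (None, None) stands for NA.
toy_results = {
    "label_1": {"GeneA": (1.8, 0.001), "GeneB": (-2.1, 0.02), "GeneC": (0.3, 0.60)},
    "label_2": {"GeneA": (1.2, 0.04), "GeneB": (None, None), "GeneC": (-0.1, 0.90)},
}

calls = {
    label: {gene: call_gene(*stats) for gene, stats in genes.items()}
    for label, genes in toy_results.items()
}
# Number of DE genes per label: count only the 1 and -1 calls.
n_de = {
    label: sum(1 for call in genes.values() if call in (1, -1))
    for label, genes in calls.items()
}
print(calls["label_2"])  # {'GeneA': 1, 'GeneB': None, 'GeneC': 0}
print(n_de)              # {'label_1': 2, 'label_2': 1}
```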
Plots of the genes that are DE in the largest number of clusters
These plots are saved in /outs/old/DE_k20/de_filtered_plots.
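One way to pick the genes to plot is to count, for each gene, the number of labels in which it is called DE. The sketch below is a hypothetical illustration with made-up calls (1, -1, 0, or None for NA); the function name rank_by_label_count and the tie-breaking rule (alphabetical by gene name) are assumptions, not a description of how the plots were generated.

```python
from collections import Counter

# Made-up DE calls per label: 1 up, -1 down, 0 not significant, None for NA.
toy_calls = {
    "label_1": {"GeneA": 1, "GeneB": -1, "GeneC": 0},
    "label_2": {"GeneA": 1, "GeneB": None, "GeneC": 0},
    "label_3": {"GeneA": -1, "GeneB": -1, "GeneC": None},
}


def rank_by_label_count(calls):
    """Rank genes by the number of labels in which they are DE."""
    counts = Counter(
        gene
        for genes in calls.values()
        for gene, call in genes.items()
        if call in (1, -1)
    )
    # Most labels first; ties broken alphabetically by gene name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_label_count(toy_calls)
print(ranking)  # [('GeneA', 3), ('GeneB', 2)]
```

Genes that are never significant (GeneC here) do not appear in the ranking at all.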