SQ NOTE: cannot knit to html because of the glimaMDS and glimaMA plot. Can Run ALL to produce interactive figures. https://www.bioconductor.org/packages/release/bioc/vignettes/Glimma/inst/doc/single_cell_edger.html
Once the data is loaded in we follow the OSCA procedure for identifying highly variable genes for creating a multi-dimensional scaling (MDS) plot. We use the functions provided by scran to identify the most highly variable genes rather than the algorithm within glimmaMDS, as scran is tailored towards single cells.
sce <- ZeiselBrainData(ensembl=TRUE)
## Warning: Unable to map 1565 of 20006 requested IDs.
sce <- logNormCounts(sce)
var_mod <- modelGeneVar(sce)
hvg_genes <- getTopHVGs(var_mod, n=500)
hvg_sce <- sce[hvg_genes, ]
hvg_sce <- logNormCounts(hvg_sce)
Choosing to colour the MDS plot using level1class
reveals separation between cell types. To view the
interactive Glimma plot, click here.
To demonstrate the MA plot we will perform a differential expression analysis using the pseudo-bulk approach. This involves creating pseudo-bulk samples by aggregating single cells as an analogue of biological replicates. Here the pseudo-bulk samples will be generated from combinations of level1class and level2class, the cells belonging to unique combinations of the two factors will be aggregated into samples
colData(sce)$pb_group <-
paste0(colData(sce)$level1class,
"_",
colData(sce)$level2class)
sce_counts <- counts(sce)
pb_counts <- t(rowsum(t(sce_counts), colData(sce)$pb_group))
pb_samples <- colnames(pb_counts)
pb_samples <- gsub("astrocytes_ependymal", "astrocytes-ependymal", pb_samples)
pb_split <- do.call(rbind, strsplit(pb_samples, "_"))
pb_sample_anno <- data.frame(
sample = pb_samples,
cell_type = pb_split[, 1],
sample_group = pb_split[, 2]
)
With the pseudo-bulk annotations and counts we can construct a DGEList object.
pb_dge <- DGEList(
counts = pb_counts,
samples = pb_sample_anno,
group = pb_sample_anno$cell_type
)
pb_dge <- calcNormFactors(pb_dge)
With this we perform differential expression analysis between “pyramidal SS” and “pyramidal CA1” samples using edgeR’s generalised linear models.
design <- model.matrix(~0 + cell_type, data = pb_dge$samples)
colnames(design) <- make.names(gsub("cell_type", "", colnames(design)))
pb_dge <- estimateDisp(pb_dge, design)
contr <- makeContrasts("pyramidal.SS - pyramidal.CA1", levels = design)
pb_fit <- glmFit(pb_dge, design)
pb_lrt <- glmLRT(pb_fit, contrast = contr)
The results of this analysis can be visualised using glimmaMA() as it would be for bulk RNA-seq.