Single Cell with EdgeR and Glimma

R Markdown

#BiocManager::install("scRNAseq")
library(scRNAseq)

## Loading required package: SingleCellExperiment

## Loading required package: SummarizedExperiment

## Loading required package: MatrixGenerics

## Loading required package: matrixStats

## 
## Attaching package: 'MatrixGenerics'

## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars

## Loading required package: GenomicRanges

## Loading required package: stats4

## Loading required package: BiocGenerics

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which.max, which.min

## Loading required package: S4Vectors

## 
## Attaching package: 'S4Vectors'

## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname

## Loading required package: IRanges

## 
## Attaching package: 'IRanges'

## The following object is masked from 'package:grDevices':
## 
##     windows

## Loading required package: GenomeInfoDb

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## 
## Attaching package: 'Biobase'

## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians

## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians

#BiocManager::install("scater")
library(scater)

## Loading required package: scuttle

## Loading required package: ggplot2

#BiocManager::install("scran")
library(scran)
#BiocManager::install("Glimma")
library(Glimma)
library(edgeR)

## Loading required package: limma

## 
## Attaching package: 'limma'

## The following object is masked from 'package:scater':
## 
##     plotMDS

## The following object is masked from 'package:BiocGenerics':
## 
##     plotMA

## 
## Attaching package: 'edgeR'

## The following object is masked from 'package:SingleCellExperiment':
## 
##     cpm

sce <- ZeiselBrainData(ensembl=TRUE)

## snapshotDate(): 2022-04-26

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## loading from cache

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## loading from cache

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## loading from cache

## snapshotDate(): 2022-04-26

## see ?scRNAseq and browseVignettes('scRNAseq') for documentation

## loading from cache

## snapshotDate(): 2022-04-25

## loading from cache

## require("ensembldb")

## Warning: Unable to map 1565 of 20006 requested IDs.

Once the data is loaded in we follow the OSCA procedure for identifying highly variable genes for creating a multi-dimensional scaling (MDS) plot. We use the functions provided by scran to identify the most highly variable genes rather than the algorithm within glimmaMDS, as scran is tailored towards single cells.

sce <- logNormCounts(sce)

var_mod <- modelGeneVar(sce)
hvg_genes <- getTopHVGs(var_mod, n=500)
hvg_sce <- sce[hvg_genes, ]

hvg_sce <- logNormCounts(hvg_sce)

Choosing to colour the MDS plot using level1class reveals separation between cell types.

glimmaMDS(
    exprs(hvg_sce),
    groups = colData(hvg_sce)
) # about 10min run before openning Rstudio viewer

To demonstrate the MA plot we will perform a differential expression analysis using the pseudo-bulk approach. This involves creating pseudo-bulk samples by aggregating single cells as an analogue of biological replicates. Here the pseudo-bulk samples will be generated from combinations of level1class and level2class, the cells belonging to unique combinations of the two factors will be aggregated into samples

colData(sce)$pb_group <-
    paste0(colData(sce)$level1class,
           "_",
           colData(sce)$level2class)

sce_counts <- counts(sce)
pb_counts <- t(rowsum(t(sce_counts), colData(sce)$pb_group))

pb_samples <- colnames(pb_counts)
pb_samples <- gsub("astrocytes_ependymal", "astrocytes-ependymal", pb_samples)
pb_split <- do.call(rbind, strsplit(pb_samples, "_"))
pb_sample_anno <- data.frame(
    sample = pb_samples,
    cell_type = pb_split[, 1],
    sample_group = pb_split[, 2]
)

With the pseudo-bulk annotations and counts we can construct a DGEList object.

pb_dge <- DGEList(
    counts = pb_counts,
    samples = pb_sample_anno,
    group = pb_sample_anno$cell_type
)

pb_dge <- calcNormFactors(pb_dge)

With this we perform differential expression analysis between “pyramidal SS” and “pyramidal CA1” samples using edgeR’s generalised linear models.

design <- model.matrix(~0 + cell_type, data = pb_dge$samples)
colnames(design) <- make.names(gsub("cell_type", "", colnames(design)))

pb_dge <- estimateDisp(pb_dge, design)

contr <- makeContrasts("pyramidal.SS - pyramidal.CA1", levels = design)

pb_fit <- glmFit(pb_dge, design)
pb_lrt <- glmLRT(pb_fit, contrast = contr)

The results of this analysis can be visualised using glimmaMA() as it would be for bulk RNA-seq.

glimmaMA(pb_lrt, dge = pb_dge) # another Rstudio viewer window, can publish

Single Cell with EdgeR and Glimma

Shenfeng Qiu

2022-08-12

R Markdown