Hopefully, following the Seurat workflow on clustering 2.7K bcs has given you a “feel” for what scRNA-seq analysis entails. In this tutorial, we will be analyzing a slightly larger dataset of Retinal Bipolar Cells (BCs) sequenced using the Drop-seq method from the publication Shekhar et al., Cell, 2016 (download a copy here : https://www.dropbox.com/s/cpw4xysbs170jc8/Bipolar2016_maintext.pdf?dl=0). We will use the Seurat package, and follow roughly the same steps that were applied for pbmcs in the earlier tutorial. Retinal Bipolar Cells are a heterogenous class of interneurons in the retina involved in the processing of visual signals. In the paper, the authors identified 15 molecularly distinct types of bipolar neurons, and matched them against cell morphology. The paper identified all 12 types of bipolar neurons that had been described earlier, in addition to three novel types.
The original publication began with nearly 44000 cells, of which ~28,000 cells passed QC and featured in the final analysis. In this dataset, you will begin with 8000 cells. I have indicated some key steps below, but your task is to carve out your own analysis path and compare results against those reported in the paper. Even though the figures might look different because of the small size of the dataset and the different methods used here compared to those in the paper, ask yourself if you are able to recover cell types with similar molecular signatures. You are encouraged (although not obligated) to work in groups.
Load necessary packages
library(Seurat)
library(dplyr)
library(Matrix)
library(MASS)
library(topGO)
library(xtable)
source("../sc_workshop/utilities.R")
Load the data matrix
# Load the bc dataset (loads a sparse matrix Count.mat)
load("../sc_workshop/bipolar8000.Rdata")
Initialize your Seurat Object using the count matrix Count.mat
. Name this object bc
(for Bipolar Cells)
norm.fac = median(Matrix::colSums(AnyCountMatrix)
FindVariableGenes
to obtain at least 1000-1500 genes for dimensionality reduction. Also consider an alternative model that appears to be more relevant for UMI-based data to estimate variable genes, implemented in the provided NB.var.genes
defined in the script utilities.R
. Sample invocation : var.genes.NB <- NB.var.genes(bc, do.idents = FALSE, set.var.genes = FALSE, num.sd=1, x.high.cutoff = 15, x.low.cutoff = 0.005)
How different is the set of variable genes output by NB.var.genes
compared to FindVariableGenes
? Use either, their intersection or union for further analysis (or an entirely different method if you feel courageous). Make sure you set bc@var.genes
to the set of variable genes.
Perform PCA, and select an appropriate number of PCs. Use either PCAPlot
, PCHeatmap
or PCElbowPlot
to make an informed choice. Make sure you use this number as input to the pcs.use
parameter in FindClusters
and RunTSNE
Find differentially expressed genes for the clusters and generate a heatmap.
FindClusters
?DotPlot
summarizing the clusters and the top markers?FeaturePlot
to plot the expression of individual genes on the tSNE map