Hopefully, following the Seurat workflow on clustering 2.7K bcs has given you a “feel” for what scRNA-seq analysis entails. In this tutorial, we will be analyzing a slightly larger dataset of Retinal Bipolar Cells (BCs) sequenced using the Drop-seq method from the publication Shekhar et al., Cell, 2016 (download a copy here : https://www.dropbox.com/s/cpw4xysbs170jc8/Bipolar2016_maintext.pdf?dl=0). We will use the Seurat package, and follow roughly the same steps that were applied for pbmcs in the earlier tutorial. Retinal Bipolar Cells are a heterogenous class of interneurons in the retina involved in the processing of visual signals. In the paper, the authors identified 15 molecularly distinct types of bipolar neurons, and matched them against cell morphology. The paper identified all 12 types of bipolar neurons that had been described earlier, in addition to three novel types.
The original publication began with nearly 44000 cells, of which ~28,000 cells passed QC and featured in the final analysis. In this dataset, you will begin with 8000 cells. I have indicated some key steps below, but your task is to carve out your own analysis path and compare results against those reported in the paper. Even though the figures might look different because of the small size of the dataset and the different methods used here compared to those in the paper, ask yourself if you are able to recover cell types with similar molecular signatures. You are encouraged (although not obligated) to work in groups.
Load necessary packages
library(Seurat)
library(dplyr)
library(Matrix)
library(MASS)
library(topGO)
library(xtable)
source("../sc_workshop/utilities.R")
Load the data matrix
# Load the bc dataset (loads a sparse matrix Count.mat)
load("../sc_workshop/bipolar8000.Rdata")
Initialize your Seurat Object using the count matrix Count.mat. Name this object bc (for Bipolar Cells)
norm.fac = median(Matrix::colSums(AnyCountMatrix)FindVariableGenes to obtain at least 1000-1500 genes for dimensionality reduction. Also consider an alternative model that appears to be more relevant for UMI-based data to estimate variable genes, implemented in the provided NB.var.genes defined in the script utilities.R. Sample invocation : var.genes.NB <- NB.var.genes(bc, do.idents = FALSE, set.var.genes = FALSE, num.sd=1, x.high.cutoff = 15, x.low.cutoff = 0.005)How different is the set of variable genes output by NB.var.genes compared to FindVariableGenes? Use either, their intersection or union for further analysis (or an entirely different method if you feel courageous). Make sure you set bc@var.genes to the set of variable genes.
Perform PCA, and select an appropriate number of PCs. Use either PCAPlot, PCHeatmap or PCElbowPlot to make an informed choice. Make sure you use this number as input to the pcs.use parameter in FindClusters and RunTSNE
Find differentially expressed genes for the clusters and generate a heatmap.
FindClusters?DotPlot summarizing the clusters and the top markers?FeaturePlot to plot the expression of individual genes on the tSNE maporig.ident column of bc@meta.data (which is an R data frame). Similarly, the cluster id’s of each cell determined by FindClusters is stored in the column res.0.*. Examine for batch effects (or the lack of it using the following two methods),
bc@ident stores the cluster id’s, which is what TSNEPlot or VlnPlot uses to group the data. But you can change it to whatever you want using SetAllIdent. Can you visualize the TSNEPlot where each cell is now colored by your batch of origin.Does the “novel” unipolar cell type we found in our paper come out as a separate cluster in your analysis - see Fig. 3.
Perform the GO analysis. Do the terms that show up strike you as meaningful?