Introduction

Hopefully, following the Seurat workflow on clustering 2.7K bcs has given you a “feel” for what scRNA-seq analysis entails. In this tutorial, we will be analyzing the a slightly larger dataset of Retinal Bipolar Cells (BCs) sequenced using the Drop-seq method from the publication Shekhar et al., Cell, 2016 (a copy of which has been made available to you). We will use the Seurat package, and follow roughly the same steps that were applied for pbmcs in the earlier tutorial. Retinal Bipolar Cells are a heterogenous class of interneurons in the retina involved in the processing of visual signals. In the paper, the authors identified 15 molecularly distinct types of bipolar neurons, and matched them against cell morphology. The paper identified all 12 types of bipolar neurons that had been described earlier, in addition to three novel types.

The original publication began with nearly 44000 cells, of which ~28,000 cells passed QC and featured in the final analysis. In this dataset, you will begin with 8000 cells. I have indicated some key steps below, but your task is to carve out your own analysis path and compare results against those reported in the paper. Even though the figures might look different because of the small size of the dataset and the different methods used here compared to those in the paper, ask yourself if you are able to recover cell types with similar molecular signatures. You are encouraged (although not obligated) to work in groups.

Read the count matrix and setup the Seurat object

Load necessary packages

library(Seurat)
library(dplyr)
library(Matrix)
library(MASS)
library(topGO)
library(xtable)
source("../sc_workshop/utilities.R")

Load the data matrix

# Load the bc dataset (loads a sparse matrix Count.mat)
load("../sc_workshop/bipolar8000.Rdata")

Initialize a Seurat Object

# Initialize the Seurat object with the raw (non-normalized data).  Keep all
# genes expressed in >= 10 cells. Keep all cells with at
# least 500 detected genes
bc <- CreateSeuratObject(raw.data = Count.mat, min.cells = 10, min.genes = 500, 
                           project = "Dropseq_bipolar")

As before, bc@raw.data is a slot that stores the original gene expression matrix. We can visualize the first 20 rows (genes) and the first 10 columns (cells),

bc@raw.data[1:20,1:10]
## 20 x 10 sparse Matrix of class "dgCMatrix"
##    [[ suppressing 10 column names 'Bipolar4_CGGGACGGCTAA', 'Bipolar2_TACCGCAGCTTA', 'Bipolar4_AGCGGATCTAGA' ... ]]
##                                  
## 0610007P14Rik . . . . . . . . . .
## 0610009B22Rik . . . . . . . . . .
## 0610009E02Rik . . . . . . . . . .
## 0610009L18Rik . . . . . . . . . .
## 0610009O20Rik . . . . . . . . . .
## 0610010F05Rik . 2 . . . . . . . .
## 0610010K14Rik . . . . . . . . . .
## 0610011F06Rik . . . 1 . 1 . . . .
## 0610030E20Rik . . . . . . . . . .
## 0610037L13Rik . . 1 . . . . . . .
## 0610040B10Rik . . . . 1 . . . . .
## 0610040J01Rik . . . . . . . . . .
## 0610043K17Rik . . . . . . . . . .
## 1110001J03Rik . . . . . . . . 2 .
## 1110002L01Rik . . . . . . . . 1 .
## 1110004E09Rik . . . . . . . . . .
## 1110004F10Rik 1 2 . 1 . 1 2 . . .
## 1110007C09Rik . . . . . . . . . .
## 1110008F13Rik . . . . . . 1 . . .
## 1110008L16Rik . . . . . . . . . .

Preprocessing step 1 : Filter out unhealthy cells

Following the same procedure as before, we compute percent.mito for each cell. Note that the search string input to the grep function below is slightly modified to accomodate mouse gene names.

# The number of genes and UMIs (nGene and nUMI) are automatically calculated for every object by Seurat.  For non-UMI
# data, nUMI represents the sum of the non-normalized values within a cell We calculate the percentage of mitochondrial
# genes here and store it in percent.mito using AddMetaData.  We use object@raw.data since this represents
# non-transformed and non-log-normalized counts The % of UMI mapping to mt-genes is a common scRNA-seq QC metric.
mito.genes <- grep(pattern = "^mt-", x = rownames(x = bc@data), value = TRUE)
percent.mito <- Matrix::colSums(bc@raw.data[mito.genes, ])/Matrix::colSums(bc@raw.data)

# AddMetaData adds columns to object@meta.data, and is a great place to stash QC stats
bc <- AddMetaData(object = bc, metadata = percent.mito, col.name = "percent.mito")
VlnPlot(object = bc, features.plot = c("nGene", "nUMI", "percent.mito"), nCol = 3)