Analyzing scRNA-seq Data

Import two samples

Quality Control

Plot Data


Joint filtering effects:

Selecting Data Subset

Following the original script, I filter cells on these criteria: nFeature_RNA > 1500 & nFeature_RNA < 7500 & nCount_RNA > 500. I then added the criterion “percent.mt < 20”. After subsetting, I keep only the genes expressed in more than 100 cells.
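The thresholds above can be sketched in plain Python to make the logic explicit. This is only an illustration, not the Seurat call used in the original script: the `CellQC` class, the `passes_qc` and `genes_to_keep` helpers, the `gene -> {barcode: count}` layout, and the toy barcodes are all hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC metrics, named after the Seurat metadata columns."""
    barcode: str
    n_feature_rna: int   # genes detected in the cell
    n_count_rna: int     # UMIs counted in the cell
    percent_mt: float    # percentage of counts from mitochondrial genes


def passes_qc(cell):
    """Apply the cell-level thresholds quoted in the text."""
    return (
        1500 < cell.n_feature_rna < 7500
        and cell.n_count_rna > 500
        and cell.percent_mt < 20
    )


def genes_to_keep(counts, kept_barcodes, min_cells=100):
    """Return genes with a nonzero count in more than `min_cells` kept cells.

    `counts` maps gene -> {barcode: count} (a hypothetical sparse layout).
    """
    kept = set(kept_barcodes)
    result = []
    for gene, per_cell in counts.items():
        n_expressing = sum(1 for bc, c in per_cell.items() if bc in kept and c > 0)
        if n_expressing > min_cells:
            result.append(gene)
    return result


# Toy data (made up): only the first cell satisfies every threshold.
cells = [
    CellQC("AAAC", 2000, 6000, 5.0),    # passes
    CellQC("AAAG", 1200, 3000, 4.0),    # too few genes
    CellQC("AAAT", 3000, 9000, 25.0),   # percent.mt too high
    CellQC("AACA", 8000, 40000, 3.0),   # too many genes
]
kept_barcodes = [c.barcode for c in cells if passes_qc(c)]

# min_cells=0 here only because the toy matrix is tiny; the text uses 100.
toy_counts = {"Gad1": {"AAAC": 3, "AAAG": 1}, "Chat": {"AAAT": 2}}
kept_genes = genes_to_keep(toy_counts, kept_barcodes, min_cells=0)
print(kept_barcodes, kept_genes)
```

Note the order: cells are filtered first, and the gene filter counts expression only among the cells that survived, which matches “after subsetting” in the text.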

Reassess quality metrics

## [1] 15897 16578

Apply sctransform normalization

This step can replace NormalizeData(), ScaleData(), and FindVariableFeatures(). The results of sctransform are stored in the “SCT” assay. It is expected to reveal sharper biological distinctions than the standard Seurat workflow.

Because this is a normalization step, it has to be run separately on each of the two samples (which is why the Seurat object is split here). The samples' expression data are integrated only in the next step, the integration analysis, to remove batch effects.

Integration

Clustering

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16578
## Number of edges: 588709
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9858
## Number of communities: 11
## Elapsed time: 2 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 16578
## Number of edges: 588709
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9352
## Number of communities: 23
## Elapsed time: 2 seconds

Clustering quality control

This step shows how the number of genes, the number of UMIs, and the percentage of mitochondrial genes are distributed in each cluster. Normally, we expect the distributions of the number of genes (nFeature_RNA) and the number of UMIs (nCount_RNA) to look similar.

As for percent.mt (the percentage of mitochondrial genes per cell), it serves as a reference for checking whether clusters with high values contain poor-quality cells (if so, we can try to remove them in the next step or adjust the thresholds in the previous filtering step) or whether the difference is biological.
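The per-cluster check described above can be sketched as a small summary in plain Python. This is an illustration under assumptions, not the plotting code behind the figures: the `summarize_by_cluster` and `flag_high_mt` helpers, the tuple layout, the toy values, and the rule of flagging clusters whose median percent.mt exceeds twice the median across clusters are all hypothetical choices made for this example.

```python
from collections import defaultdict
from statistics import median


def summarize_by_cluster(cells):
    """Summarize QC metrics per cluster.

    `cells` is an iterable of (cluster_id, nFeature_RNA, nCount_RNA, percent_mt).
    """
    grouped = defaultdict(list)
    for cluster, n_feature, n_count, pct_mt in cells:
        grouped[cluster].append((n_feature, n_count, pct_mt))

    summary = {}
    for cluster, rows in grouped.items():
        summary[cluster] = {
            "n_cells": len(rows),
            "median_nFeature_RNA": median(r[0] for r in rows),
            "median_nCount_RNA": median(r[1] for r in rows),
            "median_percent_mt": median(r[2] for r in rows),
        }
    return summary


def flag_high_mt(summary, factor=2.0):
    """Return clusters whose median percent.mt exceeds `factor` times the
    median of the per-cluster medians (an arbitrary rule for this sketch)."""
    overall = median(s["median_percent_mt"] for s in summary.values())
    return sorted(
        c for c, s in summary.items() if s["median_percent_mt"] > factor * overall
    )


# Toy data (made up): cluster 2 has few UMIs and a high mitochondrial share.
toy_cells = [
    (0, 2000, 6000, 3.0), (0, 2200, 7000, 4.0), (0, 2100, 6500, 5.0),
    (1, 1900, 5000, 4.0), (1, 2500, 8000, 6.0),
    (2, 1600, 2000, 15.0), (2, 1700, 2500, 18.0), (2, 1650, 2200, 12.0),
]
summary = summarize_by_cluster(toy_cells)
flagged = flag_high_mt(summary)
print(flagged)
```

A flagged cluster is only a candidate for closer inspection; as noted above, a high percent.mt can also reflect a real biological difference.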


Find all markers in two samples for cell type identification

Identifying cell type

Option 1: SingleR package with built-in reference

I use a collection of mouse bulk RNA-seq data sets obtained from the celldex package and originally downloaded from the Gene Expression Omnibus (Benayoun et al. 2019). A variety of cell types are available, mostly from blood but also covering several other tissues.

SingleR identifies marker genes from the reference and uses them to compute assignment scores (based on the Spearman correlation across markers) for each cell in the test dataset against each label in the reference. The label with the highest score is then assigned to the test cell, possibly with further fine-tuning to resolve closely related labels.
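The score-and-assign idea described above can be sketched in plain Python. This is a simplified illustration, not the SingleR implementation: it uses a single expression profile per label and skips the fine-tuning step. The `average_ranks`, `spearman`, and `assign_label` helpers and the toy expression values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # constant vector: correlation undefined, treat as 0
    return cov / (sx * sy)


def assign_label(cell_profile, reference, marker_genes):
    """Score one cell against every reference label over the marker genes
    and return the best-scoring label together with all scores."""
    cell_vec = [cell_profile[g] for g in marker_genes]
    scores = {
        label: spearman(cell_vec, [profile[g] for g in marker_genes])
        for label, profile in reference.items()
    }
    return max(scores, key=scores.get), scores


# Toy reference and test cell (values made up for illustration).
markers = ["Gfap", "Aqp4", "Cx3cr1", "P2ry12"]
reference = {
    "Astrocytes": {"Gfap": 9.0, "Aqp4": 8.0, "Cx3cr1": 1.0, "P2ry12": 0.5},
    "Microglia": {"Gfap": 0.5, "Aqp4": 1.0, "Cx3cr1": 9.0, "P2ry12": 8.0},
}
cell = {"Gfap": 0.0, "Aqp4": 1.0, "Cx3cr1": 5.0, "P2ry12": 3.0}
label, scores = assign_label(cell, reference, markers)
print(label, scores)
```

Because the score is rank-based, it depends only on the ordering of marker expression within the cell, not on absolute values, which makes it less sensitive to scale differences between a bulk reference and a single-cell test dataset.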

## 
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 3145 2396 2033 1646 1034  907  843  829  602  584  516  439  367  310  244  158 
##   16   17   18   19   20   21   22 
##  127  126   64   64   51   47   46


Option 2: manual annotation

Refer to the original seurat_integrated script.

Plotting Astrocyte Markers


Plotting Microglia Markers


Plotting Endothelial Markers

Plotting Oligodendrocyte Markers

Plotting Glutamatergic Neuron Markers

Plotting GABAergic Neuron Markers

Plotting Oligodendrocyte Precursor Markers

D2 neuron marker

Sirt1 gene in D1 cluster

Sirt1 gene in D2 cluster

Sirt2 gene in D1 cluster

Chat gene: beautiful on cluster 20!!

Continue the analysis from here next time. Overall the data are very good, but the filtering needs to be tightened: use a mitochondrial cutoff of 15% and an nFeature cutoff of 1200 genes. Reload the object with readRDS(“Seurat_integrated_twobrainsample.RDS”).

Rename cluster based on the SingleR results and manual annotation


Option 3: manual annotation

Refer to the marker list from Tasic et al., Nature 2018: https://github.com/AllenInstitute/tasic2018analysis/blob/master/RNA-seq%20Analysis/markers.R

CDS scRNA-seq Workshop 2022 Section 2 - Integration Pipeline

Stephanie The

2022-04-06