Edited by Shenfeng Qiu 07232023

All blocks run cleanly one at a time, including Monocle3, Azimuth mapping, and ShinyCell visualization. Reference: https://satijalab.org/seurat/articles/integration_introduction.html

Import Group samples

Quality Control

Plot Data

Joint filtering effects:

After subsetting, I keep only those genes expressed in more than 10 cells
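To make the gene filter concrete, here is a minimal sketch of the rule in plain Python. It is only an illustration on a toy count table, not the Seurat call used in this notebook; the function name `filter_genes`, the `min_cells` parameter, and the toy genes are assumptions made up for the example.

```python
# Toy count table: each gene maps to its counts across 15 cells (hypothetical data).
def filter_genes(counts, min_cells=10):
    """Keep genes with a non-zero count in MORE than `min_cells` cells."""
    kept = {}
    for gene, row in counts.items():
        n_expressing = sum(1 for c in row if c > 0)  # cells where the gene is detected
        if n_expressing > min_cells:
            kept[gene] = row
    return kept


counts = {
    "GeneA": [1] * 12 + [0] * 3,  # detected in 12 cells: kept
    "GeneB": [5] * 10 + [0] * 5,  # detected in exactly 10 cells: dropped (not "more than")
    "GeneC": [0] * 14 + [7],      # detected in 1 cell: dropped
}

filtered = filter_genes(counts, min_cells=10)
print(sorted(filtered))
```

Note the strict comparison: a gene seen in exactly 10 cells is removed, which matches "more than 10 cells" as written above.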

Reassess quality metrics

Apply sctransform normalization

Scale data

Clustering

Clustering quality control

This step shows how the number of genes, the number of UMIs, and the percentage of mitochondrial genes are distributed in each cluster. Normally, we expect the number of genes (nFeature_RNA) and the number of UMIs (nCount_RNA) to show similar distributions.

As for percent.mt (the percentage of counts from mitochondrial genes per cell), it serves as a check on whether clusters with high values contain poor-quality cells (if so, we can remove them in the next step or adjust the thresholds in the previous filtering step) or whether the difference is biological.
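The per-cluster check described above can be sketched numerically as well as visually. The snippet below is a rough illustration in plain Python on made-up cells: it takes the median of each metric per cluster and flags clusters whose median percent.mt is well above the median across clusters. The helper names and the `factor=2.0` cutoff are assumptions for the example, not thresholds from this analysis.

```python
from collections import defaultdict
from statistics import median

METRICS = ("nFeature_RNA", "nCount_RNA", "percent.mt")


def summarize_by_cluster(cells):
    """Return {cluster: {metric: median value}} for the three QC metrics."""
    grouped = defaultdict(list)
    for cell in cells:
        grouped[cell["cluster"]].append(cell)
    return {
        cluster: {m: median(c[m] for c in members) for m in METRICS}
        for cluster, members in grouped.items()
    }


def flag_high_mt(summary, factor=2.0):
    """Clusters whose median percent.mt exceeds `factor` x the median over clusters."""
    overall = median(s["percent.mt"] for s in summary.values())
    return sorted(c for c, s in summary.items() if s["percent.mt"] > factor * overall)


# Hypothetical cells: cluster 2 has few genes and a high mitochondrial fraction.
cells = [
    {"cluster": 0, "nFeature_RNA": 2000, "nCount_RNA": 6000, "percent.mt": 2.0},
    {"cluster": 0, "nFeature_RNA": 2200, "nCount_RNA": 6500, "percent.mt": 3.0},
    {"cluster": 0, "nFeature_RNA": 2100, "nCount_RNA": 6200, "percent.mt": 4.0},
    {"cluster": 1, "nFeature_RNA": 1800, "nCount_RNA": 5000, "percent.mt": 3.0},
    {"cluster": 1, "nFeature_RNA": 1900, "nCount_RNA": 5400, "percent.mt": 4.0},
    {"cluster": 1, "nFeature_RNA": 1700, "nCount_RNA": 4800, "percent.mt": 5.0},
    {"cluster": 2, "nFeature_RNA": 500, "nCount_RNA": 900, "percent.mt": 15.0},
    {"cluster": 2, "nFeature_RNA": 600, "nCount_RNA": 1000, "percent.mt": 20.0},
    {"cluster": 2, "nFeature_RNA": 550, "nCount_RNA": 950, "percent.mt": 25.0},
]

summary = summarize_by_cluster(cells)
flagged = flag_high_mt(summary)
print(flagged)
```

A flagged cluster is only a candidate for removal. Whether it reflects poor-quality cells or real biology still has to be judged from the plots and the marker genes.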

additional plotting

# do heatmaps

additional plots, adjusting format

# additional customized heatmap plot # trial DEG and GSEA analysis for a cluster. Note: the following block runs, but it takes a long time and I am not sure the result means anything, so let’s skip this block.

Find all markers in two samples for cell type identification

# selectively plot certain idents

for all markers, higher resolution

repeat for lower resolution clusters

# repeat for lower resolution clustering

Identifying cell type

Option 1: SingleR package with built-in reference

I use the mouse bulk RNA-seq reference from the celldex package, a collection of data sets downloaded from the Gene Expression Omnibus (Benayoun et al. 2019). A variety of cell types are available, mostly from blood but also covering several other tissues. SingleR identifies marker genes from the reference and uses them to compute assignment scores (based on the Spearman correlation across markers) for each cell in the test dataset against each label in the reference. The label with the highest score is then assigned to the test cell, possibly with further fine-tuning to resolve closely related labels.
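To illustrate the scoring idea, here is a simplified sketch in plain Python: rank-based (Spearman) correlation between one test cell and one expression profile per reference label, with the top-scoring label assigned. This is not the SingleR implementation. It assumes a single profile per label and skips the fine-tuning step, and the marker values and label names are invented for the example.

```python
def ranks(values):
    """Average ranks (1-based), with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no ranking information
    return cov / (sx * sy)


def assign_label(cell, reference):
    """Score the cell against every label and return (best label, all scores)."""
    scores = {label: spearman(cell, profile) for label, profile in reference.items()}
    return max(scores, key=scores.get), scores


# Hypothetical expression of five marker genes per reference label.
reference = {
    "B cells": [9, 8, 1, 0, 2],
    "T cells": [1, 0, 9, 8, 2],
    "Macrophages": [2, 1, 0, 3, 9],
}
test_cell = [7, 5, 0, 1, 1]

best, scores = assign_label(test_cell, reference)
print(best)
```

Because the score depends only on ranks, it is insensitive to differences in scale between a single-cell profile and a bulk reference, which is part of why a correlation on ranks suits this comparison.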

Now let’s map to the Seurat Azimuth human bone marrow online database

for finer resolution of clustering

Differential expression Analysis for groups

TO BE IMPLEMENTED ONLY WHEN NECESSARY