Two_DRG_sample_scRNAseq

Sample preparation information

This experiment is in collaboration with Dr. Frank Porreca’s lab, purpose is to see how chronic stress treatment alters sensitivity to pain, and how it affects transcriptomic expression at single cell level of dorsal root ganglia cells.
** Key Experimental Conditions:
1. Harrison did the chronic stress in 5 control (CTL) and 5 treated (TRT) male mice
2. Dissected most DRGs from 5 mice and pull together for enzymatic digestion, which uses their cell culture digestion protocol
3. Final cell density before loading to iX controller: CTL, 6x10⁵ cells per milliliter; TRL, 6x10⁵ cells per milliliter
4. Dispersed cells go through 10X Genomics Chromium pipeline, with Next GEM 3’ chemistry kit. cDNA library prepared with dual index TT kit. Sequence vendor Novogene, run cycle 151+8+8+151 paired reads

Here is mice information for the experiment:
Experiment Conditions Here is a typical image of dissociated cells, with large and small cells characteristic of DRG:
Dissociated DRG cells

Import libaries for analysis

The pipeline uses Seurat V4, also can be achieved with Scanpy

# setwd("D:/Bioinformatics Projects/DRG")
library(Seurat)
library(dplyr)
library(cowplot)
library(ggplot2)
library(rstatix)
library(SingleCellExperiment)
library(ComplexHeatmap)
library(RColorBrewer)
library(circlize)
library(patchwork)

Import two samples - control (CTL) and stress treated (TRT)

To minimize change of coding, I keep the existing variable names from previous scripts as much as possile. Copy the ./slim_counts/ files to the m10 (CTL) and m11 (TRT) folder under ‘./slim_counts’.

Quality Control

Quality assessment before filtering

Joint filtering effects

Quality assessment after filtering

Selecting Data Subset

Filter data based on these criteria: nFeature_RNA < 7500& nFeature_RNA>750 & nCount_RNA > 1000 and then, added the filtering criteria of “percent.mt < 15” After subseting, only keeping those genes expressed in more than 20 cells.

After subsetting and filtering, the dimension of pooled data (both CTL and TRT) is:19407, 41972

Re-assess quality metrics

Quality metrics after filtered counts

Apply `sctransform` normalization

This steps can replace NormalizeData(), ScaleData(), and FindVariableFeatures(). The results of sctransform are stored in the “SCT” assay. It is assumed to reveal sharper biological distinctions compared to the standard Seurat workflow.

Since it is a normalization step, we have to do separately for two different samples (that’s why split Seurat objects here), then only integrate the sample expression data in the next step, integration analysis, to remove batch effect.

Integration of two samples

Select the most variable features to use for integration… we will use 3000 features and IntegratData

Clustering

Run the standard workflow for linear and non-linear reduction, visualization and clustering

We use a resolution value of 0.1 for cluster identification, which seems appropriate for this sample size ~13K cells

Clustering quality control

This step gives us some idea about how is the distribution of the number of genes, number of UMIs, and percentage of mitochondrial genes in each cluster. Normally, we expect to see similar distribution of no. of genes (nFeature_RNA) and no. of UMIs (nCount_RNA).

As for the percent.mt (percentage of mitochondrial genes per cell), it can be a reference to check if those high intensity clusters might be having poor quality cells (if so, we can try to remove in the next step or adjust the metrics in the previous filtering step) or it might be due to the differences biologically.

Determine metrics to plot present in seurat_integrated@meta.data

NOTE from SQ: for certain feature if not found in ‘integrated’ assay (may only have 2000 highly variable features), you need to switch to “RNA” or “SCT” assay using DefaultAssay(seurat_integrated) <- “RNA”

Find all markers for each cluster in the two samples for cell type identification

Write all gene markers to a .csv file on local drive

Plotting top markers for each cluster

Setting up plotting parameters and plot the top marker for each cluster

Violin plot for selected gene markers

For example, a certain gene marker for the DRG cells > here i am plotting several: * using split violin gene for gene Bsg * regular split violin for desired clusters, e.g. c(0, 1, 5) * feature plot for the top one gene from each cluster, for the first 9 clusters * a blend feature plot of two selected top genes * regular Seurat heatmap of top 5 marker genes for each cluster * regular Seurat heatmap of top 5 marker genes for each cluster only use 100 cells for each cluster

Define genes of interest and do heatmaps

Because this is neuro dataset, I usually will first do the heatmap of 25 marker genes used by the Allen Brain SMARTseq study. I also varied color schemes and use different subset of cells. * I selected four genes to plot, Piezo1, Piezo2, Prlr, Calca. Generally expression level for these genes are low

Find markers for specific cluster in two samples for cell type identification

There are multiple violin plots, just to vary plotting parameters to achieve aesthetics

Identifying cell type

Option 1: SingleR package with built-in reference

I use a collection of mouse bulk RNA-seq data sets obtained from celldex package (Benayoun et al. 2019). A variety of cell types are available, mostly from blood but also covering several other tissues. This identifies marker genes from the reference and uses them to compute assignment scores (based on the Spearman correlation across markers) for each cell in the test dataset against each label in the reference. The label with the highest score is the assigned to the test cell, possibly with further fine-tuning to resolve closely related labels.

This reference consists of a collection of mouse bulk RNA-seq data sets downloaded from the gene expression omnibus Benayoun et al. 2019. A variety of cell types are available, again mostly from blood but also covering several other tissues.

This autonomic annotation is just for crude reference only. Will not tell you DRG cell types.

SingleR classified cell type and numbers:

Option 2: manual annotation

Plotting Astrocyte Markers

Plotting Microglia Markers

Plotting Endothelial Markers

#### Plotting Oligodendrocyte Markers

Plotting Glutamatergic Neuron Markers

#### Plotting Gabaergic Neuron Markers #### Plotting Oligodendrocyte Precursor Markers

Rename cluster based on the SingleR results and manual annotation

(This section is skipped because I need the knowledge of DRG neuron types and markers, which takes longer time-but will pick it up later)

Option 3: manual annotation

Refer to Tasic et al, Nature 2018, [marker list] https://github.com/AllenInstitute/tasic2018analysis/blob/master/RNA-seq%20Analysis/markers.R

Differential expression Analysis for groups

Now we do a DEG analysis for all genes between the two groups

Differential expression Analysis for cluster between groups

Next we will do DEG for each cluster betweent the two groups, and write the results to a csv file [SQ NOTE, because no cell type annotation, we will skip this step for now]

session info

Print session information

## R version 4.2.1 (2022-06-23 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] celldex_1.6.0               SingleR_1.10.0             
##  [3] patchwork_1.1.1             circlize_0.4.15            
##  [5] RColorBrewer_1.1-3          ComplexHeatmap_2.12.0      
##  [7] SingleCellExperiment_1.18.0 SummarizedExperiment_1.26.1
##  [9] Biobase_2.56.0              GenomicRanges_1.48.0       
## [11] GenomeInfoDb_1.32.2         IRanges_2.30.0             
## [13] S4Vectors_0.34.0            BiocGenerics_0.42.0        
## [15] MatrixGenerics_1.8.1        matrixStats_0.62.0         
## [17] rstatix_0.7.0               ggplot2_3.3.6              
## [19] cowplot_1.1.1               dplyr_1.0.9                
## [21] sp_1.5-0                    SeuratObject_4.1.0         
## [23] Seurat_4.1.1               
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.2                    reticulate_1.25              
##   [3] tidyselect_1.1.2              RSQLite_2.2.14               
##   [5] AnnotationDbi_1.58.0          htmlwidgets_1.5.4            
##   [7] BiocParallel_1.30.3           Rtsne_0.16                   
##   [9] munsell_0.5.0                 ScaledMatrix_1.4.0           
##  [11] codetools_0.2-18              ica_1.0-3                    
##  [13] future_1.27.0                 miniUI_0.1.1.1               
##  [15] withr_2.5.0                   spatstat.random_2.2-0        
##  [17] colorspace_2.0-3              progressr_0.10.1             
##  [19] filelock_1.0.2                highr_0.9                    
##  [21] knitr_1.39                    rstudioapi_0.13              
##  [23] ROCR_1.0-11                   tensor_1.5                   
##  [25] listenv_0.8.0                 labeling_0.4.2               
##  [27] GenomeInfoDbData_1.2.8        polyclip_1.10-0              
##  [29] pheatmap_1.0.12               bit64_4.0.5                  
##  [31] farver_2.1.1                  parallelly_1.32.1            
##  [33] vctrs_0.4.1                   generics_0.1.3               
##  [35] xfun_0.31                     BiocFileCache_2.4.0          
##  [37] R6_2.5.1                      doParallel_1.0.17            
##  [39] clue_0.3-61                   rsvd_1.0.5                   
##  [41] bitops_1.0-7                  spatstat.utils_2.3-1         
##  [43] cachem_1.0.6                  DelayedArray_0.22.0          
##  [45] assertthat_0.2.1              promises_1.2.0.1             
##  [47] scales_1.2.0                  rgeos_0.5-9                  
##  [49] gtable_0.3.0                  beachmat_2.12.0              
##  [51] globals_0.15.1                goftest_1.2-3                
##  [53] rlang_1.0.3                   GlobalOptions_0.1.2          
##  [55] splines_4.2.1                 lazyeval_0.2.2               
##  [57] spatstat.geom_2.4-0           broom_1.0.0                  
##  [59] BiocManager_1.30.18           yaml_2.3.5                   
##  [61] reshape2_1.4.4                abind_1.4-5                  
##  [63] backports_1.4.1               httpuv_1.6.5                 
##  [65] tools_4.2.1                   ellipsis_0.3.2               
##  [67] spatstat.core_2.4-4           jquerylib_0.1.4              
##  [69] ggridges_0.5.3                Rcpp_1.0.8.3                 
##  [71] plyr_1.8.7                    sparseMatrixStats_1.8.0      
##  [73] zlibbioc_1.42.0               purrr_0.3.4                  
##  [75] RCurl_1.98-1.7                rpart_4.1.16                 
##  [77] deldir_1.0-6                  pbapply_1.5-0                
##  [79] GetoptLong_1.0.5              zoo_1.8-10                   
##  [81] ggrepel_0.9.1                 cluster_2.1.3                
##  [83] magrittr_2.0.3                data.table_1.14.2            
##  [85] RSpectra_0.16-1               scattermore_0.8              
##  [87] lmtest_0.9-40                 RANN_2.6.1                   
##  [89] fitdistrplus_1.1-8            mime_0.12                    
##  [91] evaluate_0.16                 xtable_1.8-4                 
##  [93] gridExtra_2.3                 shape_1.4.6                  
##  [95] compiler_4.2.1                tibble_3.1.7                 
##  [97] KernSmooth_2.23-20            crayon_1.5.1                 
##  [99] htmltools_0.5.2               mgcv_1.8-40                  
## [101] later_1.3.0                   tidyr_1.2.0                  
## [103] DBI_1.1.3                     ExperimentHub_2.4.0          
## [105] dbplyr_2.2.1                  rappdirs_0.3.3               
## [107] MASS_7.3-57                   Matrix_1.4-1                 
## [109] car_3.1-0                     cli_3.3.0                    
## [111] parallel_4.2.1                igraph_1.3.2                 
## [113] pkgconfig_2.0.3               plotly_4.10.0                
## [115] spatstat.sparse_2.1-1         foreach_1.5.2                
## [117] bslib_0.4.0                   XVector_0.36.0               
## [119] stringr_1.4.0                 digest_0.6.29                
## [121] sctransform_0.3.3             RcppAnnoy_0.0.19             
## [123] Biostrings_2.64.0             spatstat.data_2.2-0          
## [125] rmarkdown_2.14                leiden_0.4.2                 
## [127] uwot_0.1.11                   DelayedMatrixStats_1.18.0    
## [129] curl_4.3.2                    shiny_1.7.2                  
## [131] rjson_0.2.21                  lifecycle_1.0.1              
## [133] nlme_3.1-157                  jsonlite_1.8.0               
## [135] carData_3.0-5                 BiocNeighbors_1.14.0         
## [137] viridisLite_0.4.0             limma_3.52.2                 
## [139] fansi_1.0.3                   pillar_1.8.0                 
## [141] lattice_0.20-45               KEGGREST_1.36.3              
## [143] fastmap_1.1.0                 httr_1.4.3                   
## [145] survival_3.3-1                interactiveDisplayBase_1.34.0
## [147] glue_1.6.2                    png_0.1-7                    
## [149] iterators_1.0.14              BiocVersion_3.15.2           
## [151] bit_4.0.4                     stringi_1.7.6                
## [153] sass_0.4.2                    blob_1.2.3                   
## [155] AnnotationHub_3.4.0           BiocSingular_1.12.0          
## [157] memoise_2.0.1                 irlba_2.3.5                  
## [159] future.apply_1.9.0

Two_DRG_sample_scRNAseq_analysis

Shenfeng Qiu

2023-02-08

Sample preparation information

Import libaries for analysis

Import two samples - control (CTL) and stress treated (TRT)

Quality Control

Joint filtering effects

Selecting Data Subset

Re-assess quality metrics

Apply `sctransform` normalization

Integration of two samples

Clustering

Clustering quality control

Find all markers for each cluster in the two samples for cell type identification

Plotting top markers for each cluster

Violin plot for selected gene markers

Define genes of interest and do heatmaps

Find markers for specific cluster in two samples for cell type identification

Identifying cell type

Option 1: SingleR package with built-in reference

Option 2: manual annotation

Plotting Astrocyte Markers

Plotting Microglia Markers

Plotting Endothelial Markers

Plotting Glutamatergic Neuron Markers

Rename cluster based on the SingleR results and manual annotation

Option 3: manual annotation

Differential expression Analysis for groups

Differential expression Analysis for cluster between groups

session info

Two_DRG_sample_scRNAseq_analysis

Shenfeng Qiu

2023-02-08

Sample preparation information

Import libaries for analysis

Import two samples - control (CTL) and stress treated (TRT)

Quality Control

Joint filtering effects

Selecting Data Subset

Re-assess quality metrics

Apply sctransform normalization

Integration of two samples

Clustering

Clustering quality control

Find all markers for each cluster in the two samples for cell type identification

Plotting top markers for each cluster

Violin plot for selected gene markers

Define genes of interest and do heatmaps

Find markers for specific cluster in two samples for cell type identification

Identifying cell type

Option 1: SingleR package with built-in reference

Option 2: manual annotation

Plotting Astrocyte Markers

Plotting Microglia Markers

Plotting Endothelial Markers

Plotting Glutamatergic Neuron Markers

Rename cluster based on the SingleR results and manual annotation

Option 3: manual annotation

Differential expression Analysis for groups

Differential expression Analysis for cluster between groups

session info

Apply `sctransform` normalization