This experiment is in collaboration with Dr. Frank Porreca’s lab,
purpose is to see how chronic stress treatment alters sensitivity to
pain, and how it affects transcriptomic expression at single cell level
of dorsal root ganglia cells.
** Key Experimental Conditions:
1. Harrison did the chronic stress in 5 control (CTL) and 5 treated
(TRT) male mice
2. Dissected most DRGs from 5 mice and pull together for enzymatic
digestion, which uses their cell culture digestion protocol
3. Final cell density before loading to iX controller: CTL,
6x105 cells per milliliter; TRL, 6x105 cells per
milliliter
4. Dispersed cells go through 10X Genomics Chromium pipeline, with Next
GEM 3’ chemistry kit. cDNA library prepared with dual index TT kit.
Sequence vendor Novogene, run cycle 151+8+8+151
paired reads
Here is mice information for the experiment:
Here is a
typical image of dissociated cells, with large and small cells
characteristic of DRG:
The pipeline uses Seurat V4, also can be achieved with Scanpy
# setwd("D:/Bioinformatics Projects/DRG")
library(Seurat)
library(dplyr)
library(cowplot)
library(ggplot2)
library(rstatix)
library(SingleCellExperiment)
library(ComplexHeatmap)
library(RColorBrewer)
library(circlize)
library(patchwork)
To minimize change of coding, I keep the existing variable names from previous scripts as much as possile. Copy the ./slim_counts/ files to the m10 (CTL) and m11 (TRT) folder under ‘./slim_counts’.
Quality assessment before filtering
Quality assessment after filtering
Filter data based on these criteria: nFeature_RNA < 7500& nFeature_RNA>750 & nCount_RNA > 1000 and then, added the filtering criteria of “percent.mt < 15” After subseting, only keeping those genes expressed in more than 20 cells.
After subsetting and filtering, the dimension of pooled data (both CTL and TRT) is:19407, 41972
Quality metrics after filtered counts
sctransform normalizationThis steps can replace NormalizeData(),
ScaleData(), and FindVariableFeatures(). The
results of sctransform are stored in the “SCT” assay. It is
assumed to reveal sharper biological distinctions compared to the
standard Seurat workflow.
Since it is a normalization step, we have to do
separately for two different samples (that’s why split Seurat objects
here), then only integrate the sample expression data in the next step,
integration analysis, to remove batch effect.
Select the most variable features to use for integration… we
will use 3000 features and IntegratData
Run the standard workflow for linear and non-linear reduction, visualization and clustering
We use a resolution value of 0.1 for
cluster identification, which seems appropriate for this sample size
~13K cells
This step gives us some idea about how is the distribution of the number of genes, number of UMIs, and percentage of mitochondrial genes in each cluster. Normally, we expect to see similar distribution of no. of genes (nFeature_RNA) and no. of UMIs (nCount_RNA).
As for the percent.mt (percentage of mitochondrial genes per cell), it can be a reference to check if those high intensity clusters might be having poor quality cells (if so, we can try to remove in the next step or adjust the metrics in the previous filtering step) or it might be due to the differences biologically.
Determine metrics to plot present in seurat_integrated@meta.data
NOTE from SQ: for certain feature if not found in ‘integrated’ assay (may only have 2000 highly variable features), you need to switch to “RNA” or “SCT” assay using DefaultAssay(seurat_integrated) <- “RNA”
Write all gene markers to a .csv file on local drive
Setting up plotting parameters and plot the
top marker for each cluster
For example, a certain gene marker for the DRG cells > here i am
plotting several: * using split violin gene for gene Bsg *
regular split violin for desired clusters, e.g. c(0, 1, 5) *
feature plot for the top one gene from
each cluster, for the first 9 clusters * a blend feature plot of two
selected top genes * regular Seurat heatmap of top 5 marker genes for
each cluster * regular Seurat heatmap of top 5 marker genes for each
cluster only use 100 cells for each cluster
Because this is neuro dataset, I usually will first do the heatmap of
25 marker genes used by the Allen Brain SMARTseq study. I also varied
color schemes and use different subset of cells. * I selected four genes
to plot, Piezo1, Piezo2, Prlr, Calca. Generally expression
level for these genes are low
There are multiple violin plots, just to vary plotting parameters to
achieve aesthetics
I use a collection of mouse bulk RNA-seq data sets obtained from celldex package (Benayoun et al. 2019). A variety of cell types are available, mostly from blood but also covering several other tissues. This identifies marker genes from the reference and uses them to compute assignment scores (based on the Spearman correlation across markers) for each cell in the test dataset against each label in the reference. The label with the highest score is the assigned to the test cell, possibly with further fine-tuning to resolve closely related labels.
This reference consists of a collection of mouse bulk RNA-seq data sets downloaded from the gene expression omnibus Benayoun et al. 2019. A variety of cell types are available, again mostly from blood but also covering several other tissues.
This autonomic annotation is just for crude reference only. Will not tell you DRG cell types.
SingleR classified cell type and numbers:
#### Plotting Oligodendrocyte Markers
#### Plotting Gabaergic Neuron Markers
#### Plotting Oligodendrocyte Precursor Markers
(This section is skipped because I need the knowledge of DRG neuron types and markers, which takes longer time-but will pick it up later)
Refer to Tasic et al, Nature 2018, [marker list] https://github.com/AllenInstitute/tasic2018analysis/blob/master/RNA-seq%20Analysis/markers.R
Now we do a DEG analysis for all genes between the two groups
Next we will do DEG for each cluster betweent the two groups, and write the results to a csv file [SQ NOTE, because no cell type annotation, we will skip this step for now]
Print session information
## R version 4.2.1 (2022-06-23 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## attached base packages:
## [1] grid stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] celldex_1.6.0 SingleR_1.10.0
## [3] patchwork_1.1.1 circlize_0.4.15
## [5] RColorBrewer_1.1-3 ComplexHeatmap_2.12.0
## [7] SingleCellExperiment_1.18.0 SummarizedExperiment_1.26.1
## [9] Biobase_2.56.0 GenomicRanges_1.48.0
## [11] GenomeInfoDb_1.32.2 IRanges_2.30.0
## [13] S4Vectors_0.34.0 BiocGenerics_0.42.0
## [15] MatrixGenerics_1.8.1 matrixStats_0.62.0
## [17] rstatix_0.7.0 ggplot2_3.3.6
## [19] cowplot_1.1.1 dplyr_1.0.9
## [21] sp_1.5-0 SeuratObject_4.1.0
## [23] Seurat_4.1.1
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.2 reticulate_1.25
## [3] tidyselect_1.1.2 RSQLite_2.2.14
## [5] AnnotationDbi_1.58.0 htmlwidgets_1.5.4
## [7] BiocParallel_1.30.3 Rtsne_0.16
## [9] munsell_0.5.0 ScaledMatrix_1.4.0
## [11] codetools_0.2-18 ica_1.0-3
## [13] future_1.27.0 miniUI_0.1.1.1
## [15] withr_2.5.0 spatstat.random_2.2-0
## [17] colorspace_2.0-3 progressr_0.10.1
## [19] filelock_1.0.2 highr_0.9
## [21] knitr_1.39 rstudioapi_0.13
## [23] ROCR_1.0-11 tensor_1.5
## [25] listenv_0.8.0 labeling_0.4.2
## [27] GenomeInfoDbData_1.2.8 polyclip_1.10-0
## [29] pheatmap_1.0.12 bit64_4.0.5
## [31] farver_2.1.1 parallelly_1.32.1
## [33] vctrs_0.4.1 generics_0.1.3
## [35] xfun_0.31 BiocFileCache_2.4.0
## [37] R6_2.5.1 doParallel_1.0.17
## [39] clue_0.3-61 rsvd_1.0.5
## [41] bitops_1.0-7 spatstat.utils_2.3-1
## [43] cachem_1.0.6 DelayedArray_0.22.0
## [45] assertthat_0.2.1 promises_1.2.0.1
## [47] scales_1.2.0 rgeos_0.5-9
## [49] gtable_0.3.0 beachmat_2.12.0
## [51] globals_0.15.1 goftest_1.2-3
## [53] rlang_1.0.3 GlobalOptions_0.1.2
## [55] splines_4.2.1 lazyeval_0.2.2
## [57] spatstat.geom_2.4-0 broom_1.0.0
## [59] BiocManager_1.30.18 yaml_2.3.5
## [61] reshape2_1.4.4 abind_1.4-5
## [63] backports_1.4.1 httpuv_1.6.5
## [65] tools_4.2.1 ellipsis_0.3.2
## [67] spatstat.core_2.4-4 jquerylib_0.1.4
## [69] ggridges_0.5.3 Rcpp_1.0.8.3
## [71] plyr_1.8.7 sparseMatrixStats_1.8.0
## [73] zlibbioc_1.42.0 purrr_0.3.4
## [75] RCurl_1.98-1.7 rpart_4.1.16
## [77] deldir_1.0-6 pbapply_1.5-0
## [79] GetoptLong_1.0.5 zoo_1.8-10
## [81] ggrepel_0.9.1 cluster_2.1.3
## [83] magrittr_2.0.3 data.table_1.14.2
## [85] RSpectra_0.16-1 scattermore_0.8
## [87] lmtest_0.9-40 RANN_2.6.1
## [89] fitdistrplus_1.1-8 mime_0.12
## [91] evaluate_0.16 xtable_1.8-4
## [93] gridExtra_2.3 shape_1.4.6
## [95] compiler_4.2.1 tibble_3.1.7
## [97] KernSmooth_2.23-20 crayon_1.5.1
## [99] htmltools_0.5.2 mgcv_1.8-40
## [101] later_1.3.0 tidyr_1.2.0
## [103] DBI_1.1.3 ExperimentHub_2.4.0
## [105] dbplyr_2.2.1 rappdirs_0.3.3
## [107] MASS_7.3-57 Matrix_1.4-1
## [109] car_3.1-0 cli_3.3.0
## [111] parallel_4.2.1 igraph_1.3.2
## [113] pkgconfig_2.0.3 plotly_4.10.0
## [115] spatstat.sparse_2.1-1 foreach_1.5.2
## [117] bslib_0.4.0 XVector_0.36.0
## [119] stringr_1.4.0 digest_0.6.29
## [121] sctransform_0.3.3 RcppAnnoy_0.0.19
## [123] Biostrings_2.64.0 spatstat.data_2.2-0
## [125] rmarkdown_2.14 leiden_0.4.2
## [127] uwot_0.1.11 DelayedMatrixStats_1.18.0
## [129] curl_4.3.2 shiny_1.7.2
## [131] rjson_0.2.21 lifecycle_1.0.1
## [133] nlme_3.1-157 jsonlite_1.8.0
## [135] carData_3.0-5 BiocNeighbors_1.14.0
## [137] viridisLite_0.4.0 limma_3.52.2
## [139] fansi_1.0.3 pillar_1.8.0
## [141] lattice_0.20-45 KEGGREST_1.36.3
## [143] fastmap_1.1.0 httr_1.4.3
## [145] survival_3.3-1 interactiveDisplayBase_1.34.0
## [147] glue_1.6.2 png_0.1-7
## [149] iterators_1.0.14 BiocVersion_3.15.2
## [151] bit_4.0.4 stringi_1.7.6
## [153] sass_0.4.2 blob_1.2.3
## [155] AnnotationHub_3.4.0 BiocSingular_1.12.0
## [157] memoise_2.0.1 irlba_2.3.5
## [159] future.apply_1.9.0