Grubman et al. 2019 QC Workbook/Supplemental Methods
Author
Nicole Flack, DVM, PhD
Modified
April 26, 2024
Background
Original Publication of Dataset
Grubman et al. (2019)A single-cell atlas of entorhinal cortex from individuals with Alzheimer’s disease reveals cell-type-specific gene expression regulation,Nature Neuroscience
Exploratory case-control comparison to detect cell-type-specific gene expression patterns in human entorhinal cortex, a brain region that shows tau and amyloid pathology in the early stages of Alzheimer’s disease (AD). Associating transcriptional changes in specific cell subpopulations with disease.
Context of Current Project
Expression of neuropeptides (NPs) has been shown to be perturbed in AD (Li and Larsen 2023).
Specifically:
Cells (mostly GABAergic neurons) expressing a large number and variety of NPs (HNP cells) are lost in AD
Expression of certain AD-associated NPs (ADNPs) is reduced in disease
HNP cells coexpress ADNPs and genes associated with tau misfolding; have molecular signatures of protein misfolding
Working hypothesis: higher energy demand in HNP cells leads to greater metabolic stress during aging; coexpression of ADNPs contributes to protein misfolding cascade and selective vulnerability in early AD
Gaps in Knowledge
Functional characteristics of HNP neurons
Relationship between NP coexpression and other pathological features of AD
Spatiotemporal NP dynamics in course of disease severity
Preprint: Li, Flack, and Larsen (2023), Multifaceted impact of specialized neuropeptide-intensive neurons on the selective vulnerability in Alzheimer’s disease (bioRxiv)
Goals of Analysis
Interrogate function profiles of HNP neurons, NP expression dynamics in AD, ADNP trajectories in disease course, and the neuroanatomical distribution of HNP neurons.
Library and Raw Data Information
Samples
Biobanked human entorhinal cortex (EC) from age-matched individuals diagnosed with Alzheimer’s disease (AD; n=3 samples, 2 individuals pooled per sample) or control (ct; n = 3 samples, also pooled) based on tau & amyloid pathology.
Library prep: Chromium Single Cell 3′ Library & Gel Bead Kit v2 (10x Genomics, PN-120237)
Sequencing
Illumina NextSeq 500
Raw Data
Raw data were processed with CellRanger. The resulting expression matrix (cellranger count) contained 33,694 features and 14,876 cells.
Genes with at least 2 transcripts in at least 10 cells were retained. Counts for one hundred postmortem interval-associated genes identified in cerebral cortex by Zhu et al. (2017) were removed. The resulting matrix contained 24,165 features across 14,853 cells and was used as input for this analysis.
Load Counts
Code
# load into merged Seurat object and add metadataseurat_big <-Read10X_Multi_Directory("data_grubman",default_10X_path =FALSE,secondary_path ="",merge = T,) %>%# read in filters# only features found in at least 3 cells# only nuclei with at least 200 featuresCreateSeuratObject(.,min.cells =3,min.features =200) %>%# Disease GroupAddMetaData(., str_extract(.@meta.data$orig.ident, "^(AD|ct)"),col.name ="type") %>%# Hemoglobin gene contentAddMetaData(., PercentageFeatureSet(., "^HB[^(P)]"),col.name ="percent_hb") %>%# MT, ribo genes; Add_Cell_QC_Metrics(seurat_object = ., species ="human")Project(seurat_big) <-"AD_TCX"seurat_big@meta.data$type <-factor(seurat_big@meta.data$type,levels =c("ct", "AD"))seurat_big
An object of class Seurat
24165 features across 14853 samples within 1 assay
Active assay: RNA (24165 features, 0 variable features)
2 layers present: counts, data
Initial QC
Preliminary Plots
Nuclei Per Sample/Group
Number of nuclei per sample varied between ~1,200 and 3,500. Nuclei per group (Alzheimer’s and control) were similar despite larger variation in nuclei per sample.
Code
# per sampleseurat_big@meta.data |> dplyr::rename(Group = type) |>ggplot(aes(x = orig.ident, fill = Group)) +geom_bar(position ="stack") +ggtitle("Nuclei Per Sample") +labs(x ="Sample", y ="Count") +theme_bw() + ggpubr::labs_pubr() +scale_fill_paletteer_d("ggsci::default_nejm")
Code
# Stacked samples to show per groupseurat_big@meta.data |> dplyr::rename(Group = type,Sample = orig.ident) |>ggplot(aes(x = Group, fill = Sample)) +geom_bar(position ="stack",) +ggtitle("Nuclei Per Group") +labs(x ="Group", y ="Count") +theme_bw() +labs_pubr() +scale_fill_paletteer_d("ggsci::default_nejm")
Features Per Nucleus
Minimum feature count of 200 set at data read in
Code
# histogramsseurat_big@meta.data |>ggplot(aes(x = nFeature_RNA)) +geom_histogram(binwidth =100) +facet_wrap(~ orig.ident) +ggtitle("NGenes per Nucleus") +labs(x ="NGenes", y ="Count") +theme_bw() +labs_pubr()
High percentages of mitochondrial genes is less expected for nuclei; indicator of incomplete removal of cytoplasmic material. Kit was Nuclei EZ Prep (Sigma, NUC101).
Code
VlnPlot(seurat_big, features =c("percent_mito", "percent_ribo"))
Code
VlnPlot(seurat_big, features =c("percent_ieg", "percent_hb"))
Novelty/Complexity
Code
QC_Plots_Complexity(seurat_big)
Scatter Plots
Nuclei with high percentage of mitochondrial transcripts are not the same nuclei on the high end of genes/UMIs.
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 14268
Number of edges: 449460
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9520
Number of communities: 9
Elapsed time: 1 seconds
Code
seurat_filter
An object of class Seurat
24165 features across 14268 samples within 1 assay
Active assay: RNA (24165 features, 2000 variable features)
3 layers present: counts, data, scale.data
3 dimensional reductions calculated: pca, umap, tsne
Recheck Plots
Code
# combo QCQC_Plots_Combined_Vln(seurat_filter)
Code
# nuclei per groupseurat_filter@meta.data |> dplyr::rename(Group = type,Sample = orig.ident) |>ggplot(aes(x = Group, fill = Sample)) +geom_bar(position ="stack",) +ggtitle("Nuclei Per Group") +labs(x ="Group", y ="Count") +theme_bw() +labs_pubr() +scale_fill_paletteer_d("ggsci::default_nejm")
Code
# UMI vs. geneQC_Plot_UMIvsGene(seurat_filter,group.by ="orig.ident",shuffle_seed = T)
Grubman, Alexandra, Gabriel Chew, John F. Ouyang, Guizhi Sun, Xin Yi Choo, Catriona McLean, Rebecca K. Simmons, et al. 2019. “A Single-Cell Atlas of Entorhinal Cortex from Individuals with Alzheimer’s Disease Reveals Cell-Type-Specific Gene Expression Regulation.”Nature Neuroscience 22 (12): 2087–97. https://doi.org/10.1038/s41593-019-0539-4.
Habib, Naomi, Inbal Avraham-Davidi, Anindita Basu, Tyler Burks, Karthik Shekhar, Matan Hofree, Sourav R Choudhury, et al. 2017. “Massively Parallel Single-Nucleus RNA-Seq with DroNc-Seq.”Nature Methods 14 (10): 955–58. https://doi.org/10.1038/nmeth.4407.
Li, Manci, Nicole Flack, and Peter A Larsen. 2023. “Multifaceted Impact of Specialized Neuropeptide-Intensive Neurons on the Selective Vulnerability in Alzheimer’s Disease.”http://dx.doi.org/10.1101/2023.11.13.566905.
Li, Manci, and Peter A. Larsen. 2023. “Single-Cell Sequencing of Entorhinal Cortex Reveals Widespread Disruption of Neuropeptide Networks in Alzheimer’s Disease.”Alzheimer’s & Dementia 19 (8): 3575–92. https://doi.org/10.1002/alz.12979.
Zhu, Yizhang, Likun Wang, Yuxin Yin, and Ence Yang. 2017. “Systematic Analysis of Gene Expression Patterns Associated with Postmortem Interval in Human Tissues.”Scientific Reports 7 (1). https://doi.org/10.1038/s41598-017-05882-0.
Source Code
---title: "Multifaceted impact of specialized neuropeptide-intensive neurons on selective vulnerability in Alzheimer’s disease"subtitle: "Grubman et al. 2019 QC Workbook/Supplemental Methods"author: "Nicole Flack, DVM, PhD"date-modified: todayengine: knitrtheme: darklyeditor_options: chunk_output_type: inlineexecute: eval: true cache: trueformat: html: toc: true self-contained: true toc-depth: 5 toc-title: Contents smooth-scroll: true code-overflow: wrap code-fold: true df-print: paged code-tools: truebibliography: references.bib---```{r setup, include=FALSE}library(stringr)library(ggpubr)library(paletteer)```## Background### Original Publication of Dataset@grubman2019 **A single-cell atlas of entorhinal cortex from individuals with Alzheimer’s disease reveals cell-type-specific gene expression regulation,** *Nature Neuroscience*Exploratory case-control comparison to detect cell-type-specific gene expression patterns in human entorhinal cortex, a brain region that shows tau and amyloid pathology in the early stages of Alzheimer's disease (AD). Associating transcriptional changes in specific cell subpopulations with disease.### Context of Current Project Expression of neuropeptides (NPs) has been shown to be perturbed in AD [@li2023a].Specifically:- Cells (mostly GABAergic neurons) expressing a large number and variety of NPs (HNP cells) are lost in AD- Expression of certain AD-associated NPs (ADNPs) is reduced in disease- HNP cells coexpress ADNPs and genes associated with tau misfolding; have molecular signatures of protein misfoldingWorking hypothesis: higher energy demand in HNP cells leads to greater metabolic stress during aging; coexpression of ADNPs contributes to protein misfolding cascade and selective vulnerability in early AD### Gaps in Knowledge- Functional characteristics of HNP neurons- Relationship between NP coexpression and other pathological features of AD- Spatiotemporal NP dynamics in course of disease severityPreprint: @li2023, **Multifaceted impact of specialized neuropeptide-intensive neurons on the selective vulnerability in Alzheimer's disease** ([bioRxiv](https://www.biorxiv.org/content/10.1101/2023.11.13.566905v2))## Goals of AnalysisInterrogate function profiles of HNP neurons, NP expression dynamics in AD, ADNP trajectories in disease course, and the neuroanatomical distribution of HNP neurons.## Library and Raw Data Information### SamplesBiobanked human entorhinal cortex (EC) from age-matched individuals diagnosed with Alzheimer's disease (AD; n=3 samples, 2 individuals pooled per sample) or control (ct; n = 3 samples, also pooled) based on tau & amyloid pathology.### LibrariesDroNc-Seq (Drop-seq + snRNA-seq), 10x Chromium platformMethod reference: @habib2017Library prep: Chromium Single Cell 3′ Library & Gel Bead Kit v2 (10x Genomics, PN-120237)### SequencingIllumina NextSeq 500### Raw DataRaw data were processed with CellRanger. The resulting expression matrix (`cellranger count`) contained 33,694 features and 14,876 cells.Genes with at least 2 transcripts in at least 10 cells were retained. Counts for one hundred postmortem interval-associated genes identified in cerebral cortex by @zhu2017 were removed. The resulting matrix contained 24,165 features across 14,853 cells and was used as input for this analysis.## Load Counts```{r}#| message: false#| warning: false# load into merged Seurat object and add metadataseurat_big <-Read10X_Multi_Directory("data_grubman",default_10X_path =FALSE,secondary_path ="",merge = T,) %>%# read in filters# only features found in at least 3 cells# only nuclei with at least 200 featuresCreateSeuratObject(.,min.cells =3,min.features =200) %>%# Disease GroupAddMetaData(., str_extract(.@meta.data$orig.ident, "^(AD|ct)"),col.name ="type") %>%# Hemoglobin gene contentAddMetaData(., PercentageFeatureSet(., "^HB[^(P)]"),col.name ="percent_hb") %>%# MT, ribo genes; Add_Cell_QC_Metrics(seurat_object = ., species ="human")Project(seurat_big) <-"AD_TCX"seurat_big@meta.data$type <-factor(seurat_big@meta.data$type,levels =c("ct", "AD"))seurat_big```## Initial QC### Preliminary Plots#### Nuclei Per Sample/GroupNumber of nuclei per sample varied between \~1,200 and 3,500. Nuclei per group (Alzheimer's and control) were similar despite larger variation in nuclei per sample.```{r}# per sampleseurat_big@meta.data |> dplyr::rename(Group = type) |>ggplot(aes(x = orig.ident, fill = Group)) +geom_bar(position ="stack") +ggtitle("Nuclei Per Sample") +labs(x ="Sample", y ="Count") +theme_bw() + ggpubr::labs_pubr() +scale_fill_paletteer_d("ggsci::default_nejm")# Stacked samples to show per groupseurat_big@meta.data |> dplyr::rename(Group = type,Sample = orig.ident) |>ggplot(aes(x = Group, fill = Sample)) +geom_bar(position ="stack",) +ggtitle("Nuclei Per Group") +labs(x ="Group", y ="Count") +theme_bw() +labs_pubr() +scale_fill_paletteer_d("ggsci::default_nejm")```#### Features Per NucleusMinimum feature count of 200 set at data read in```{r}# histogramsseurat_big@meta.data |>ggplot(aes(x = nFeature_RNA)) +geom_histogram(binwidth =100) +facet_wrap(~ orig.ident) +ggtitle("NGenes per Nucleus") +labs(x ="NGenes", y ="Count") +theme_bw() +labs_pubr()# violin plotsQC_Plots_Genes(seurat_big, group.by ="orig.ident")```#### UMIs Per Nucleus```{r}QC_Plots_UMIs(seurat_big, group.by ="orig.ident")quantile(seurat_big@meta.data$nCount_RNA,probs =seq(0,1,0.1))```#### Percent Mitochondrial, Ribosomal, Immediate-Early, HemoglobinHigh percentages of mitochondrial genes is less expected for nuclei; indicator of incomplete removal of cytoplasmic material. Kit was Nuclei EZ Prep (Sigma, NUC101).```{r}VlnPlot(seurat_big, features =c("percent_mito", "percent_ribo"))VlnPlot(seurat_big, features =c("percent_ieg", "percent_hb"))```#### Novelty/Complexity```{r}QC_Plots_Complexity(seurat_big)```#### Scatter PlotsNuclei with high percentage of mitochondrial transcripts are not the same nuclei on the high end of genes/UMIs.```{r}QC_Plot_GenevsFeature(seurat_big, feature1 ="percent_mito")QC_Plot_UMIvsFeature(seurat_big, feature1 ="percent_mito")QC_Plot_UMIvsGene(seurat_big)```### Filtering and NormalizationBased on plots, filter counts, then normalize, find variable features, scale, find neighbors, and cluster.- 200 < Features < 2500- UMIs < 20000- Percent mitochondrial < 5- Percent ribosomal < 10- Percent immediate early genes < 5```{r}#| message: false#| warning: false#| cache-lazy: falseall_genes <-rownames(seurat_big)seurat_filter <- seurat_big %>%subset(., subset = nFeature_RNA <2500& nFeature_RNA >200&# redundant in case read in option changed nCount_RNA <20000& percent_mito <5& percent_ribo <10& percent_ieg <5 ) %>%NormalizeData(.) %>%FindVariableFeatures(., selection.method ="vst",nfeatures =2000# x.low.cutoff = 0.0125, # x.high.cutoff = 3,# y.cutof = 0.5 ) %>%ScaleData(., features = all_genes) %>%RunPCA(., features =VariableFeatures(object = .))ElbowPlot(seurat_filter)seurat_filter <- seurat_filter %>%FindNeighbors(., dims =1:9) %>%FindClusters(., resolution =0.2) %>%RunUMAP(., dims =1:9, seed.use =100) %>%RunTSNE(., dims =1:9, seed.use =100)seurat_filter```#### Recheck Plots```{r}# combo QCQC_Plots_Combined_Vln(seurat_filter)# nuclei per groupseurat_filter@meta.data |> dplyr::rename(Group = type,Sample = orig.ident) |>ggplot(aes(x = Group, fill = Sample)) +geom_bar(position ="stack",) +ggtitle("Nuclei Per Group") +labs(x ="Group", y ="Count") +theme_bw() +labs_pubr() +scale_fill_paletteer_d("ggsci::default_nejm")# UMI vs. geneQC_Plot_UMIvsGene(seurat_filter,group.by ="orig.ident",shuffle_seed = T)```## Clustering Visualization### UMAP```{r}# default appearanceUMAPPlot(seurat_filter, label =TRUE)# split by sampleUMAPPlot(seurat_filter,label =TRUE,split.by ="orig.ident",ncol =3)# split by groupUMAPPlot(seurat_filter,label =TRUE,split.by ="type")```### tSNE```{r}# default appearanceTSNEPlot(seurat_filter, label =TRUE)# split by sampleTSNEPlot(seurat_filter,label =TRUE,split.by ="orig.ident",ncol =3)# split by groupTSNEPlot(seurat_filter,label =TRUE,split.by ="type")```## Marker Genes```{r}#| message: false#| warning: falsemarkers <-FindAllMarkers(seurat_filter)markers_top <- markers |>group_by(cluster) |>slice_head(n =10) markers_neuronal <-inner_join(markers_df_brain, markers_top, by =c("markers"="gene"),relationship ="many-to-many") |>group_by(cluster, cell) |>mutate(n =n()) |>ungroup() |>group_by(cell) |>arrange(desc(n)) |>slice_head(n=1)FeaturePlot(seurat_filter, features = markers_neuronal$markers,reduction ="umap")```