The purpose of this script is to integrate scRNA-seq data from all samples arising from the same location (thymus, lymph node), which can help match shared cell types and states across datasets, boost statistical power, and facilitate accurate comparative analysis across datasets. The integration procedure aims to return a single dimensional reduction that captures the shared sources of variance across multiple layers, so that cells in a similar biological state will cluster. The method returns a dimensional reduction which can be used for visualization and unsupervised clustering analysis.
library(Seurat)
library(tidyverse)
library(patchwork)
library(pheatmap)
library(RColorBrewer)
library(data.table)
library(knitr)
library(DoubletFinder)
library(presto)
setwd("C:/Users/edlarsen/Documents/240828_scRNAseq")
Input: A filtered Seurat object with doublets removed.
filtered_ln <- readRDS(file = "C:/Users/edlarsen/Documents/240828_scRNAseq/Integration/seurat_merged_LN_filtered.RData")
filtered_thym <- readRDS(file = "C:/Users/edlarsen/Documents/240828_scRNAseq/Integration/seurat_merged_THYM_filtered.RData")
filtered_ln[["RNA"]] <- split(filtered_ln[["RNA"]], f = filtered_ln$orig.ident) # lymph node
filtered_thym[["RNA"]] <- split(filtered_thym[["RNA"]], f = filtered_thym$orig.ident) # thymus
These parameters were selected based on recommendations in the Seurat documentation for performing integration: https://satijalab.org/seurat/articles/integration_introduction
# run standard anlaysis workflow
## Lymph node
integ_ln <- NormalizeData(filtered_ln)
integ_ln <- FindVariableFeatures(integ_ln)
integ_ln <- ScaleData(integ_ln)
integ_ln <- RunPCA(integ_ln)
# visualize
integ_ln <- FindNeighbors(integ_ln, dims = 1:30, reduction = "pca")
integ_ln <- FindClusters(integ_ln, resolution = 2)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 29438
## Number of edges: 1037221
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8371
## Number of communities: 47
## Elapsed time: 3 seconds
integ_ln <- RunUMAP(integ_ln, dims = 1:30, reduction = "pca")
DimPlot(integ_ln, reduction = "umap", group.by = c("orig.ident", "seurat_clusters")) + # orig.ident is the sample ID
plot_annotation(title = "Canine Lymph Node - Unintegrated", theme = theme(plot.title = element_text(hjust=0.5, size = 20)))
# run standard analysis workflow
## Thymus
integ_thym <- NormalizeData(filtered_thym)
integ_thym <- FindVariableFeatures(integ_thym)
integ_thym <- ScaleData(integ_thym)
integ_thym <- RunPCA(integ_thym)
# visualize
integ_thym <- FindNeighbors(integ_thym, dims = 1:30, reduction = "pca")
integ_thym <- FindClusters(integ_thym, resolution = 2)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 22654
## Number of edges: 847974
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8373
## Number of communities: 34
## Elapsed time: 2 seconds
integ_thym <- RunUMAP(integ_thym, dims = 1:30, reduction = "pca")
DimPlot(integ_thym, reduction = "umap", group.by = c("orig.ident", "seurat_clusters")) + # orig.ident is the sample ID
plot_annotation(title = "Canine Thymus - Unintegrated", theme = theme(plot.title = element_text(hjust=0.5, size = 20)))
Canonical correlation analysis (CCA) identifies shared sources of variation between samples, but only identifies the greatest sources of variation in the data if it is shared or conserved across samples. CCA-based integration therefore enables integrative analysis when experimental conditions or disease states introduce very strong expression shifts, or when integrating datasets across modalities and species. However, CCA-based integration may also lead to overcorrection, especially when a large proportion of cells are non-overlapping across datasets.
RPCA-based integration runs faster, and represents a more conservative approach where cells in different biological states are less likely to ‘align’ after integration. RPCA is recommended when a substantial fraction of cells in one dataset have no matching type in the other, datasets originate from the same platform (i.e., multiple lanes of 10x genomics), or there are a large number of datasets or cells to integrate.
#### Returns error: Error in UseMethod(generic = "Assays", object = object) : no applicable method for 'Assays' applied to an object of class "NULL"
integ_ln <- IntegrateLayers(object = integ_ln,
method = CCAIntegration,
orig.reduction = "pca",
new.reduction = "integrated.cca",
verbose = FALSE)
integ_ln[["RNA"]] <- JoinLayers(integ_ln[["RNA"]]) # rejoin layers after integration
integ_thym <- IntegrateLayers(object = integ_thym,
method = CCAIntegration,
orig.reduction = "pca",
new.reduction = "integrated.cca",
verbose = FALSE)
integ_thym[["RNA"]] <- JoinLayers(integ_thym[["RNA"]]) # rejoin layers after integration
## Lymph node
integ_ln <- FindNeighbors(integ_ln, reduction = "integrated.cca", dims = 1:30)
integ_ln <- FindClusters(integ_ln, resolution = 1)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 29438
## Number of edges: 1284334
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8905
## Number of communities: 32
## Elapsed time: 4 seconds
integ_ln <- RunUMAP(integ_ln, dims = 1:30, reduction = "integrated.cca")
DimPlot(integ_ln, reduction = "umap", group.by = c("orig.ident", "seurat_clusters")) +
plot_annotation(title = "Canine Lymph Node - Integrated", theme = theme(plot.title = element_text(hjust=0.5, size = 20)))
## Thymus
integ_thym <- FindNeighbors(integ_thym, reduction = "integrated.cca", dims = 1:30)
integ_thym <- FindClusters(integ_thym, resolution = 1)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 22654
## Number of edges: 1004358
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8660
## Number of communities: 22
## Elapsed time: 2 seconds
integ_thym <- RunUMAP(integ_thym, dims = 1:30, reduction = "integrated.cca")
DimPlot(integ_thym, reduction = "umap", group.by = c("orig.ident", "seurat_clusters")) +
plot_annotation(title = "Canine Thymus - Integrated", theme = theme(plot.title = element_text(hjust=0.5, size = 20)))
saveRDS(integ_ln, file="seurat_integrated_LN_filtered.RData")
saveRDS(integ_thym, file="seurat_integrated_THYM_filtered.RData")
Next step: Data normalization and clustering.
sessionInfo()
## R version 4.4.0 (2024-04-24 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/Denver
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] presto_1.0.0 Rcpp_1.0.13 DoubletFinder_2.0.4
## [4] knitr_1.49 data.table_1.16.4 RColorBrewer_1.1-3
## [7] pheatmap_1.0.12 patchwork_1.3.0 lubridate_1.9.4
## [10] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
## [13] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
## [16] tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0
## [19] Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4
##
## loaded via a namespace (and not attached):
## [1] rstudioapi_0.17.1 jsonlite_1.8.9 magrittr_2.0.3
## [4] spatstat.utils_3.1-1 farver_2.1.2 rmarkdown_2.29
## [7] vctrs_0.6.5 ROCR_1.0-11 spatstat.explore_3.3-3
## [10] htmltools_0.5.8.1 sass_0.4.9 sctransform_0.4.1
## [13] parallelly_1.40.1 KernSmooth_2.23-22 bslib_0.8.0
## [16] htmlwidgets_1.6.4 ica_1.0-3 plyr_1.8.9
## [19] plotly_4.10.4 zoo_1.8-12 cachem_1.1.0
## [22] igraph_2.1.2 mime_0.12 lifecycle_1.0.4
## [25] pkgconfig_2.0.3 Matrix_1.7-0 R6_2.5.1
## [28] fastmap_1.2.0 fitdistrplus_1.2-2 future_1.34.0
## [31] shiny_1.10.0 digest_0.6.35 colorspace_2.1-1
## [34] tensor_1.5 RSpectra_0.16-2 irlba_2.3.5.1
## [37] labeling_0.4.3 progressr_0.15.1 spatstat.sparse_3.1-0
## [40] timechange_0.3.0 httr_1.4.7 polyclip_1.10-7
## [43] abind_1.4-8 compiler_4.4.0 withr_3.0.2
## [46] fastDummies_1.7.4 MASS_7.3-60.2 tools_4.4.0
## [49] lmtest_0.9-40 httpuv_1.6.15 future.apply_1.11.3
## [52] goftest_1.2-3 glue_1.7.0 nlme_3.1-164
## [55] promises_1.3.2 grid_4.4.0 Rtsne_0.17
## [58] cluster_2.1.6 reshape2_1.4.4 generics_0.1.3
## [61] gtable_0.3.6 spatstat.data_3.1-4 tzdb_0.4.0
## [64] hms_1.1.3 spatstat.geom_3.3-4 RcppAnnoy_0.0.22
## [67] ggrepel_0.9.6 RANN_2.6.2 pillar_1.10.1
## [70] spam_2.11-0 RcppHNSW_0.6.0 later_1.3.2
## [73] splines_4.4.0 lattice_0.22-6 survival_3.5-8
## [76] deldir_2.0-4 tidyselect_1.2.1 miniUI_0.1.1.1
## [79] pbapply_1.7-2 gridExtra_2.3 scattermore_1.2
## [82] xfun_0.49 matrixStats_1.4.1 stringi_1.8.4
## [85] lazyeval_0.2.2 yaml_2.3.10 evaluate_1.0.3
## [88] codetools_0.2-20 cli_3.6.2 uwot_0.2.2
## [91] xtable_1.8-4 reticulate_1.40.0 munsell_0.5.1
## [94] jquerylib_0.1.4 globals_0.16.3 spatstat.random_3.3-2
## [97] png_0.1-8 spatstat.univar_3.1-1 parallel_4.4.0
## [100] dotCall64_1.2 listenv_0.9.1 viridisLite_0.4.2
## [103] scales_1.3.0 ggridges_0.5.6 leiden_0.4.3.1
## [106] rlang_1.1.3 cowplot_1.1.3
citation()
## To cite R in publications use:
##
## R Core Team (2024). _R: A Language and Environment for Statistical
## Computing_. R Foundation for Statistical Computing, Vienna, Austria.
## <https://www.R-project.org/>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {R: A Language and Environment for Statistical Computing},
## author = {{R Core Team}},
## organization = {R Foundation for Statistical Computing},
## address = {Vienna, Austria},
## year = {2024},
## url = {https://www.R-project.org/},
## }
##
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.