Herein, we will utilise the Signac R package’s Harmony batch effect correction method (Korsunsky et al. 2019). The Harmony algorithm is accessible on GitHub, and Signac provides integration tutorials.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(Seurat)
## Attaching SeuratObject
library(harmony)
## Loading required package: Rcpp
library(Signac)
library(GenomicRanges)
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:dplyr':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:dplyr':
##
## first, rename
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: IRanges
##
## Attaching package: 'IRanges'
## The following objects are masked from 'package:dplyr':
##
## collapse, desc, slice
## Loading required package: GenomeInfoDb
pbmcv2 <- readRDS("/mnt/nectar_volume/home/eraz0001/new/pbmcv2.rds")
pbmcv1p1 <- readRDS("/mnt/nectar_volume/home/eraz0001/new/pbmcv1p1.rds")
head(pbmcv1p1)
## orig.ident nCount_peaks nFeature_peaks
## AAACGAAAGAACGCCA-1 SeuratProject 56436 20043
## AAACGAAAGACAGCTG-1 SeuratProject 7096 3436
## AAACGAAAGAGTAAGG-1 SeuratProject 10440 4848
## AAACGAAAGATGCGCA-1 SeuratProject 25283 10345
## AAACGAAAGGCAGATC-1 SeuratProject 10896 5057
## AAACGAAAGTCATACC-1 SeuratProject 6528 3192
## AAACGAACACGCGTTG-1 SeuratProject 17260 7419
## AAACGAACACGGCCAT-1 SeuratProject 11904 5426
## AAACGAACACTCAAGT-1 SeuratProject 9729 4644
## AAACGAACAGCGTCGT-1 SeuratProject 17572 7556
Add ‘DataSet’ info into the seurat obj.
pbmcv1p1$dataset <- 'pbmcv1p1'
pbmcv2$dataset <- 'pbmcv2'
Step 2: Merge objects Use the merge function to combine the objects without batch correction, and visualize them using the head and tail functions.
unintegrated <- merge(x = pbmcv1p1, y = pbmcv2, add.cell.ids = c("pbmcv1p1", "pbmcv2"))
##
## Binding matrix rows
##
## Binding matrix rows
##
## Binding matrix rows
##
## Binding matrix rows
Next, we visualize the data before batch correction.
unintegrated <- RunTFIDF(unintegrated)
## Performing TF-IDF normalization
unintegrated <- FindTopFeatures(unintegrated, min.cutoff = 20)
unintegrated <- RunSVD(unintegrated)
## Running SVD
## Scaling cell embeddings
unintegrated <- RunUMAP(unintegrated, dims = 2:50, reduction = 'lsi')
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 12:23:10 UMAP embedding parameters a = 0.9922 b = 1.112
## 12:23:10 Read 20882 rows and found 49 numeric columns
## 12:23:10 Using Annoy for neighbor search, n_neighbors = 30
## 12:23:10 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 12:23:13 Writing NN index file to temp file /tmp/RtmpUSJvea/file2da0fd9297f08
## 12:23:13 Searching Annoy index using 1 thread, search_k = 3000
## 12:23:23 Annoy recall = 100%
## 12:23:24 Commencing smooth kNN distance calibration using 1 thread
## 12:23:27 Initializing from normalized Laplacian + noise
## 12:23:28 Commencing optimization for 200 epochs, with 870980 positive edges
## 12:23:40 Optimization finished
DimPlot(unintegrated, group.by = 'dataset', pt.size = 0.1)
Given that these are both human PMBCs, one might expect better overlap between the two samples. We suspect that chemistry batch effects could cause the lack of overlap. Step 4: Batch correction using Harmony In this step, use Harmony to correct the batch effects between the two samples.
hm.integrated <- RunHarmony(object = unintegrated, group.by.vars = 'dataset', reduction = 'lsi', assay.use = 'peaks', project.dim = FALSE)
## Harmony 1/10
## Harmony 2/10
## Harmony 3/10
## Harmony 4/10
## Harmony 5/10
## Harmony 6/10
## Harmony 7/10
## Harmony 8/10
## Harmony 9/10
## Harmony 10/10
hm.integrated <- RunUMAP(hm.integrated, dims = 2:30, reduction = 'harmony')
## 12:24:57 UMAP embedding parameters a = 0.9922 b = 1.112
## 12:24:57 Read 20882 rows and found 29 numeric columns
## 12:24:57 Using Annoy for neighbor search, n_neighbors = 30
## 12:24:57 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 12:25:00 Writing NN index file to temp file /tmp/RtmpUSJvea/file2da0fd69dae4d2
## 12:25:00 Searching Annoy index using 1 thread, search_k = 3000
## 12:25:10 Annoy recall = 100%
## 12:25:11 Commencing smooth kNN distance calibration using 1 thread
## 12:25:13 Initializing from normalized Laplacian + noise
## 12:25:14 Commencing optimization for 200 epochs, with 886886 positive edges
## 12:25:26 Optimization finished
DimPlot(hm.integrated, group.by = 'dataset', pt.size = 0.1)
After batch correction in Harmony, note that the two PBMC samples largely overlap.