Herein, we will utilise the Signac R package’s Harmony batch effect correction method (Korsunsky et al. 2019). The Harmony algorithm is accessible on GitHub, and Signac provides integration tutorials.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(Seurat)
## Attaching SeuratObject
library(harmony)
## Loading required package: Rcpp
library(Signac)
library(GenomicRanges)
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:dplyr':
## 
##     combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:dplyr':
## 
##     first, rename
## The following object is masked from 'package:base':
## 
##     expand.grid
## Loading required package: IRanges
## 
## Attaching package: 'IRanges'
## The following objects are masked from 'package:dplyr':
## 
##     collapse, desc, slice
## Loading required package: GenomeInfoDb
pbmcv2 <- readRDS("/mnt/nectar_volume/home/eraz0001/new/pbmcv2.rds")
pbmcv1p1 <- readRDS("/mnt/nectar_volume/home/eraz0001/new/pbmcv1p1.rds")
head(pbmcv1p1)
##                       orig.ident nCount_peaks nFeature_peaks
## AAACGAAAGAACGCCA-1 SeuratProject        56436          20043
## AAACGAAAGACAGCTG-1 SeuratProject         7096           3436
## AAACGAAAGAGTAAGG-1 SeuratProject        10440           4848
## AAACGAAAGATGCGCA-1 SeuratProject        25283          10345
## AAACGAAAGGCAGATC-1 SeuratProject        10896           5057
## AAACGAAAGTCATACC-1 SeuratProject         6528           3192
## AAACGAACACGCGTTG-1 SeuratProject        17260           7419
## AAACGAACACGGCCAT-1 SeuratProject        11904           5426
## AAACGAACACTCAAGT-1 SeuratProject         9729           4644
## AAACGAACAGCGTCGT-1 SeuratProject        17572           7556

Add ‘DataSet’ info into the seurat obj.

pbmcv1p1$dataset <- 'pbmcv1p1'
pbmcv2$dataset <- 'pbmcv2'

Step 2: Merge objects Use the merge function to combine the objects without batch correction, and visualize them using the head and tail functions.

unintegrated <- merge(x = pbmcv1p1, y = pbmcv2, add.cell.ids = c("pbmcv1p1", "pbmcv2"))
## 
## Binding matrix rows
## 
## Binding matrix rows
## 
## Binding matrix rows
## 
## Binding matrix rows

Next, we visualize the data before batch correction.

unintegrated <- RunTFIDF(unintegrated)
## Performing TF-IDF normalization
unintegrated <- FindTopFeatures(unintegrated, min.cutoff = 20)
unintegrated <- RunSVD(unintegrated)
## Running SVD
## Scaling cell embeddings
unintegrated <- RunUMAP(unintegrated, dims = 2:50, reduction = 'lsi')
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 12:23:10 UMAP embedding parameters a = 0.9922 b = 1.112
## 12:23:10 Read 20882 rows and found 49 numeric columns
## 12:23:10 Using Annoy for neighbor search, n_neighbors = 30
## 12:23:10 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 12:23:13 Writing NN index file to temp file /tmp/RtmpUSJvea/file2da0fd9297f08
## 12:23:13 Searching Annoy index using 1 thread, search_k = 3000
## 12:23:23 Annoy recall = 100%
## 12:23:24 Commencing smooth kNN distance calibration using 1 thread
## 12:23:27 Initializing from normalized Laplacian + noise
## 12:23:28 Commencing optimization for 200 epochs, with 870980 positive edges
## 12:23:40 Optimization finished
DimPlot(unintegrated, group.by = 'dataset', pt.size = 0.1)

Given that these are both human PMBCs, one might expect better overlap between the two samples. We suspect that chemistry batch effects could cause the lack of overlap. Step 4: Batch correction using Harmony In this step, use Harmony to correct the batch effects between the two samples.

hm.integrated <- RunHarmony(object = unintegrated, group.by.vars = 'dataset', reduction = 'lsi', assay.use = 'peaks', project.dim = FALSE)
## Harmony 1/10
## Harmony 2/10
## Harmony 3/10
## Harmony 4/10
## Harmony 5/10
## Harmony 6/10
## Harmony 7/10
## Harmony 8/10
## Harmony 9/10
## Harmony 10/10
hm.integrated <- RunUMAP(hm.integrated, dims = 2:30, reduction = 'harmony')
## 12:24:57 UMAP embedding parameters a = 0.9922 b = 1.112
## 12:24:57 Read 20882 rows and found 29 numeric columns
## 12:24:57 Using Annoy for neighbor search, n_neighbors = 30
## 12:24:57 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 12:25:00 Writing NN index file to temp file /tmp/RtmpUSJvea/file2da0fd69dae4d2
## 12:25:00 Searching Annoy index using 1 thread, search_k = 3000
## 12:25:10 Annoy recall = 100%
## 12:25:11 Commencing smooth kNN distance calibration using 1 thread
## 12:25:13 Initializing from normalized Laplacian + noise
## 12:25:14 Commencing optimization for 200 epochs, with 886886 positive edges
## 12:25:26 Optimization finished
DimPlot(hm.integrated, group.by = 'dataset', pt.size = 0.1)

After batch correction in Harmony, note that the two PBMC samples largely overlap.