An object of class Seurat
47773 features across 43448 samples within 1 assay
Active assay: RNA (47773 features, 2000 variable features)
3 layers present: counts, data, scale.data
2 dimensional reductions calculated: pca, umap
print(SO_harmony)
An object of class Seurat
47773 features across 43448 samples within 1 assay
Active assay: RNA (47773 features, 2000 variable features)
3 layers present: counts, data, scale.data
4 dimensional reductions calculated: pca, umap, harmony, umap_harmony
Your PCA table showed 90% variance at PC26, so I recommend using 1:26 here instead of 1:35. This is already a more gentle choice.
RPCA Integration
# Keep old cluster identity before integrationSO_pre$pre_harmony_cluster <-as.character(SO_pre$seurat_clusters)SO_pre$was_pre_cluster5 <- SO_pre$pre_harmony_cluster =="5"DefaultAssay(SO_pre) <-"RNA"# Split by sampleSO_list <-SplitObject(SO_pre, split.by ="Sample")# Normalize and run PCA per sampleSO_list <-lapply(SO_list, function(x) { x <-NormalizeData(x, verbose =FALSE) x <-FindVariableFeatures(x, selection.method ="vst", nfeatures =2000, verbose =FALSE) x <-ScaleData(x, verbose =FALSE) x <-RunPCA(x, npcs =max(dims_use), verbose =FALSE)return(x)})# Select shared integration featuresrpca_features <-SelectIntegrationFeatures(object.list = SO_list,nfeatures =2000)# Prepare objects for RPCASO_list <-lapply(SO_list, function(x) { x <-ScaleData(x, features = rpca_features, verbose =FALSE) x <-RunPCA(x, features = rpca_features, npcs =max(dims_use), verbose =FALSE)return(x)})# Find RPCA anchorsrpca_anchors <-FindIntegrationAnchors(object.list = SO_list,anchor.features = rpca_features,reduction ="rpca",dims = dims_use)# Integrate dataSO_rpca <-IntegrateData(anchorset = rpca_anchors,dims = dims_use)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 43448
Number of edges: 1878475
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9159
Number of communities: 16
Elapsed time: 9 seconds
rpca_sample_cluster_df <-as.data.frame(rpca_sample_cluster_table)ggplot(rpca_sample_cluster_df, aes(x = Cluster, y = Freq, fill = Sample)) +geom_bar(stat ="identity", position ="fill", color ="black", linewidth =0.2) +theme_classic() +labs(title ="RPCA: sample composition within each cluster",x ="RPCA cluster",y ="Proportion" )
Track Pre-Harmony Cluster 5 After RPCA
cluster5_cells <-colnames(SO_pre)[SO_pre$seurat_clusters =="5"]cat("Number of pre-Harmony cluster 5 cells:\n")
Number of pre-Harmony cluster 5 cells:
print(length(cluster5_cells))
[1] 3352
# Make sure cells exist in all objectscluster5_cells_harmony <-intersect(cluster5_cells, colnames(SO_harmony))cluster5_cells_rpca <-intersect(cluster5_cells, colnames(SO_rpca))cat("\nWhere did pre-Harmony cluster 5 cells go after Harmony?\n")
Where did pre-Harmony cluster 5 cells go after Harmony?