UMAP of T cells without other PBMC cells using clusters and PC-1:50

I removed B cells and other cells and we will apply script for UMAP just on T cells

1. load libraries

## Loading required package: SeuratObject

## Loading required package: sp

## 
## Attaching package: 'SeuratObject'

## The following objects are masked from 'package:base':
## 
##     intersect, t

## ── Installed datasets ──────────────────────────────── SeuratData v0.2.2.9001 ──

## ✔ pbmcref 1.0.0                         ✔ pbmcsca 3.0.0

## ────────────────────────────────────── Key ─────────────────────────────────────

## ✔ Dataset loaded successfully
## ❯ Dataset built with a newer version of Seurat than installed
## ❓ Unknown version of Seurat installed

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.1     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## 
## Attaching package: 'magrittr'
## 
## 
## The following object is masked from 'package:purrr':
## 
##     set_names
## 
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
## 
## 
## 
## Attaching package: 'dbplyr'
## 
## 
## The following objects are masked from 'package:dplyr':
## 
##     ident, sql
## 
## 
## Registered S3 method overwritten by 'SeuratDisk':
##   method            from  
##   as.sparse.H5Group Seurat
## 
## 
## 
## Attaching shinyBS
## 
## Loading required package: SummarizedExperiment
## 
## Loading required package: MatrixGenerics
## 
## Loading required package: matrixStats
## 
## 
## Attaching package: 'matrixStats'
## 
## 
## The following object is masked from 'package:dplyr':
## 
##     count
## 
## 
## 
## Attaching package: 'MatrixGenerics'
## 
## 
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## 
## 
## Loading required package: GenomicRanges
## 
## Loading required package: stats4
## 
## Loading required package: BiocGenerics
## 
## 
## Attaching package: 'BiocGenerics'
## 
## 
## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union
## 
## 
## The following objects are masked from 'package:dplyr':
## 
##     combine, intersect, setdiff, union
## 
## 
## The following object is masked from 'package:SeuratObject':
## 
##     intersect
## 
## 
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## 
## 
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit, which.max, which.min
## 
## 
## Loading required package: S4Vectors
## 
## 
## Attaching package: 'S4Vectors'
## 
## 
## The following objects are masked from 'package:lubridate':
## 
##     second, second<-
## 
## 
## The following object is masked from 'package:tidyr':
## 
##     expand
## 
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, rename
## 
## 
## The following object is masked from 'package:utils':
## 
##     findMatches
## 
## 
## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname
## 
## 
## Loading required package: IRanges
## 
## 
## Attaching package: 'IRanges'
## 
## 
## The following object is masked from 'package:lubridate':
## 
##     %within%
## 
## 
## The following object is masked from 'package:purrr':
## 
##     reduce
## 
## 
## The following objects are masked from 'package:dplyr':
## 
##     collapse, desc, slice
## 
## 
## The following object is masked from 'package:sp':
## 
##     %over%
## 
## 
## Loading required package: GenomeInfoDb
## 
## 
## Attaching package: 'GenomicRanges'
## 
## 
## The following object is masked from 'package:magrittr':
## 
##     subtract
## 
## 
## Loading required package: Biobase
## 
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## 
## 
## Attaching package: 'Biobase'
## 
## 
## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians
## 
## 
## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians
## 
## 
## 
## Attaching package: 'SummarizedExperiment'
## 
## 
## The following object is masked from 'package:Seurat':
## 
##     Assays
## 
## 
## The following object is masked from 'package:SeuratObject':
## 
##     Assays
## 
## 
## 
## Attaching package: 'celldex'
## 
## 
## The following objects are masked from 'package:SingleR':
## 
##     BlueprintEncodeData, DatabaseImmuneCellExpressionData,
##     HumanPrimaryCellAtlasData, ImmGenData, MonacoImmuneData,
##     MouseRNAseqData, NovershternHematopoieticData
## 
## 
## Loading required package: ggraph
## 
## 
## Attaching package: 'ggraph'
## 
## 
## The following object is masked from 'package:sp':
## 
##     geometry

2. Load Seurat Object

#Load Seurat Object merged from cell lines and a control(PBMC) after filtration
load("All_T_cells_Merged_filtered_Mono_using_clusters.Robj")


 All_samples_Merged <- filtered_data
 All_samples_Merged

## An object of class Seurat 
## 62626 features across 46976 samples within 6 assays 
## Active assay: SCT (25902 features, 3000 variable features)
##  3 layers present: counts, data, scale.data
##  5 other assays present: RNA, ADT, prediction.score.celltype.l1, prediction.score.celltype.l2, prediction.score.celltype.l3
##  4 dimensional reductions calculated: pca, umap, integrated_dr, ref.umap

 rm(filtered_data)

3. Azimuth Annotation

# InstallData("pbmcref")
# 
# # The RunAzimuth function can take a Seurat object as input
# All_samples_Merged <- RunAzimuth(All_samples_Merged, reference = "pbmcref")

4. QC

# Set identity classes to an existing column in meta data
Idents(object = All_samples_Merged) <- "cell_line"

All_samples_Merged[["percent.rb"]] <- PercentageFeatureSet(All_samples_Merged, 
                                                           pattern = "^RP[SL]")

VlnPlot(All_samples_Merged, features = c("nFeature_RNA", 
                                         "nCount_RNA", 
                                         "percent.mt",
                                         "percent.rb"), 
                            ncol = 4, pt.size = 0.1) & 
              theme(plot.title = element_text(size=10))

FeatureScatter(All_samples_Merged, feature1 = "percent.mito", 
                                  feature2 = "percent.rb")

VlnPlot(All_samples_Merged, features = c("nFeature_RNA", 
                                    "nCount_RNA", 
                                    "percent.mito"), 
                                      ncol = 3)

FeatureScatter(All_samples_Merged, 
               feature1 = "percent.mito", 
               feature2 = "percent.rb") +
        geom_smooth(method = 'lm')

## `geom_smooth()` using formula = 'y ~ x'

FeatureScatter(All_samples_Merged, 
               feature1 = "nCount_RNA", 
               feature2 = "nFeature_RNA") +
        geom_smooth(method = 'lm')

## `geom_smooth()` using formula = 'y ~ x'

##FeatureScatter is typically used to visualize feature-feature relationships ##for anything calculated by the object, ##i.e. columns in object metadata, PC scores etc.

FeatureScatter(All_samples_Merged, 
               feature1 = "nCount_RNA", 
               feature2 = "percent.mito")+
  geom_smooth(method = 'lm')

## `geom_smooth()` using formula = 'y ~ x'

FeatureScatter(All_samples_Merged, 
               feature1 = "nCount_RNA", 
               feature2 = "nFeature_RNA")+
  geom_smooth(method = 'lm')

## `geom_smooth()` using formula = 'y ~ x'

Assign Cell-Cycle Scores

## Running SCTransform on assay: RNA

## vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.

## Calculating cell attributes from input UMI matrix: log_umi

## Variance stabilizing transformation of count matrix of size 25901 by 46976

## Model formula is y ~ log_umi

## Get Negative Binomial regression parameters per gene

## Using 2000 genes, 5000 cells

## Found 484 outliers - those will be ignored in fitting/regularization step

## Second step: Get residuals using fitted parameters for 25901 genes

## Computing corrected count matrix for 25901 genes

## Calculating gene attributes

## Wall clock passed: Time difference of 2.299541 mins

## Determine variable features

## Place corrected count matrix in counts slot

## Warning: Different cells and/or features from existing assay SCT

## Set default assay to SCT

## Warning: The following features are not present in the object: MLF1IP, not
## searching for symbol synonyms

## Warning: The following features are not present in the object: FAM64A, HN1, not
## searching for symbol synonyms

5. Normalize data

# Apply SCTransform
All_samples_Merged <- SCTransform(All_samples_Merged, 
                                  vars.to.regress = c("percent.rb","percent.mito", "CC.Difference"), 
                                  do.scale=TRUE, 
                                  do.center=TRUE, 
                                  verbose = TRUE)

## Running SCTransform on assay: RNA

## vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes.

## Calculating cell attributes from input UMI matrix: log_umi

## Variance stabilizing transformation of count matrix of size 25901 by 46976

## Model formula is y ~ log_umi

## Get Negative Binomial regression parameters per gene

## Using 2000 genes, 5000 cells

## Found 484 outliers - those will be ignored in fitting/regularization step

## Second step: Get residuals using fitted parameters for 25901 genes

## Computing corrected count matrix for 25901 genes

## Calculating gene attributes

## Wall clock passed: Time difference of 2.29561 mins

## Determine variable features

## Regressing out percent.rb, percent.mito, CC.Difference

## Centering and scaling data matrix

## Place corrected count matrix in counts slot

## Set default assay to SCT

6. Perform PCA

Variables_genes <- All_samples_Merged@assays$SCT@var.features

# Exclude genes starting with "HLA-" AND "Xist" AND "TRBV, TRAV"
Variables_genes_after_exclusion <- Variables_genes[!grepl("^HLA-|^XIST|^TRBV|^TRAV", Variables_genes)]


# These are now standard steps in the Seurat workflow for visualization and clustering
All_samples_Merged <- RunPCA(All_samples_Merged,
                        features = Variables_genes_after_exclusion,
                        do.print = TRUE, 
                        pcs.print = 1:5, 
                        genes.print = 15,
                        npcs = 90)

## PC_ 1 
## Positive:  CD7, KIR3DL1, SEPTIN9, KIR2DL3, PRKCH, CLEC2B, KIR3DL2, SH3BGRL3, EPCAM, CST7 
##     GZMM, CXCR3, ESYT2, MATK, CD52, MYO1E, XCL1, KLRC1, TRGV2, CD3G 
##     OCIAD2, KIR2DL4, PTPRC, KLRK1, LCK, SYNE2, IFITM1, RPS27, CD6, PTPN6 
## Negative:  NPM1, SEC11C, IL2RA, MTHFD2, HDGFL3, YBX3, VDAC1, MTDH, RBM17, C12orf75 
##     CCT8, KRT7, HINT2, MINDY3, CCND2, CTSH, RAD21, EGFL6, BATF3, RDH10 
##     SPATS2L, MIIP, PRELID1, CANX, HTATIP2, TNFRSF4, MIR155HG, SRM, SLC35F3, RAN 
## PC_ 2 
## Positive:  NME2, RPL35, CHCHD2, RPL27A, RPS17, NME1, COX6A1, C1QBP, ATP5MC3, KIR3DL1 
##     DAD1, PFN1, HSPE1, CORO1B, ATP5MC1, SPINT2, ATP5F1B, SET, ENO1, HSP90AA1 
##     APRT, MATK, EPCAM, PPP1R14B, MRPL12, RPS18, RAB25, TRGV2, NDUFA4, GYPC 
## Negative:  B2M, RPL34, ANKRD12, CD2, MALAT1, SARAF, RPS4Y1, RPL39, LMNA, PNRC1 
##     PAGE5, MVP, TENM3, FYB1, PTPRC, ANXA1, ITGB1, RPL35A, BCL3, ERAP2 
##     RAP1A, PLD1, CLEC2D, DAZAP2, AHNAK, TSC22D3, CD74, GBP2, BTG1, IL7R 
## PC_ 3 
## Positive:  TIGIT, TNFRSF4, HACD1, EGFL6, BACE2, RPL30, C12orf75, RPS27, PXYLP1, ARPC2 
##     ACTN1, ETS1, NET1, BTG1, RASGRP2, SYT4, C12orf57, GRIA4, SMIM3, LAPTM5 
##     CD3E, MAP1B, MCTP2, MT-CYB, UBE2D2, PHLDA2, FGGY, RPS4Y1, KRT7, DOCK8 
## Negative:  GAPDH, PAGE5, RPL22L1, NDUFV2, PSMB2, RPS15, STMN1, TUBB, RPL35A, CDKN2A 
##     RPS2, RPS23, CD74, RBPMS, EEF2, TYMS, PPBP, WDR34, GPX4, RPS14 
##     RPS3A, CENPX, RPLP0, TUBA1B, CCNA2, ACTG1, FTL, ACAT1, RPL19, NEURL1 
## PC_ 4 
## Positive:  PPID, EIF5A, FCER2, HSPE1, PPBP, ODC1, DNAJC12, CHCHD10, RPS4Y1, FAM162A 
##     ATP5MC3, MTDH, MT-ND3, GSTP1, MT-ND1, YBX3, CYC1, HSP90B1, PRELID3B, TCF7 
##     FKBP11, COL19A1, GCSH, RPL34, CYCS, FKBP4, SLC7A11-AS1, HSPD1, PSMB6, MIR155HG 
## Negative:  RPS4X, S100A4, WFDC1, EGLN3, S100A11, IL32, GAS5, S100A6, KRT1, LINC02752 
##     TTC29, VIM, TBX4, RPLP1, PLCB1, TNS4, AC069410.1, SP5, GATA3, NKG7 
##     RPL13, IFNGR1, MYO1G, RCSD1, SEMA4A, FAM9C, PTGIS, LAT, IL4, RPS24 
## PC_ 5 
## Positive:  PGAM1, MIIP, RDH10, PFKP, TP73, EEF1A2, HOXC9, GPAT2, GPAT3, TMEM163 
##     CEP135, FABP5, RIOX2, IL2RA, TPI1, TMSB10, PIM2, RBM38, LDHA, LIME1 
##     LGALS1, TAGLN2, DDIT4, PGK1, CA8, S100A11, RPL41, PXYLP1, IRX3, S100A10 
## Negative:  CCL17, MAP4K4, MYO1D, IGHE, MIR155HG, CFI, CA10, THY1, IL4I1, EZH2 
##     LRBA, CA2, BCAT1, RXFP1, HS3ST1, FRMD4A, AL590550.1, SETBP1, RANBP17, SYT4 
##     RYR2, MAGI1, RUNX1, ONECUT2, AC100801.1, LAYN, TNFRSF9, AKAP12, AL023574.1, MGST3

## Warning: Number of dimensions changing from 50 to 90

# determine dimensionality of the data
ElbowPlot(All_samples_Merged, ndims =90)

7. Perform PCA TEST

library(ggplot2)
library(RColorBrewer)  

# Assuming you have 10 different cell lines, generating a color palette with 10 colors
cell_line_colors <- brewer.pal(10, "Set3")

# Assuming All_samples_Merged$cell_line is a factor or character vector containing cell line names
data <- as.data.frame(table(All_samples_Merged$cell_line))
colnames(data) <- c("cell_line", "nUMI")  # Change column name to nUMI

ncells <- ggplot(data, aes(x = cell_line, y = nUMI, fill = cell_line)) + 
  geom_col() +
  theme_classic() +
  geom_text(aes(label = nUMI), 
            position = position_dodge(width = 0.9), 
            vjust = -0.25) +
  scale_fill_manual(values = cell_line_colors) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +  # Adjust the title position
  ggtitle("Filtered cells per sample") +
  xlab("Cell lines") +  # Adjust x-axis label
  ylab("Frequency")    # Adjust y-axis label

print(ncells)

# TEST-1
# given that the output of RunPCA is "pca"
# replace "so" by the name of your seurat object

pct <- All_samples_Merged[["pca"]]@stdev / sum(All_samples_Merged[["pca"]]@stdev) * 100
cumu <- cumsum(pct) # Calculate cumulative percents for each PC
# Determine the difference between variation of PC and subsequent PC
co2 <- sort(which((pct[-length(pct)] - pct[-1]) > 0.1), decreasing = T)[1] + 1
# last point where change of % of variation is more than 0.1%. -> co2
co2

## [1] 12

# TEST-2
# get significant PCs
stdv <- All_samples_Merged[["pca"]]@stdev
sum.stdv <- sum(All_samples_Merged[["pca"]]@stdev)
percent.stdv <- (stdv / sum.stdv) * 100
cumulative <- cumsum(percent.stdv)
co1 <- which(cumulative > 90 & percent.stdv < 5)[1]
co2 <- sort(which((percent.stdv[1:length(percent.stdv) - 1] - 
                       percent.stdv[2:length(percent.stdv)]) > 0.1), 
              decreasing = T)[1] + 1
min.pc <- min(co1, co2)
min.pc

## [1] 12

# Create a dataframe with values
plot_df <- data.frame(pct = percent.stdv, 
           cumu = cumulative, 
           rank = 1:length(percent.stdv))

# Elbow plot to visualize 
  ggplot(plot_df, aes(cumulative, percent.stdv, label = rank, color = rank > min.pc)) + 
  geom_text() + 
  geom_vline(xintercept = 90, color = "grey") + 
  geom_hline(yintercept = min(percent.stdv[percent.stdv > 5]), color = "grey") +
  theme_bw()

8. Clustering

All_samples_Merged <- FindNeighbors(All_samples_Merged, 
                                dims = 1:50, 
                                verbose = FALSE)

# understanding resolution
All_samples_Merged <- FindClusters(All_samples_Merged, 
                                    resolution = c(0.4,0.5, 0.6, 0.7,0.8, 0.9, 1))

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 46976
## Number of edges: 1696701
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9565
## Number of communities: 16
## Elapsed time: 8 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 46976
## Number of edges: 1696701
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9489
## Number of communities: 18
## Elapsed time: 8 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 46976
## Number of edges: 1696701
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9414
## Number of communities: 22
## Elapsed time: 8 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 46976
## Number of edges: 1696701
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9345
## Number of communities: 25
## Elapsed time: 7 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 46976
## Number of edges: 1696701
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9275
## Number of communities: 25
## Elapsed time: 7 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 46976
## Number of edges: 1696701
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9209
## Number of communities: 28
## Elapsed time: 7 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 46976
## Number of edges: 1696701
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9153
## Number of communities: 32
## Elapsed time: 7 seconds

# non-linear dimensionality reduction --------------
All_samples_Merged <- RunUMAP(All_samples_Merged, 
                          dims = 1:50,
                          verbose = FALSE)

## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session

# note that you can set `label = TRUE` or use the Label Clusters function to help label
# individual clusters
DimPlot(All_samples_Merged,group.by = "cell_line", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

DimPlot(All_samples_Merged,group.by = "predicted.celltype.l2", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

## Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

DimPlot(All_samples_Merged,
        group.by = "SCT_snn_res.0.4", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

DimPlot(All_samples_Merged,
        group.by = "SCT_snn_res.0.5", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

DimPlot(All_samples_Merged,
        group.by = "SCT_snn_res.0.6", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

DimPlot(All_samples_Merged,
        group.by = "SCT_snn_res.0.7", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

DimPlot(All_samples_Merged,
        group.by = "SCT_snn_res.0.8", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

DimPlot(All_samples_Merged,
        group.by = "SCT_snn_res.0.9", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

DimPlot(All_samples_Merged,
        group.by = "SCT_snn_res.1", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

DimPlot(All_samples_Merged,
        group.by = "SCT_snn_res.1.1", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

DimPlot(All_samples_Merged,
        group.by = "SCT_snn_res.1.2", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

# Set identity classes to an existing column in meta data
Idents(object = All_samples_Merged) <- "SCT_snn_res.0.4"

cluster_table <- table(Idents(All_samples_Merged))


barplot(cluster_table, main = "Number of Cells in Each Cluster", 
                      xlab = "Cluster", 
                      ylab = "Number of Cells", 
                      col = rainbow(length(cluster_table)))

print(cluster_table)

## 
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 6428 5934 4888 4500 4042 3329 2819 2787 2704 2494 1966 1906 1436  868  530  345

table(All_samples_Merged$predicted.celltype.l2, All_samples_Merged$SCT_snn_res.0.1)

##                    
##                        0    1    2    3    4    5    6    7    8
##   CD4 CTL              0    6    0    0    0    0    0    0    0
##   CD4 Naive            0  705    0    0    1    0    0    0    0
##   CD4 Proliferating 5203    2 5347 3010 2425 3955 4011 3127 1299
##   CD4 TCM           1007 4359  571  287 3237  615  543  136   33
##   CD4 TEM              0   46    0    0   23    0    0    0    0
##   CD8 Naive            6  380    2    0    0   19    2    3    1
##   CD8 TCM              0  255    0   10  148    0    0    0    0
##   CD8 TEM              0  209    0    8    0    0    0    0    0
##   cDC2                11    1    0    0    0   72  179   65    8
##   dnT                  1   50    0    1    5    0    4    0    0
##   gdT                  0   13    0    0    0    0    0    0    0
##   HSPC               173    2    9    1   13  666  304  789  400
##   ILC                  0    1    0    0    0    0    0    0    0
##   MAIT                 0   56    0    0    0    0    0    0    0
##   NK                   0   92    0    0    0    0    0    0    0
##   NK Proliferating     7    0   11 2615   20   14  193    4    0
##   Treg                11  170    0    0    8    0   15    1    0

9. clusTree

clustree(All_samples_Merged, prefix = "SCT_snn_res.")

10. Azimuth Visualization

DimPlot(All_samples_Merged, group.by = "predicted.celltype.l1", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

DimPlot(All_samples_Merged, group.by = "predicted.celltype.l1", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = F)

DimPlot(All_samples_Merged, group.by = "predicted.celltype.l2", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

## Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

DimPlot(All_samples_Merged, group.by = "predicted.celltype.l2", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = F)

DimPlot(All_samples_Merged, group.by = "predicted.celltype.l2", 
        reduction = "umap",
        label.size = 3,
        repel = T,
        label = T, label.box = T)

## Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

table(All_samples_Merged$predicted.celltype.l2, All_samples_Merged$SCT_snn_res.0.4)

##                    
##                        0    1    2    3    4    5    6    7    8    9   10   11
##   CD4 CTL              0    0    0    0    0    0    6    0    0    0    0    0
##   CD4 Naive            0    0    0    0    0    1    4    0  661    0    0    0
##   CD4 Proliferating 5208 3011 3806 3936 3056 1073    0 2181    0 1702  922 1431
##   CD4 TCM           1010  287  477  553  140 2130 2039   19 1651  605  979   45
##   CD4 TEM              0    0    0    0    0   22   44    0    1    0    0    0
##   CD8 Naive            6    0    2    2    3    0   13    0  331   19    0    1
##   CD8 TCM              0   10    0    0    0   92  230    0   19    0   53    0
##   CD8 TEM              0    8    0    0    0    0  209    0    0    0    0    0
##   cDC2                11    0  148    0   67    0    0    3    0   73    0   11
##   dnT                  1    2    1    0    0    1   11    0    4    0    2    0
##   gdT                  0    0    0    0    0    0   13    0    0    0    0    0
##   HSPC               173    1  288    1  771    0    0  580    1   86    2  418
##   ILC                  0    0    0    0    0    0    1    0    0    0    0    0
##   MAIT                 0    0    0    0    0    0   55    0    0    0    0    0
##   NK                   0    0    0    0    0    0   91    0    0    0    0    0
##   NK Proliferating     8 2615  164    8    4   10    0    4    0    9    8    0
##   Treg                11    0    2    0    1    0  103    0   36    0    0    0
##                    
##                       12   13   14   15
##   CD4 CTL              0    0    0    0
##   CD4 Naive            0   40    0    0
##   CD4 Proliferating 1408    5  424  216
##   CD4 TCM             18  693  100   42
##   CD4 TEM              0    1    1    0
##   CD8 Naive            0   36    0    0
##   CD8 TCM              0    6    3    0
##   CD8 TEM              0    0    0    0
##   cDC2                 0    0    0   23
##   dnT                  0   36    0    3
##   gdT                  0    0    0    0
##   HSPC                 8   10    0   18
##   ILC                  0    0    0    0
##   MAIT                 0    1    0    0
##   NK                   0    1    0    0
##   NK Proliferating     2    0    2   30
##   Treg                 0   39    0   13

11.Save the Seurat object as an Robj file

#save(All_samples_Merged, file = "All_Normal-PBMC_Abnormal-cellLines_T_cells_Merged_Annotated_UMAP_on_Clusters_to_USE_PC50.Robj")