## In this script I will remove cells other than CD4 T cells from the PBMC controls

1. Load libraries
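
The chunk for this step is not shown in the export. As a minimal sketch, the packages that the later chunks call (Seurat, ggplot2, RColorBrewer, janitor) could be checked and attached as below. The check-then-attach pattern and the name `not_installed` are assumptions for illustration, not necessarily what the original chunk did.

```r
# Packages referenced by the later chunks of this script.
pkgs <- c("Seurat", "ggplot2", "RColorBrewer", "janitor")

# Report anything that is not installed instead of failing on the first library() call.
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(not_installed) > 0) {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
}

# Attach only the packages that are available.
for (p in setdiff(pkgs, not_installed)) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```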

2. Load Seurat Object


# Load the Seurat object merged from the cell lines and a control (PBMC), after filtration
load("../0-R_Objects/All_Samples_Merged_with_10x_Azitmuth_Annotated.robj")


All_samples_Merged
An object of class Seurat 
36752 features across 59355 samples within 5 assays 
Active assay: RNA (36601 features, 0 variable features)
 2 layers present: data, counts
 4 other assays present: ADT, prediction.score.celltype.l1, prediction.score.celltype.l2, prediction.score.celltype.l3
 2 dimensional reductions calculated: integrated_dr, ref.umap

Cell type Distribution to check clusters

# These tables can be regenerated later from the R object and compared to help decide on the clustering resolution.

# Azimuth l1
janitor::tabyl(All_samples_Merged@meta.data, predicted.celltype.l1, cell_line)
 predicted.celltype.l1   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
                     B    0    0   12  131   35   85  121 1093      957
                 CD4 T 5771 3124 6351 5967 4914 4575 4634 5448     3639
                 CD8 T   13   25    0    1    4    5    3  814     1379
                    DC    0    0    1    4   29   12   41   71      196
                  Mono    0    0    1   13    3    0    5  754     3165
                    NK   38 2784    6   25   11  259   38   92      463
                 other    0    0   57    8 1025  208  487   19       46
               other T    3    2    0    1    1    4    2   63      317
# Azimuth l2
janitor::tabyl(All_samples_Merged@meta.data, predicted.celltype.l2, cell_line)
 predicted.celltype.l2   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
                  ASDC    0    0    0    0    0    0    0    0        3
        B intermediate    0    0    2   54    2    2    0  457      179
              B memory    0    0   11   34   38   82  120  164       74
               B naive    0    0    0   41    0    0    0  459      692
             CD14 Mono    0    0    1   14    5    0    6  755     3042
             CD16 Mono    0    0    0    0    0    0    0    2      124
               CD4 CTL    0    0    0    0    0    0    0   16        1
             CD4 Naive    0    0    0    7    0    0    0  524     1512
     CD4 Proliferating 2461 2852 5452 5391 4732 4002 4115    0        6
               CD4 TCM 3320  270  887  562  178  557  517 4609     1978
               CD4 TEM    1    0    0    0    0    0    0   68       25
             CD8 Naive    0    0    0    0    0    0    0  361     1012
     CD8 Proliferating    0    0    0    0    0    1    1    0        0
               CD8 TCM    1   16    0    0    0    0    0  286      174
               CD8 TEM    1    8    0    0    2    3    1  181      195
                  cDC1    0    0    0    0    2    6    0   21       13
                  cDC2    0    0    0    4   11    3   35   52      124
                   dnT    2    3    0    1    2    5    2   38       29
                   gdT    0    0    0    0    0    0    0   26       67
                  HSPC    0    0   60    7 1035  213  490   17       12
                   ILC    0    0    0    1    0    0    0    3        3
                  MAIT    0    0    0    0    0    0    0   14      228
                    NK    0    0    0    1    0    0    0   89      444
      NK Proliferating   38 2785    6   24   11  259   38    1        5
         NK_CD56bright    0    0    0    0    0    0    0    1       15
                   pDC    0    0    0    0    0    0    0    0       56
           Plasmablast    0    0    0    0    0    0    0    9       10
              Platelet    0    0    0    0    0    0    0    1       31
                  Treg    1    1    9    9    4   15    6  200      108
# Azimuth l3
janitor::tabyl(All_samples_Merged@meta.data, predicted.celltype.l3, cell_line)
 predicted.celltype.l3   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
              ASDC_pDC    0    0    0    0    0    0    0    0        4
  B intermediate kappa    0    0    0    1    0    0    0   75       51
 B intermediate lambda    0    0    2   48    4    2    3  371      129
        B memory kappa    0    0    7    1    7   22   55   38       42
       B memory lambda    0    0    3   41    7   29   26  131       30
         B naive kappa    0    0    0    0    0    0    0  267      645
        B naive lambda    0    0    0   37    0    0    0  192       48
             CD14 Mono    0    0    1   15    9    0   22  755     3043
             CD16 Mono    0    0    0    0    0    0    0    2      124
               CD4 CTL    0    0    0    0    0    0    0   27        1
             CD4 Naive    0    0    0    9    0    0    0  528     1525
     CD4 Proliferating 2462 2852 5452 5391 4732 4003 4115    0        6
             CD4 TCM_1 1932    6    6   35    0    7    3 4057     1449
             CD4 TCM_2  652  250  870  516  165  536  482  265      144
             CD4 TCM_3  436   12    4    6   13    7   26  267      359
             CD4 TEM_1    1    0    0    0    0    0    0    4        8
             CD4 TEM_2    0    0    0    0    0    0    0   17       17
             CD4 TEM_3    0    0    0    0    0    0    0   50        1
             CD8 Naive    0    0    0    0    0    0    0  291      969
           CD8 Naive_2    0    0    0    0    1    1    0   63       47
     CD8 Proliferating    0    0    0    0    0    1    1    0        0
             CD8 TCM_1    0    8    0    0    0    0    0  137      115
             CD8 TCM_2  298    6    0    0    0    0    0   80       43
             CD8 TCM_3    0    1    0    0    0    0    0   77       17
             CD8 TEM_1    0    0    0    0    0    0    0   91       67
             CD8 TEM_2    0    0    0    0    0    1    0   32       62
             CD8 TEM_3    0    0    0    0    0    0    0   17       15
             CD8 TEM_4    0    0    0    0    0    0    0    2       12
             CD8 TEM_5    0    0    0    0    0    0    0   27        1
             CD8 TEM_6    1   10    0    0    1    1    1    1       36
                  cDC1    0    0    0    1   17   36   16   28       13
                cDC2_1    0    0    0    1    1    0    4    2       40
                cDC2_2    0    0    1    3   12    4   33   46       83
                 dnT_2    2    3    0    2    2    6    2   42       31
                 gdT_1    0    0    0    0    0    0    0   33       52
                 gdT_3    0    0    0    0    0    0    0    1       15
                  HSPC    0    0   62    7 1036  214  493   17       12
                   ILC    0    0    1    3    0    0    0    4        3
                  MAIT    0    0    0    0    0    0    0   16      229
      NK Proliferating   38 2785    6   24   11  259   38    1        5
                  NK_1    0    0    0    0    0    0    0   42        7
                  NK_2    0    0    0    0    0    0    0   18      386
                  NK_3    0    0    0    0    0    0    0   22       20
                  NK_4    0    0    0    0    0    0    0    3       30
         NK_CD56bright    0    0    0    0    0    0    0    1       16
                   pDC    0    0    0    0    0    0    0    0       56
                Plasma    0    0    0    0    0    0    0    9       10
              Platelet    0    0    0    0    0    0    0    1       31
           Treg Memory    3    2   13    9    4   19   11  200       89
            Treg Naive    0    0    0    0    0    0    0    4       24
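
The three `tabyl` calls above share one shape: an annotation level cross-tabulated against `cell_line`. A minimal sketch of that pattern, using base `table()` on a toy metadata frame so that it runs without Seurat or janitor. The helper name `celltype_by_line` and the toy values are hypothetical.

```r
# Toy stand-in for All_samples_Merged@meta.data (values are made up for illustration).
toy_md <- data.frame(
  cell_line = c("L1", "L1", "PBMC", "PBMC", "PBMC_10x"),
  predicted.celltype.l1 = c("CD4 T", "NK", "CD4 T", "B", "Mono"),
  predicted.celltype.l2 = c("CD4 TCM", "NK Proliferating", "CD4 Naive",
                            "B naive", "CD14 Mono"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cross-tabulate one annotation level against cell_line.
celltype_by_line <- function(md, level) {
  table(md[[level]], md$cell_line, dnn = c(level, "cell_line"))
}

# Build one table per annotation level, named after the level.
levels_to_check <- c("predicted.celltype.l1", "predicted.celltype.l2")
tables <- lapply(setNames(levels_to_check, levels_to_check),
                 function(lv) celltype_by_line(toy_md, lv))
print(tables$predicted.celltype.l1)
```

Keeping the tables in a named list makes it straightforward to recompute them after each filtering step and compare them side by side.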

Cell type Distribution barplot


library(ggplot2)
library(RColorBrewer)  

# One color per sample in cell_line (9 here), taken from the Set3 palette
cell_line_colors <- brewer.pal(length(unique(All_samples_Merged$cell_line)), "Set3")

# Count cells per sample; cell_line may be a factor or a character vector
data <- as.data.frame(table(All_samples_Merged$cell_line))
colnames(data) <- c("cell_line", "n_cells")  # the value is a cell count, not a UMI count

ncells <- ggplot(data, aes(x = cell_line, y = n_cells, fill = cell_line)) + 
  geom_col() +
  theme_classic() +
  geom_text(aes(label = n_cells), 
            position = position_dodge(width = 0.9), 
            vjust = -0.25) +
  scale_fill_manual(values = cell_line_colors) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +  # Adjust the title position
  ggtitle("Filtered cells per sample") +
  xlab("Cell lines") +  # Adjust x-axis label
  ylab("Frequency")    # Adjust y-axis label

print(ncells)
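
The bar plot is driven by a small per-sample count table. A minimal sketch of that table in base R, with the count column named for what it holds and one color per sample. The toy vector is made up, and `hcl.colors()` with the "Set 3" palette is a base-R stand-in used here only so the sketch runs without RColorBrewer.

```r
# Toy stand-in for All_samples_Merged$cell_line (made-up values).
cell_line <- c("L1", "L1", "L2", "PBMC", "PBMC", "PBMC")

# One row per sample with its number of cells.
counts <- as.data.frame(table(cell_line), responseName = "n_cells")

# One color per sample, sized from the data rather than hard-coded.
sample_colors <- hcl.colors(nrow(counts), palette = "Set 3")
names(sample_colors) <- as.character(counts$cell_line)

print(counts)
```

Sizing the palette from `nrow(counts)` keeps the colors in step with the data if a sample is added or dropped.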


3. Filter cells: keep only CD4 T cells from PBMC and PBMC_10x


# Set identity to cell_line 
Idents(All_samples_Merged) <- "cell_line"

# Identify cells in PBMC and PBMC_10x that are labelled CD4 at all three Azimuth levels
cd4_cells_pbmc <- WhichCells(All_samples_Merged, expression = 
  grepl("^CD4", All_samples_Merged$predicted.celltype.l1) &
  grepl("^CD4", All_samples_Merged$predicted.celltype.l2) &
  grepl("^CD4", All_samples_Merged$predicted.celltype.l3) &
  (cell_line %in% c("PBMC", "PBMC_10x"))
)

# Select all cells from other cell lines
other_cells_seurat <- subset(All_samples_Merged, subset = cell_line %in% c("L1", "L2", "L3", "L4", "L5", "L6", "L7"))

# Get the cell names from the subset
other_cells <- colnames(other_cells_seurat)

# Combine the cell lists
cells_to_keep <- c(cd4_cells_pbmc, other_cells)

# Create the final filtered Seurat object
filtered_seurat <- subset(All_samples_Merged, cells = cells_to_keep)
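
The selection rule above (controls are kept only when all three annotation levels start with "CD4"; cell lines are kept unconditionally) can be sketched on a plain metadata frame. This is an illustration under stated assumptions: `toy_md` stands in for `All_samples_Merged@meta.data`, its row names stand in for cell barcodes, and all values are made up.

```r
# Toy stand-in for All_samples_Merged@meta.data; row names play the role of barcodes.
toy_md <- data.frame(
  cell_line = c("PBMC", "PBMC", "PBMC_10x", "L1", "L2"),
  predicted.celltype.l1 = c("CD4 T", "CD4 T", "B", "CD4 T", "NK"),
  predicted.celltype.l2 = c("CD4 TCM", "Treg", "B naive",
                            "CD4 Proliferating", "NK Proliferating"),
  predicted.celltype.l3 = c("CD4 TCM_1", "Treg Memory", "B naive kappa",
                            "CD4 Proliferating", "NK Proliferating"),
  row.names = paste0("cell", 1:5),
  stringsAsFactors = FALSE
)

control_lines <- c("PBMC", "PBMC_10x")
level_cols <- paste0("predicted.celltype.l", 1:3)

# TRUE only when every annotation level starts with "CD4".
is_cd4_all_levels <- Reduce(`&`, lapply(toy_md[level_cols],
                                        function(x) grepl("^CD4", x)))

# Controls are kept only if CD4 at all levels; cell lines are kept unconditionally.
is_control <- toy_md$cell_line %in% control_lines
keep <- (is_control & is_cd4_all_levels) | !is_control
cells_to_keep <- rownames(toy_md)[keep]
print(cells_to_keep)
```

In the toy data, `cell2` is labelled CD4 T at l1 but Treg at l2, so it fails the all-levels test. The same effect is visible in the tables that follow, where the Treg rows drop to zero for PBMC and PBMC_10x.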



library(ggplot2)
library(RColorBrewer)  

# One color per sample in cell_line (9 here), taken from the Set3 palette
cell_line_colors <- brewer.pal(length(unique(filtered_seurat$cell_line)), "Set3")

# Count cells per sample; cell_line may be a factor or a character vector
data <- as.data.frame(table(filtered_seurat$cell_line))
colnames(data) <- c("cell_line", "n_cells")  # the value is a cell count, not a UMI count

ncells <- ggplot(data, aes(x = cell_line, y = n_cells, fill = cell_line)) + 
  geom_col() +
  theme_classic() +
  geom_text(aes(label = n_cells), 
            position = position_dodge(width = 0.9), 
            vjust = -0.25) +
  scale_fill_manual(values = cell_line_colors) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +  # Adjust the title position
  ggtitle("Filtered cells per sample") +
  xlab("Cell lines") +  # Adjust x-axis label
  ylab("Frequency")    # Adjust y-axis label

print(ncells)


Cell type Distribution to check clusters


# These tables can be regenerated later from the R object and compared to help decide on the clustering resolution.


janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l1, cell_line)
 predicted.celltype.l1   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
                     B    0    0   12  131   35   85  121    0        0
                 CD4 T 5771 3124 6351 5967 4914 4575 4634 5171     3505
                 CD8 T   13   25    0    1    4    5    3    0        0
                    DC    0    0    1    4   29   12   41    0        0
                  Mono    0    0    1   13    3    0    5    0        0
                    NK   38 2784    6   25   11  259   38    0        0
                 other    0    0   57    8 1025  208  487    0        0
               other T    3    2    0    1    1    4    2    0        0
janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l2, cell_line)
 predicted.celltype.l2   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
        B intermediate    0    0    2   54    2    2    0    0        0
              B memory    0    0   11   34   38   82  120    0        0
               B naive    0    0    0   41    0    0    0    0        0
             CD14 Mono    0    0    1   14    5    0    6    0        0
               CD4 CTL    0    0    0    0    0    0    0   12        1
             CD4 Naive    0    0    0    7    0    0    0  523     1512
     CD4 Proliferating 2461 2852 5452 5391 4732 4002 4115    0        6
               CD4 TCM 3320  270  887  562  178  557  517 4576     1963
               CD4 TEM    1    0    0    0    0    0    0   60       23
     CD8 Proliferating    0    0    0    0    0    1    1    0        0
               CD8 TCM    1   16    0    0    0    0    0    0        0
               CD8 TEM    1    8    0    0    2    3    1    0        0
                  cDC1    0    0    0    0    2    6    0    0        0
                  cDC2    0    0    0    4   11    3   35    0        0
                   dnT    2    3    0    1    2    5    2    0        0
                  HSPC    0    0   60    7 1035  213  490    0        0
                   ILC    0    0    0    1    0    0    0    0        0
                    NK    0    0    0    1    0    0    0    0        0
      NK Proliferating   38 2785    6   24   11  259   38    0        0
                  Treg    1    1    9    9    4   15    6    0        0
janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l3, cell_line)
 predicted.celltype.l3   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
  B intermediate kappa    0    0    0    1    0    0    0    0        0
 B intermediate lambda    0    0    2   48    4    2    3    0        0
        B memory kappa    0    0    7    1    7   22   55    0        0
       B memory lambda    0    0    3   41    7   29   26    0        0
        B naive lambda    0    0    0   37    0    0    0    0        0
             CD14 Mono    0    0    1   15    9    0   22    0        0
               CD4 CTL    0    0    0    0    0    0    0   14        1
             CD4 Naive    0    0    0    9    0    0    0  526     1524
     CD4 Proliferating 2462 2852 5452 5391 4732 4003 4115    0        6
             CD4 TCM_1 1932    6    6   35    0    7    3 4038     1447
             CD4 TCM_2  652  250  870  516  165  536  482  265      144
             CD4 TCM_3  436   12    4    6   13    7   26  265      359
             CD4 TEM_1    1    0    0    0    0    0    0    4        7
             CD4 TEM_2    0    0    0    0    0    0    0   15       16
             CD4 TEM_3    0    0    0    0    0    0    0   44        1
           CD8 Naive_2    0    0    0    0    1    1    0    0        0
     CD8 Proliferating    0    0    0    0    0    1    1    0        0
             CD8 TCM_1    0    8    0    0    0    0    0    0        0
             CD8 TCM_2  298    6    0    0    0    0    0    0        0
             CD8 TCM_3    0    1    0    0    0    0    0    0        0
             CD8 TEM_2    0    0    0    0    0    1    0    0        0
             CD8 TEM_6    1   10    0    0    1    1    1    0        0
                  cDC1    0    0    0    1   17   36   16    0        0
                cDC2_1    0    0    0    1    1    0    4    0        0
                cDC2_2    0    0    1    3   12    4   33    0        0
                 dnT_2    2    3    0    2    2    6    2    0        0
                  HSPC    0    0   62    7 1036  214  493    0        0
                   ILC    0    0    1    3    0    0    0    0        0
      NK Proliferating   38 2785    6   24   11  259   38    0        0
           Treg Memory    3    2   13    9    4   19   11    0        0

4. Remove B cells from L4


# Set identity to cell_line
Idents(filtered_seurat) <- "cell_line"


# Load necessary libraries
library(Seurat)
library(ggplot2)
library(RColorBrewer)

# Identify B cells in L4 to exclude
b_cells_l4 <- WhichCells(filtered_seurat, expression = 
  grepl("^B", filtered_seurat$predicted.celltype.l1) &
  grepl("^B", filtered_seurat$predicted.celltype.l2) &
  grepl("^B", filtered_seurat$predicted.celltype.l3) &
  cell_line == "L4"
)

# Identify cells to keep (excluding B cells in L4)
cells_to_keep <- setdiff(Cells(filtered_seurat), b_cells_l4)

# Subset the Seurat object with selected cells
filtered_seurat <- subset(filtered_seurat, cells = cells_to_keep)

# Plot the filtered data

# Define cell line colors
cell_line_colors <- brewer.pal(length(unique(filtered_seurat$cell_line)), "Set3")

# Count cells per sample for plotting
data <- as.data.frame(table(filtered_seurat$cell_line))
colnames(data) <- c("cell_line", "n_cells")  # the value is a cell count, not a UMI count

# Create the bar plot
ncells <- ggplot(data, aes(x = cell_line, y = n_cells, fill = cell_line)) +
  geom_col() +
  theme_classic() +
  geom_text(aes(label = n_cells), position = position_dodge(width = 0.9), vjust = -0.25) +
  scale_fill_manual(values = cell_line_colors) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +
  ggtitle("Filtered Cells per Sample") +
  xlab("Cell Lines") +
  ylab("Frequency")

print(ncells)
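
The B-cell filter in this step is meant to change L4 only. A minimal sketch of a before/after check that confirms a filtering step touched only the intended sample. The vectors `before` and `after` are toy stand-ins for the `cell_line` metadata column before and after `subset()`, and every name here is hypothetical.

```r
# Toy per-cell sample labels before and after a filtering step (made-up values).
before <- c(c1 = "L4", c2 = "L4", c3 = "L4", c4 = "L5", c5 = "PBMC")
after  <- before[c("c1", "c4", "c5")]   # pretend c2 and c3 were B cells in L4

# Count on a shared set of levels so a fully removed sample still shows as 0.
sample_levels <- sort(unique(before))
n_before <- table(factor(before, levels = sample_levels))
n_after  <- table(factor(after,  levels = sample_levels))

removed <- data.frame(
  cell_line = sample_levels,
  before    = as.integer(n_before),
  after     = as.integer(n_after),
  removed   = as.integer(n_before - n_after)
)
print(removed)
```

In the tables that follow, four L4 cells remain labelled B at l1, which is consistent with the filter requiring a B label at all three levels.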

Cell type Distribution to check clusters

# These tables can be regenerated later from the R object and compared to help decide on the clustering resolution.

# Azimuth l1
janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l1, cell_line)
 predicted.celltype.l1   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
                     B    0    0   12    4   35   85  121    0        0
                 CD4 T 5771 3124 6351 5967 4914 4575 4634 5171     3505
                 CD8 T   13   25    0    1    4    5    3    0        0
                    DC    0    0    1    4   29   12   41    0        0
                  Mono    0    0    1   13    3    0    5    0        0
                    NK   38 2784    6   25   11  259   38    0        0
                 other    0    0   57    8 1025  208  487    0        0
               other T    3    2    0    1    1    4    2    0        0
# Azimuth l2
janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l2, cell_line)
 predicted.celltype.l2   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
        B intermediate    0    0    2    1    2    2    0    0        0
              B memory    0    0   11    1   38   82  120    0        0
             CD14 Mono    0    0    1   14    5    0    6    0        0
               CD4 CTL    0    0    0    0    0    0    0   12        1
             CD4 Naive    0    0    0    7    0    0    0  523     1512
     CD4 Proliferating 2461 2852 5452 5391 4732 4002 4115    0        6
               CD4 TCM 3320  270  887  562  178  557  517 4576     1963
               CD4 TEM    1    0    0    0    0    0    0   60       23
     CD8 Proliferating    0    0    0    0    0    1    1    0        0
               CD8 TCM    1   16    0    0    0    0    0    0        0
               CD8 TEM    1    8    0    0    2    3    1    0        0
                  cDC1    0    0    0    0    2    6    0    0        0
                  cDC2    0    0    0    4   11    3   35    0        0
                   dnT    2    3    0    1    2    5    2    0        0
                  HSPC    0    0   60    7 1035  213  490    0        0
                   ILC    0    0    0    1    0    0    0    0        0
                    NK    0    0    0    1    0    0    0    0        0
      NK Proliferating   38 2785    6   24   11  259   38    0        0
                  Treg    1    1    9    9    4   15    6    0        0
# Azimuth l3
janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l3, cell_line)
 predicted.celltype.l3   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
 B intermediate lambda    0    0    2    0    4    2    3    0        0
        B memory kappa    0    0    7    0    7   22   55    0        0
       B memory lambda    0    0    3    1    7   29   26    0        0
             CD14 Mono    0    0    1   15    9    0   22    0        0
               CD4 CTL    0    0    0    0    0    0    0   14        1
             CD4 Naive    0    0    0    9    0    0    0  526     1524
     CD4 Proliferating 2462 2852 5452 5391 4732 4003 4115    0        6
             CD4 TCM_1 1932    6    6   35    0    7    3 4038     1447
             CD4 TCM_2  652  250  870  516  165  536  482  265      144
             CD4 TCM_3  436   12    4    6   13    7   26  265      359
             CD4 TEM_1    1    0    0    0    0    0    0    4        7
             CD4 TEM_2    0    0    0    0    0    0    0   15       16
             CD4 TEM_3    0    0    0    0    0    0    0   44        1
           CD8 Naive_2    0    0    0    0    1    1    0    0        0
     CD8 Proliferating    0    0    0    0    0    1    1    0        0
             CD8 TCM_1    0    8    0    0    0    0    0    0        0
             CD8 TCM_2  298    6    0    0    0    0    0    0        0
             CD8 TCM_3    0    1    0    0    0    0    0    0        0
             CD8 TEM_2    0    0    0    0    0    1    0    0        0
             CD8 TEM_6    1   10    0    0    1    1    1    0        0
                  cDC1    0    0    0    1   17   36   16    0        0
                cDC2_1    0    0    0    1    1    0    4    0        0
                cDC2_2    0    0    1    3   12    4   33    0        0
                 dnT_2    2    3    0    2    2    6    2    0        0
                  HSPC    0    0   62    7 1036  214  493    0        0
                   ILC    0    0    1    3    0    0    0    0        0
      NK Proliferating   38 2785    6   24   11  259   38    0        0
           Treg Memory    3    2   13    9    4   19   11    0        0

5. Remove the ILC and NK cells (one cell each at l2, both in L4)



# Set identity to cell_line 
Idents(filtered_seurat) <- "cell_line"



# Remove ILC and NK cells based on predicted.celltype.l2
filtered_seurat <- subset(filtered_seurat, subset = predicted.celltype.l2 != "ILC" & predicted.celltype.l2 != "NK")
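
The condition above uses exact matches, so "NK Proliferating" is not affected by `!= "NK"`. One detail worth checking is how the condition behaves for cells with a missing label, since `!=` propagates `NA` in R. A minimal sketch on a toy label vector (made-up values); whether the real object contains `NA` labels is not shown in this export.

```r
# Toy l2 labels, including a missing annotation (made-up values).
l2 <- c("CD4 TCM", "ILC", "NK", "NK Proliferating", NA)

# `!=` propagates NA, so the condition itself is NA for unannotated cells.
keep_neq <- l2 != "ILC" & l2 != "NK"

# `%in%` never returns NA, so unannotated cells are explicitly kept.
keep_in <- !(l2 %in% c("ILC", "NK"))

print(keep_neq)
print(keep_in)
```

If unannotated cells should be retained, the `%in%` form states that choice explicitly.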




library(ggplot2)
library(RColorBrewer)  

# One color per sample in cell_line (9 here), taken from the Set3 palette
cell_line_colors <- brewer.pal(length(unique(filtered_seurat$cell_line)), "Set3")

# Count cells per sample; cell_line may be a factor or a character vector
data <- as.data.frame(table(filtered_seurat$cell_line))
colnames(data) <- c("cell_line", "n_cells")  # the value is a cell count, not a UMI count

ncells <- ggplot(data, aes(x = cell_line, y = n_cells, fill = cell_line)) + 
  geom_col() +
  theme_classic() +
  geom_text(aes(label = n_cells), 
            position = position_dodge(width = 0.9), 
            vjust = -0.25) +
  scale_fill_manual(values = cell_line_colors) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +  # Adjust the title position
  ggtitle("Filtered cells per sample") +
  xlab("Cell lines") +  # Adjust x-axis label
  ylab("Frequency")    # Adjust y-axis label

print(ncells)


Cell type Distribution to check clusters


# These tables can be regenerated later from the R object and compared to help decide on the clustering resolution.

# Azimuth l1
janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l1, cell_line)
 predicted.celltype.l1   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
                     B    0    0   12    4   35   85  121    0        0
                 CD4 T 5771 3124 6351 5967 4914 4575 4634 5171     3505
                 CD8 T   13   25    0    1    4    5    3    0        0
                    DC    0    0    1    4   29   12   41    0        0
                  Mono    0    0    1   13    3    0    5    0        0
                    NK   38 2784    6   24   11  259   38    0        0
                 other    0    0   57    7 1025  208  487    0        0
               other T    3    2    0    1    1    4    2    0        0
janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l2, cell_line)
 predicted.celltype.l2   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
        B intermediate    0    0    2    1    2    2    0    0        0
              B memory    0    0   11    1   38   82  120    0        0
             CD14 Mono    0    0    1   14    5    0    6    0        0
               CD4 CTL    0    0    0    0    0    0    0   12        1
             CD4 Naive    0    0    0    7    0    0    0  523     1512
     CD4 Proliferating 2461 2852 5452 5391 4732 4002 4115    0        6
               CD4 TCM 3320  270  887  562  178  557  517 4576     1963
               CD4 TEM    1    0    0    0    0    0    0   60       23
     CD8 Proliferating    0    0    0    0    0    1    1    0        0
               CD8 TCM    1   16    0    0    0    0    0    0        0
               CD8 TEM    1    8    0    0    2    3    1    0        0
                  cDC1    0    0    0    0    2    6    0    0        0
                  cDC2    0    0    0    4   11    3   35    0        0
                   dnT    2    3    0    1    2    5    2    0        0
                  HSPC    0    0   60    7 1035  213  490    0        0
      NK Proliferating   38 2785    6   24   11  259   38    0        0
                  Treg    1    1    9    9    4   15    6    0        0
janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l3, cell_line)
 predicted.celltype.l3   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
 B intermediate lambda    0    0    2    0    4    2    3    0        0
        B memory kappa    0    0    7    0    7   22   55    0        0
       B memory lambda    0    0    3    1    7   29   26    0        0
             CD14 Mono    0    0    1   14    9    0   22    0        0
               CD4 CTL    0    0    0    0    0    0    0   14        1
             CD4 Naive    0    0    0    9    0    0    0  526     1524
     CD4 Proliferating 2462 2852 5452 5391 4732 4003 4115    0        6
             CD4 TCM_1 1932    6    6   35    0    7    3 4038     1447
             CD4 TCM_2  652  250  870  516  165  536  482  265      144
             CD4 TCM_3  436   12    4    6   13    7   26  265      359
             CD4 TEM_1    1    0    0    0    0    0    0    4        7
             CD4 TEM_2    0    0    0    0    0    0    0   15       16
             CD4 TEM_3    0    0    0    0    0    0    0   44        1
           CD8 Naive_2    0    0    0    0    1    1    0    0        0
     CD8 Proliferating    0    0    0    0    0    1    1    0        0
             CD8 TCM_1    0    8    0    0    0    0    0    0        0
             CD8 TCM_2  298    6    0    0    0    0    0    0        0
             CD8 TCM_3    0    1    0    0    0    0    0    0        0
             CD8 TEM_2    0    0    0    0    0    1    0    0        0
             CD8 TEM_6    1   10    0    0    1    1    1    0        0
                  cDC1    0    0    0    1   17   36   16    0        0
                cDC2_1    0    0    0    1    1    0    4    0        0
                cDC2_2    0    0    1    3   12    4   33    0        0
                 dnT_2    2    3    0    2    2    6    2    0        0
                  HSPC    0    0   62    7 1036  214  493    0        0
                   ILC    0    0    1    2    0    0    0    0        0
      NK Proliferating   38 2785    6   24   11  259   38    0        0
           Treg Memory    3    2   13    9    4   19   11    0        0

6. Remove CD14 monocytes from L4



# Load necessary libraries
library(Seurat)
library(ggplot2)
library(RColorBrewer)

# Set identity to cell_line
Idents(filtered_seurat) <- "cell_line"

# Identify CD14 Monocytes in L4 to exclude
cd14_mono_l4 <- WhichCells(filtered_seurat, expression = 
  grepl("CD14 Mono", filtered_seurat$predicted.celltype.l2) &
  grepl("CD14 Mono", filtered_seurat$predicted.celltype.l3) &
  cell_line == "L4"
)

# Identify cells to keep (excluding CD14 Monocytes in L4)
cells_to_keep <- setdiff(Cells(filtered_seurat), cd14_mono_l4)

# Subset the Seurat object with selected cells
filtered_seurat <- subset(filtered_seurat, cells = cells_to_keep)
library(ggplot2)
library(RColorBrewer)  

# One color per sample in cell_line (9 here), taken from the Set3 palette
cell_line_colors <- brewer.pal(length(unique(filtered_seurat$cell_line)), "Set3")

# Count cells per sample; cell_line may be a factor or a character vector
data <- as.data.frame(table(filtered_seurat$cell_line))
colnames(data) <- c("cell_line", "n_cells")  # the value is a cell count, not a UMI count

ncells <- ggplot(data, aes(x = cell_line, y = n_cells, fill = cell_line)) + 
  geom_col() +
  theme_classic() +
  geom_text(aes(label = n_cells), 
            position = position_dodge(width = 0.9), 
            vjust = -0.25) +
  scale_fill_manual(values = cell_line_colors) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +  # Adjust the title position
  ggtitle("Filtered cells per sample") +
  xlab("Cell lines") +  # Adjust x-axis label
  ylab("Frequency")    # Adjust y-axis label

print(ncells)
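
The B-cell and CD14-monocyte removals follow the same pattern: find cells in one sample whose label matches at every listed annotation level, then keep everything else. A minimal sketch of that pattern as a reusable function on a plain metadata frame. The helper `cells_matching` and the toy data are hypothetical and are not part of Seurat.

```r
# Hypothetical helper: barcodes of cells in one sample whose label matches
# `pattern` at every listed annotation level.
cells_matching <- function(md, pattern, line, level_cols) {
  hits <- Reduce(`&`, lapply(md[level_cols], function(x) grepl(pattern, x)))
  rownames(md)[hits & md$cell_line == line]
}

# Toy stand-in for filtered_seurat@meta.data (made-up values).
toy_md <- data.frame(
  cell_line = c("L4", "L4", "L4", "L5"),
  predicted.celltype.l2 = c("CD14 Mono", "CD14 Mono", "CD4 TCM", "CD14 Mono"),
  predicted.celltype.l3 = c("CD14 Mono", "cDC2_2", "CD4 TCM_2", "CD14 Mono"),
  row.names = paste0("cell", 1:4),
  stringsAsFactors = FALSE
)

to_drop <- cells_matching(toy_md, "CD14 Mono", "L4",
                          c("predicted.celltype.l2", "predicted.celltype.l3"))
cells_to_keep <- setdiff(rownames(toy_md), to_drop)
print(cells_to_keep)
```

Only `cell1` is dropped: `cell2` disagrees at l3 and `cell4` belongs to another sample. The resulting barcode vector is the kind of input that `subset(..., cells = cells_to_keep)` takes in the chunk above.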


Cell type Distribution to check clusters


# These tables can be regenerated later from the R object and compared to help decide on the clustering resolution.

# Azimuth l1
janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l1, cell_line)
 predicted.celltype.l1   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
                     B    0    0   12    3   35   85  121    0        0
                 CD4 T 5771 3124 6351 5967 4914 4575 4634 5171     3505
                 CD8 T   13   25    0    1    4    5    3    0        0
                    DC    0    0    1    4   29   12   41    0        0
                  Mono    0    0    1    0    3    0    5    0        0
                    NK   38 2784    6   24   11  259   38    0        0
                 other    0    0   57    7 1025  208  487    0        0
               other T    3    2    0    1    1    4    2    0        0
janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l2, cell_line)
 predicted.celltype.l2   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
        B intermediate    0    0    2    1    2    2    0    0        0
              B memory    0    0   11    1   38   82  120    0        0
             CD14 Mono    0    0    1    0    5    0    6    0        0
               CD4 CTL    0    0    0    0    0    0    0   12        1
             CD4 Naive    0    0    0    7    0    0    0  523     1512
     CD4 Proliferating 2461 2852 5452 5391 4732 4002 4115    0        6
               CD4 TCM 3320  270  887  562  178  557  517 4576     1963
               CD4 TEM    1    0    0    0    0    0    0   60       23
     CD8 Proliferating    0    0    0    0    0    1    1    0        0
               CD8 TCM    1   16    0    0    0    0    0    0        0
               CD8 TEM    1    8    0    0    2    3    1    0        0
                  cDC1    0    0    0    0    2    6    0    0        0
                  cDC2    0    0    0    4   11    3   35    0        0
                   dnT    2    3    0    1    2    5    2    0        0
                  HSPC    0    0   60    7 1035  213  490    0        0
      NK Proliferating   38 2785    6   24   11  259   38    0        0
                  Treg    1    1    9    9    4   15    6    0        0
janitor::tabyl(filtered_seurat@meta.data, predicted.celltype.l3, cell_line)
 predicted.celltype.l3   L1   L2   L3   L4   L5   L6   L7 PBMC PBMC_10x
 B intermediate lambda    0    0    2    0    4    2    3    0        0
        B memory kappa    0    0    7    0    7   22   55    0        0
       B memory lambda    0    0    3    1    7   29   26    0        0
             CD14 Mono    0    0    1    0    9    0   22    0        0
               CD4 CTL    0    0    0    0    0    0    0   14        1
             CD4 Naive    0    0    0    9    0    0    0  526     1524
     CD4 Proliferating 2462 2852 5452 5391 4732 4003 4115    0        6
             CD4 TCM_1 1932    6    6   35    0    7    3 4038     1447
             CD4 TCM_2  652  250  870  516  165  536  482  265      144
             CD4 TCM_3  436   12    4    6   13    7   26  265      359
             CD4 TEM_1    1    0    0    0    0    0    0    4        7
             CD4 TEM_2    0    0    0    0    0    0    0   15       16
             CD4 TEM_3    0    0    0    0    0    0    0   44        1
           CD8 Naive_2    0    0    0    0    1    1    0    0        0
     CD8 Proliferating    0    0    0    0    0    1    1    0        0
             CD8 TCM_1    0    8    0    0    0    0    0    0        0
             CD8 TCM_2  298    6    0    0    0    0    0    0        0
             CD8 TCM_3    0    1    0    0    0    0    0    0        0
             CD8 TEM_2    0    0    0    0    0    1    0    0        0
             CD8 TEM_6    1   10    0    0    1    1    1    0        0
                  cDC1    0    0    0    1   17   36   16    0        0
                cDC2_1    0    0    0    1    1    0    4    0        0
                cDC2_2    0    0    1    3   12    4   33    0        0
                 dnT_2    2    3    0    2    2    6    2    0        0
                  HSPC    0    0   62    7 1036  214  493    0        0
                   ILC    0    0    1    2    0    0    0    0        0
      NK Proliferating   38 2785    6   24   11  259   38    0        0
           Treg Memory    3    2   13    9    4   19   11    0        0

7. Save the Seurat object as an Robj file
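
The chunk for this step is not shown in the export. As a minimal sketch of the `save()`/`load()` round trip that an `.robj` file relies on, the example below uses a small stand-in list instead of `filtered_seurat` and a temporary file instead of the real output path. Both the object and the file name are hypothetical.

```r
# Stand-in object; in the script this would be filtered_seurat.
filtered_obj <- list(n_cells = 3L, cell_line = c("L1", "L4", "PBMC"))

# Hypothetical output path; a temporary file keeps the sketch free of side effects.
out_file <- file.path(tempdir(), "filtered_object_example.robj")
save(filtered_obj, file = out_file)

# load() restores the object under its original name and returns that name.
restored_env <- new.env()
restored_names <- load(out_file, envir = restored_env)
print(restored_names)
```

Because `load()` restores objects under the names they were saved with, the variable name used at save time is the name that downstream scripts will see.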

