Overview

This script reintegrates the non-leukaemia immune cell populations from the B-ALL scRNAseq dataset. Three objects are used:

  • no_leuk_obj_anno (base object, 05_srt_no_leukaemia_clustered_scType_annotated.qs) — all 19,040 non-leukaemia cells with scType automated annotations in $scType_classification. Note: the $annotations column in this object contains logical FALSE values from an earlier filtering step and is overwritten with scType labels at load time.
  • no_leuk_gdt (10_gdTcells_clustered.qs) — γδ T cell subset with refined annotations (Vγ6+Vδ4+ vs γδ_like), functional bias scores (AntiTumorScore1, ProTumorScore1), and cluster_class
  • no_leuk_cd8 (10_Teff_Tex_Tpex_subset.qs) — CD8 T cell subset with refined annotations (CD8_Tex, CD8_Tpex, T_eff)

Important note on batch correction: Unlike the leukaemia cells, which were transplanted from three donor sources (AD4_EM3, AD2.2_EM4, AD5_EM4) and required Harmony correction for transplant source, the immune cells are the endogenous immune system of each recipient mouse. They are entirely independent of transplant source. The relevant grouping variable for any batch consideration is Mouse_ID (six individual mice, each an independent biological replicate). Whether Harmony correction is needed at all is assessed from the standard PCA/UMAP — if Mouse_ID effects are not driving structure in the UMAP, no correction is applied.

The annotation pipeline: (1) doublets removed via scDblFinder; (2) BM macrophages and CNS microglia-like macrophages separated by marker-based scoring; (3) refined CD8 and γδ annotations transferred by barcode matching; (4) all γδ T cells reclassified by V gene expression from the count matrix — cells with no detectable V gene but CD3+/Trdc−/Eomes+ profile reclassified as NKT-like (388 cells), remaining γδ consolidated into three biologically meaningful groups (Vγ6Vδ4, Vγ4, Other γδ); (5) ambiguous populations assessed before exclusion decisions; (6) unified cell_annotation column built; (7) revised pro-/anti-tumour γδ scoring computed from literature-grounded signatures; (8) standard UMAP computed with optional Harmony correction by Mouse_ID if individual mouse effects are visible; (9) downstream analyses of IL-17 axis, myeloid polarisation, CD8 exhaustion, niche support, and checkpoint landscape.

Key references informing this analysis:

  • Wiesheu & Coffelt (2024) Cancer Cell — γδ T cell subsets in cancer: Vγ4/Vγ6 pro-tumorigenic biology, PD-1/TIM-3 differential expression, OXPHOS metabolism, AREG co-expression
  • Edwards et al. (2023) JEM — PD-1 and TIM-3 differentially regulate Vγ4/Vγ6 IL-17A+ γδ T cells
  • Posner et al. (2022) Curr Opin Immunol — Meningeal humoral immunity: IgA plasma cells at CNS borders, CXCL12/IL-7 niche support for developing B cells in dura
  • Lopes et al. (2021) Nat Immunol — Metabolic programmes of γδ T cell subsets (OXPHOS vs glycolysis)
  • Reis et al. (2022) Science — TCR Vγδ usage distinguishes pro- from anti-tumour intestinal γδ T cells

Working hypothesis: CNS-specific Vγ6Vδ4 T cells are pro-leukaemia, through direct interactions with leukaemia and suppression of cytotoxic T cell activity in the niche.

Setup

library(Seurat)
library(ggplot2)
library(patchwork)
library(dplyr)
library(knitr)
library(qs2)
library(viridis)
library(tidyr)
library(tibble)
library(scales)
library(harmony)
library(ggrepel)

tissue_cols <- c("BM" = "steelblue", "CNS" = "firebrick")

# Six individual mice — colour palette for Mouse_ID
mouse_cols <- c(
  "1838161" = "#E69F00",
  "1838162" = "#56B4E9",
  "1838163" = "#009E73",
  "1838171" = "#F0E442",
  "1838279" = "#CC79A7",
  "1838280" = "#D55E00"
)
base_path <- "/exports/eddie/scratch/aduguid3/uploaded_data"

no_leuk_obj_anno <- qs_read(file.path(base_path,
  "05_srt_no_leukaemia_clustered_scType_annotated.qs"))
no_leuk_gdt      <- qs_read(file.path(base_path,
  "10_gdTcells_clustered.qs"))
no_leuk_cd8      <- qs_read(file.path(base_path,
  "10_Teff_Tex_Tpex_subset.qs"))

cat("Base object cells:", ncol(no_leuk_obj_anno), "\n")
## Base object cells: 19040
cat("γδ T cell cells:  ", ncol(no_leuk_gdt),      "\n")
## γδ T cell cells:   358
cat("CD8 T cell cells: ", ncol(no_leuk_cd8),      "\n")
## CD8 T cell cells:  499
cat("\nBase object tissues:\n")
## 
## Base object tissues:
print(table(no_leuk_obj_anno$Tissue))
## 
##    BM   CNS 
## 13799  5241
cat("\nCells per mouse:\n")
## 
## Cells per mouse:
print(table(no_leuk_obj_anno$Mouse_ID, no_leuk_obj_anno$Tissue))
##          
##             BM  CNS
##   1838161  373  923
##   1838162  593  732
##   1838163  403  535
##   1838171 4662 1144
##   1838279 5173  756
##   1838280 2595 1151
# $annotations contains logical FALSE — overwrite with scType labels
cat("\nscType_classification distribution:\n")
## 
## scType_classification distribution:
print(sort(table(no_leuk_obj_anno$scType_classification), decreasing = TRUE))
## 
##                                  Macrophages 
##                                         3625 
##                                  Pre-B cells 
##                                         3159 
##                           Naive CD8+ T cells 
##                                         1798 
##                                   γδ-T cells 
##                                         1594 
##                                    Platelets 
##                                         1512 
##                                  Neutrophils 
##                                         1200 
##                        Natural killer  cells 
##                                          999 
##                               Plasma B cells 
##                                          901 
##                        Effector CD4+ T cells 
##                                          872 
##                             Progenitor cells 
##                                          753 
##                      Non-classical monocytes 
##                                          616 
##                                    Basophils 
##                                          588 
##                        Effector CD8+ T cells 
##                                          548 
## Erythroid-like and erythroid precursor cells 
##                                          315 
##                          CD8+ NKT-like cells 
##                                          308 
##                                Naive B cells 
##                                          129 
##                      Myeloid Dendritic cells 
##                                          123
no_leuk_obj_anno$annotations <- as.character(
  no_leuk_obj_anno$scType_classification)

cat("\nannotations set from scType_classification:",
    length(unique(no_leuk_obj_anno$annotations)), "unique values\n")
## 
## annotations set from scType_classification: 17 unique values

Step 1 — Doublet Check and Removal

cat("scDblFinder.class distribution:\n")
## scDblFinder.class distribution:
print(table(no_leuk_obj_anno$scDblFinder.class, useNA = "ifany"))
## 
## singlet doublet 
##   17990    1050
n_before <- ncol(no_leuk_obj_anno)

if ("doublet" %in% no_leuk_obj_anno$scDblFinder.class) {
  no_leuk_obj_anno <- subset(no_leuk_obj_anno,
                              subset = scDblFinder.class == "singlet")
  cat("\nDoublets removed via scDblFinder.class\n")
} else {
  cat("\nNo doublets detected — object already filtered or none present\n")
}
## 
## Doublets removed via scDblFinder.class
cat("Cells before:", n_before, "\n")
## Cells before: 19040
cat("Cells after: ", ncol(no_leuk_obj_anno), "\n")
## Cells after:  17990
cat("Removed:     ", n_before - ncol(no_leuk_obj_anno), "\n")
## Removed:      1050

Step 2 — Macrophage Refinement

Macrophages classified by scType are refined into two subpopulations using marker-based scoring. Microglia markers (Cx3cr1, P2ry12, Tmem119, Sall1, Fcrls, Olfml3, Hexb) and BM macrophage markers (Csf1r, Adgre1, Cd68, Mrc1, Ccl2) are scored onto all cells via AddModuleScore. Macrophages scoring higher on the microglia signature are labelled “Microglia-like macrophages”.

These cells are CNS border-associated macrophages (likely meningeal or perivascular BAMs) rather than true parenchymal microglia. The term “Microglia-like macrophages” reflects this ambiguity — BAMs express canonical microglia markers but are ontogenetically and transcriptionally distinct from homeostatic parenchymal microglia.

micro_markers    <- c("Cx3cr1", "P2ry12", "Tmem119", "Sall1",
                       "Fcrls",  "Olfml3", "Hexb")
bm_macro_markers <- c("Csf1r", "Adgre1", "Cd68", "Mrc1", "Ccl2")

no_leuk_obj_anno <- AddModuleScore(no_leuk_obj_anno,
                                    features = list(micro_markers),
                                    name     = "Microglia_sig",
                                    seed     = 42)

no_leuk_obj_anno <- AddModuleScore(no_leuk_obj_anno,
                                    features = list(bm_macro_markers),
                                    name     = "BMMacro_sig",
                                    seed     = 42)

no_leuk_obj_anno$macro_refined <- ifelse(
  no_leuk_obj_anno$scType_classification == "Macrophages" &
    no_leuk_obj_anno$Microglia_sig1 > no_leuk_obj_anno$BMMacro_sig1,
  "Microglia-like macrophages",
  ifelse(
    no_leuk_obj_anno$scType_classification == "Macrophages",
    "Macrophages",
    NA_character_
  )
)

cat("Refined macrophage labels × Tissue:\n")
## Refined macrophage labels × Tissue:
print(table(no_leuk_obj_anno$macro_refined,
            no_leuk_obj_anno$Tissue, useNA = "ifany"))
##                             
##                                 BM   CNS
##   Macrophages                 1127   904
##   Microglia-like macrophages    64  1476
##   <NA>                       11737  2682
no_leuk_obj_anno@meta.data %>%
  filter(scType_classification == "Macrophages") %>%
  group_by(macro_refined) %>%
  summarise(
    n              = n(),
    mean_micro_sig = round(mean(Microglia_sig1), 4),
    mean_bm_sig    = round(mean(BMMacro_sig1),   4),
    pct_CNS        = round(mean(Tissue == "CNS") * 100, 1),
    .groups        = "drop"
  ) %>%
  kable(caption = "Macrophage refinement — score summary and tissue distribution")
Macrophage refinement — score summary and tissue distribution
macro_refined n mean_micro_sig mean_bm_sig pct_CNS
Macrophages 2031 0.3342 1.9996 44.5
Microglia-like macrophages 1540 3.5698 1.1658 95.8

Step 3 — Transfer Refined Annotations From Subset Objects

Refined γδ T cell and CD8 T cell annotations are transferred from subset objects by exact barcode matching. The γδ subset (no_leuk_gdt, 358 cells) contains fewer cells than the scType γδ-T cells population — this discrepancy is resolved in Step 4 by V gene reclassification of all γδ T cells. The old bias scores are retained for comparison but replaced by revised literature-grounded signatures in Step 5b.

gdt_meta <- no_leuk_gdt@meta.data %>%
  rownames_to_column("barcode") %>%
  select(barcode,
         gdt_annotation  = annotations,
         gdt_bias        = gamma_delta_bias,
         gdt_class       = cluster_class,
         AntiTumorScore1,
         ProTumorScore1)

base_barcodes <- rownames(no_leuk_obj_anno@meta.data)
gdt_match     <- match(base_barcodes, gdt_meta$barcode)

no_leuk_obj_anno$gdt_annotation  <- gdt_meta$gdt_annotation[gdt_match]
no_leuk_obj_anno$gdt_bias        <- gdt_meta$gdt_bias[gdt_match]
no_leuk_obj_anno$gdt_class       <- gdt_meta$gdt_class[gdt_match]
no_leuk_obj_anno$AntiTumorScore1 <- gdt_meta$AntiTumorScore1[gdt_match]
no_leuk_obj_anno$ProTumorScore1  <- gdt_meta$ProTumorScore1[gdt_match]

cat("γδ barcodes matched:", sum(!is.na(no_leuk_obj_anno$gdt_annotation)),
    "of", nrow(gdt_meta), "\n")
## γδ barcodes matched: 357 of 358
cd8_meta <- no_leuk_cd8@meta.data %>%
  rownames_to_column("barcode") %>%
  select(barcode, cd8_annotation = annotations)

cd8_match <- match(base_barcodes, cd8_meta$barcode)

no_leuk_obj_anno$cd8_annotation <- cd8_meta$cd8_annotation[cd8_match]

cat("CD8 barcodes matched:", sum(!is.na(no_leuk_obj_anno$cd8_annotation)),
    "of", nrow(cd8_meta), "\n")
## CD8 barcodes matched: 497 of 499

Step 4 — Population QC and γδ Reclassification

Pre-B Cells

Pre-B cells — assessed to distinguish normal B cell progenitors from leukaemia escapees. High Cd79a and Cd19 confirm B lineage. Rag1 confirms active VDJ recombination. Cd34 and Dntt essentially absent — rules out leukaemia identity. Decision: retain.

preb_cells <- subset(no_leuk_obj_anno,
                      subset = scType_classification == "Pre-B cells")
cat("Pre-B cells:", ncol(preb_cells), "\n")
## Pre-B cells: 3098
print(table(preb_cells$Tissue))
## 
##   BM  CNS 
## 2893  205
preb_markers <- c("Cd19","Vpreb1","Igll1","Cd79a",
                   "Cd34","Flt3","Il7r",
                   "Dntt","Rag1","Rag2",
                   "Cd24","Cd43","Cd25")
preb_found <- preb_markers[preb_markers %in% rownames(preb_cells)]

sapply(preb_found, function(g) {
  round(mean(GetAssayData(preb_cells, layer = "counts")[g, ] > 0) * 100, 2)
}) %>%
  sort(decreasing = TRUE) %>%
  data.frame(pct_detected = .) %>%
  rownames_to_column("gene") %>%
  kable(caption = "Pre-B cell marker detection rates")
Pre-B cell marker detection rates
gene pct_detected
Cd79a 99.71
Cd19 83.21
Il7r 18.62
Rag1 12.30
Rag2 3.58
Flt3 2.26
Igll1 0.58
Dntt 0.52
Cd34 0.10

Platelets — Reclassified as Stroma

ScType classified cells as Platelets. The equal tissue split is biologically implausible for true platelets. Marker assessment confirmed stromal/endothelial identity. Decision: relabel as Stroma and retain — relevant for downstream cell-cell interaction analysis.

plt_cells <- subset(no_leuk_obj_anno,
                     subset = scType_classification == "Platelets")
cat("Platelet cells:", ncol(plt_cells), "\n")
## Platelet cells: 1442
print(table(plt_cells$Tissue))
## 
##  BM CNS 
## 719 723
plt_markers <- c("Ppbp","Pf4","Itga2b","Vwf",
                  "Col1a1","Col3a1","Pdgfra","Fn1",
                  "Pecam1","Cdh5","Eng",
                  "Acta2","Tagln")
plt_found <- plt_markers[plt_markers %in% rownames(plt_cells)]

sapply(plt_found, function(g) {
  round(mean(GetAssayData(plt_cells, layer = "counts")[g, ] > 0) * 100, 2)
}) %>%
  sort(decreasing = TRUE) %>%
  data.frame(pct_detected = .) %>%
  rownames_to_column("gene") %>%
  kable(caption = "Platelet/stromal marker detection — scType 'Platelets' cells")
Platelet/stromal marker detection — scType ‘Platelets’ cells
gene pct_detected
Col1a1 54.79
Fn1 47.99
Col3a1 47.30
Pdgfra 36.96
Eng 27.67
Pecam1 24.62
Cdh5 23.93
Itga2b 20.60
Tagln 10.06
Pf4 8.53
Vwf 8.18
Acta2 2.98
Ppbp 1.60

Erythroid-like Cells

BM-restricted, consistent with sort impurity. Non-immune cells not relevant to immune microenvironment analysis. Decision: exclude.

ery_cells <- subset(no_leuk_obj_anno,
                     subset = scType_classification ==
                       "Erythroid-like and erythroid precursor cells")
cat("Erythroid cells:", ncol(ery_cells), "\n")
## Erythroid cells: 291
print(table(ery_cells$Tissue))
## 
##  BM CNS 
## 289   2
ery_markers <- c("Hba-a1","Hbb-bt","Gypa","Klf1","Gata1","Epor","Tfrc")
ery_found   <- ery_markers[ery_markers %in% rownames(ery_cells)]

sapply(ery_found, function(g) {
  round(mean(GetAssayData(ery_cells, layer = "counts")[g, ] > 0) * 100, 2)
}) %>%
  sort(decreasing = TRUE) %>%
  data.frame(pct_detected = .) %>%
  rownames_to_column("gene") %>%
  kable(caption = "Erythroid marker detection rates")
Erythroid marker detection rates
gene pct_detected
Klf1 87.63
Tfrc 86.25
Gypa 84.54
Gata1 83.51
Hba-a1 80.76
Hbb-bt 73.20
Epor 73.20

γδ T Cell Reclassification by V Gene Expression

All γδ T cells are reclassified using V gene expression directly from the count matrix. This is more robust than barcode matching as it provides biologically meaningful Vγ/Vδ labels for every γδ T cell.

Labels consolidated into three biologically meaningful groups informed by Wiesheu & Coffelt (2024):

  • Vγ6Vδ4: CNS-resident IL-17 producers — the dominant pro-tumorigenic subset at tissue barriers
  • Vγ4: The second major pro-tumorigenic subset; contains both IL-17A+ (pro-tumour, CD27−) and IFNγ+ (anti-tumour, CD27+) subpopulations
  • Other γδ: All remaining γδ T cells (Vγ1, Vγ2, Vγ3, Vγ5, Vγ7, unresolved) — predominantly IFNγ-producing subsets
trgv_genes <- c("Trgv1","Trgv2","Trgv3","Trgv4","Trgv5","Trgv6","Trgv7")
trdv_genes <- c("Trdv1","Trdv2","Trdv3","Trdv4","Trdv5")

trgv_found <- trgv_genes[trgv_genes %in% rownames(no_leuk_obj_anno)]
trdv_found <- trdv_genes[trdv_genes %in% rownames(no_leuk_obj_anno)]

cat("Trgv genes found:", paste(trgv_found, collapse = ", "), "\n")
## Trgv genes found: Trgv1, Trgv2, Trgv3, Trgv4, Trgv5, Trgv6, Trgv7
cat("Trdv genes found:", paste(trdv_found, collapse = ", "), "\n")
## Trdv genes found: Trdv1, Trdv3, Trdv4, Trdv5
gdt_all   <- subset(no_leuk_obj_anno,
                     subset = scType_classification == "γδ-T cells")
count_mat <- GetAssayData(gdt_all, layer = "counts")

trgv_max <- if (length(trgv_found) > 0) {
  apply(count_mat[trgv_found, , drop = FALSE], 2, function(x) {
    if (max(x) > 0) trgv_found[which.max(x)] else NA_character_
  })
} else rep(NA_character_, ncol(gdt_all))

trdv_max <- if (length(trdv_found) > 0) {
  apply(count_mat[trdv_found, , drop = FALSE], 2, function(x) {
    if (max(x) > 0) trdv_found[which.max(x)] else NA_character_
  })
} else rep(NA_character_, ncol(gdt_all))

vgene_label <- case_when(
  !is.na(trgv_max) & !is.na(trdv_max) ~
    paste0("Vγ", gsub("Trgv","",trgv_max), "Vδ", gsub("Trdv","",trdv_max)),
  !is.na(trgv_max) & is.na(trdv_max)  ~
    paste0("Vγ", gsub("Trgv","",trgv_max), "Vδ?"),
  is.na(trgv_max)  & !is.na(trdv_max) ~
    paste0("Vγ?Vδ",  gsub("Trdv","",trdv_max)),
  TRUE ~ "γδ-T cells (no V gene detected)"
)

cat("\nV gene distribution:\n")
## 
## V gene distribution:
print(sort(table(vgene_label), decreasing = TRUE))
## vgene_label
## γδ-T cells (no V gene detected)                          Vγ?Vδ4 
##                             388                             153 
##                          Vγ2Vδ?                          Vγ6Vδ4 
##                             139                             121 
##                          Vγ1Vδ?                          Vγ4Vδ? 
##                              98                              66 
##                          Vγ5Vδ?                          Vγ5Vδ4 
##                              39                              24 
##                          Vγ2Vδ4                          Vγ1Vδ4 
##                              22                              17 
##                          Vγ4Vδ4                          Vγ2Vδ5 
##                               5                               4 
##                          Vγ3Vδ?                          Vγ7Vδ? 
##                               3                               3 
##                          Vγ6Vδ?                          Vγ?Vδ5 
##                               2                               1 
##                          Vγ1Vδ1                          Vγ4Vδ5 
##                               1                               1 
##                          Vγ7Vδ4 
##                               1
cat("\nV gene labels × Tissue:\n")
## 
## V gene labels × Tissue:
print(table(vgene_label, gdt_all$Tissue))
##                                  
## vgene_label                        BM CNS
##   Vγ?Vδ4                           61  92
##   Vγ?Vδ5                            1   0
##   Vγ1Vδ?                           61  37
##   Vγ1Vδ1                            0   1
##   Vγ1Vδ4                            6  11
##   Vγ2Vδ?                           87  52
##   Vγ2Vδ4                            1  21
##   Vγ2Vδ5                            4   0
##   Vγ3Vδ?                            1   2
##   Vγ4Vδ?                           39  27
##   Vγ4Vδ4                            1   4
##   Vγ4Vδ5                            0   1
##   Vγ5Vδ?                           20  19
##   Vγ5Vδ4                            9  15
##   Vγ6Vδ?                            0   2
##   Vγ6Vδ4                            2 119
##   Vγ7Vδ?                            2   1
##   Vγ7Vδ4                            0   1
##   γδ-T cells (no V gene detected) 219 169
# Store in staging column — applied to cell_annotation in Step 5
vgene_vec    <- setNames(vgene_label, colnames(gdt_all))
gdt_main_idx <- rownames(no_leuk_obj_anno@meta.data) %in% names(vgene_vec)
no_leuk_obj_anno$vgene_label <- NA_character_
no_leuk_obj_anno$vgene_label[gdt_main_idx] <-
  vgene_vec[rownames(no_leuk_obj_anno@meta.data)[gdt_main_idx]]

cat("\nvgene_label assigned to",
    sum(!is.na(no_leuk_obj_anno$vgene_label)), "cells\n")
## 
## vgene_label assigned to 1088 cells

QC: Are “No V Gene Detected” Cells Truly γδ T Cells?

388 cells classified as γδ-T cells by scType have no detectable Trgv or Trdv expression. NK cells and γδ T cells share many markers (NKG2D, granzymes, IFNγ), which confuses automated classifiers. Key discriminators: γδ T cells are CD3+/Trdc+; NK cells are CD3−/Ncr1+. Cells that are CD3+ but Trdc−/Eomes+ are likely NKT-like cells misclassified by scType.

discrim_genes <- c("Cd3e", "Cd3d", "Cd3g",        # pan-T / CD3 complex
                    "Trdc", "Trgc1", "Trgc2",      # γδ constant regions
                    "Ncr1", "Klrb1c", "Klrd1",     # NK markers
                    "Nkg7", "Klrk1", "Eomes",       # shared / NK-enriched
                    "Il7r", "Thy1")                  # T cell lineage

discrim_found <- discrim_genes[discrim_genes %in% rownames(no_leuk_obj_anno)]

no_leuk_obj_anno$gdt_qc_group <- case_when(
  no_leuk_obj_anno$scType_classification == "γδ-T cells" &
    !is.na(no_leuk_obj_anno$vgene_label) &
    no_leuk_obj_anno$vgene_label != "γδ-T cells (no V gene detected)" ~
    "γδ (V gene confirmed)",
  no_leuk_obj_anno$scType_classification == "γδ-T cells" &
    !is.na(no_leuk_obj_anno$vgene_label) &
    no_leuk_obj_anno$vgene_label == "γδ-T cells (no V gene detected)" ~
    "γδ (no V gene)",
  no_leuk_obj_anno$scType_classification == "Natural killer  cells" ~
    "NK cells",
  TRUE ~ NA_character_
)

cat("QC groups:\n")
## QC groups:
print(table(no_leuk_obj_anno$gdt_qc_group, useNA = "ifany"))
## 
##              NK cells        γδ (no V gene) γδ (V gene confirmed) 
##                   979                   388                   700 
##                  <NA> 
##                 15923
qc_obj <- subset(no_leuk_obj_anno, subset = !is.na(gdt_qc_group))
## Warning: Removing 15923 cells missing data for vars requested
Idents(qc_obj) <- "gdt_qc_group"

DotPlot(qc_obj,
        features  = discrim_found,
        cols      = c("lightgrey", "firebrick"),
        dot.scale = 7) +
  RotatedAxis() +
  ggtitle("γδ T cell vs NK identity — diagnostic markers")
## Warning: Scaling data with a low number of groups may produce misleading
## results

sapply(discrim_found, function(g) {
  tapply(GetAssayData(qc_obj, layer = "counts")[g, ],
         qc_obj$gdt_qc_group,
         function(x) round(mean(x > 0) * 100, 1))
}) %>%
  as.data.frame() %>%
  rownames_to_column("group") %>%
  kable(caption = "Marker detection rates (%) — γδ confirmed vs no V gene vs NK")
Marker detection rates (%) — γδ confirmed vs no V gene vs NK
group Cd3e Cd3d Cd3g Trdc Trgc1 Trgc2 Ncr1 Klrb1c Klrd1 Nkg7 Klrk1 Eomes Il7r Thy1
NK cells 0.6 1.2 1.3 7.7 0.2 0.7 52.1 53.5 87.2 55.3 66.0 34.4 38.7 9.7
γδ (no V gene) 21.4 22.7 26.8 1.0 15.7 2.3 1.5 5.4 0.8 7.5 6.4 1.3 96.9 58.8
γδ (V gene confirmed) 44.9 45.0 47.1 20.9 42.9 32.0 0.9 3.4 1.6 8.7 8.3 0.4 95.6 80.3

Result: “No V gene” cells are CD3+/Trdc−/Eomes+ — they are T cells (not NK) but lack γδ TCR constant regions. Profile consistent with NKT-like cells misclassified by scType. Decision: reclassify as NKT-like.

# Reclassify "no V gene" γδ-T cells as NKT-like
# Rationale: CD3+ (T cell), Trdc− (not γδ), Eomes+ (NKT/innate-like), NK receptor+
nkt_barcodes <- colnames(gdt_all)[vgene_label == "γδ-T cells (no V gene detected)"]

no_leuk_obj_anno$vgene_label[
  rownames(no_leuk_obj_anno@meta.data) %in% nkt_barcodes] <- "NKT-like"

cat("Reclassified as NKT-like:", length(nkt_barcodes), "cells\n")
## Reclassified as NKT-like: 388 cells
table(no_leuk_obj_anno$vgene_label[
        rownames(no_leuk_obj_anno@meta.data) %in% nkt_barcodes],
      no_leuk_obj_anno$Tissue[
        rownames(no_leuk_obj_anno@meta.data) %in% nkt_barcodes])
##           
##             BM CNS
##   NKT-like 219 169

Exclusion and Relabelling

# Relabel Platelets → Stroma
stroma_idx <- no_leuk_obj_anno$scType_classification == "Platelets"
no_leuk_obj_anno$annotations[stroma_idx] <- "Stroma"
cat("Platelets relabelled as Stroma:", sum(stroma_idx), "cells\n")
## Platelets relabelled as Stroma: 1442 cells
# Exclude Erythroid-like cells only
n_before <- ncol(no_leuk_obj_anno)
no_leuk_obj_anno <- subset(
  no_leuk_obj_anno,
  subset = scType_classification !=
    "Erythroid-like and erythroid precursor cells"
)
cat("Cells before exclusion:", n_before, "\n")
## Cells before exclusion: 17990
cat("Cells after exclusion: ", ncol(no_leuk_obj_anno), "\n")
## Cells after exclusion:  17699
cat("Excluded (Erythroid):  ", n_before - ncol(no_leuk_obj_anno), "\n")
## Excluded (Erythroid):   291

Step 5 — Build Unified Annotation

γδ T cells consolidated into three groups: Vγ6Vδ4 (CNS-resident pro-tumorigenic), Vγ4 (mixed pro-/anti-tumorigenic), and Other γδ (remaining subsets). Cells with no V gene but NKT-like profile (CD3+/Trdc−/Eomes+) assigned their own “NKT-like” label. This replaces the previous five-group scheme and is directly informed by the Wiesheu & Coffelt (2024) framework.

# Baseline — scType with Stroma relabelling applied
no_leuk_obj_anno$cell_annotation <- no_leuk_obj_anno$annotations

# Overwrite with macrophage refinement
mac_idx <- !is.na(no_leuk_obj_anno$macro_refined)
no_leuk_obj_anno$cell_annotation[mac_idx] <-
  no_leuk_obj_anno$macro_refined[mac_idx]

# Overwrite with refined CD8 annotations
cd8_idx <- !is.na(no_leuk_obj_anno$cd8_annotation)
no_leuk_obj_anno$cell_annotation[cd8_idx] <-
  no_leuk_obj_anno$cd8_annotation[cd8_idx]

# Apply V gene labels to γδ T cells
vgene_idx <- !is.na(no_leuk_obj_anno$vgene_label)
no_leuk_obj_anno$cell_annotation[vgene_idx] <-
  no_leuk_obj_anno$vgene_label[vgene_idx]

# ── γδ T cell consolidation: three biologically meaningful groups ──
# Vγ6Vδ4: CNS-resident IL-17 producers (hypothesis: pro-leukaemia)
# Vγ4:    second major pro-tumorigenic subset (both IL-17+ and IFNγ+ subpops)
# Other γδ: all remaining γδ T cells

# Vγ6Vδ4
no_leuk_obj_anno$cell_annotation[
  no_leuk_obj_anno$cell_annotation %in% c("Vγ6Vδ4", "Vγ6Vδ?")] <- "Vγ6Vδ4"

# Vγ4 — all cells with Trgv4 regardless of δ chain
no_leuk_obj_anno$cell_annotation[
  no_leuk_obj_anno$cell_annotation %in% c(
    "Vγ4Vδ?", "Vγ4Vδ4", "Vγ4Vδ5")] <- "Vγ4"

# NKT-like — reclassified in Step 4f (CD3+/Trdc−/Eomes+, no γδ V gene)
no_leuk_obj_anno$cell_annotation[
  no_leuk_obj_anno$vgene_label == "NKT-like"] <- "NKT-like"

# Other γδ — everything else with a confirmed V gene
other_gdt_labels <- c(
  "Vγ?Vδ4", "Vγ?Vδ5",
  "Vγ1Vδ?", "Vγ1Vδ1", "Vγ1Vδ4",
  "Vγ2Vδ?", "Vγ2Vδ4", "Vγ2Vδ5",
  "Vγ3Vδ?",
  "Vγ5Vδ?", "Vγ5Vδ4",
  "Vγ7Vδ?", "Vγ7Vδ4",
  "γδ_like"
)

no_leuk_obj_anno$cell_annotation[
  no_leuk_obj_anno$cell_annotation %in% other_gdt_labels] <- "Other γδ"

# Standardise remaining scType CD8 labels
no_leuk_obj_anno$cell_annotation[
  no_leuk_obj_anno$cell_annotation == "Effector CD8+ T cells"] <-
  "CD8_Teff (unclassified)"

no_leuk_obj_anno$cell_annotation[
  no_leuk_obj_anno$cell_annotation == "Naive CD8+ T cells"] <- "CD8_Naive"

cat("Cells with cell_annotation:", sum(!is.na(no_leuk_obj_anno$cell_annotation)),
    "of", ncol(no_leuk_obj_anno), "\n")
## Cells with cell_annotation: 17699 of 17699
cat("\nFull distribution:\n")
## 
## Full distribution:
print(sort(table(no_leuk_obj_anno$cell_annotation), decreasing = TRUE))
## 
##                Pre-B cells                Macrophages 
##                       3098                       2031 
##                  CD8_Naive Microglia-like macrophages 
##                       1768                       1540 
##                     Stroma                Neutrophils 
##                       1442                       1145 
##      Natural killer  cells             Plasma B cells 
##                        979                        844 
##      Effector CD4+ T cells           Progenitor cells 
##                        814                        714 
##    Non-classical monocytes                  Basophils 
##                        606                        555 
##                   Other γδ                   NKT-like 
##                        505                        388 
##        CD8+ NKT-like cells                    CD8_Tex 
##                        288                        284 
##                   CD8_Tpex                     Vγ6Vδ4 
##                        143                        123 
##    Myeloid Dendritic cells              Naive B cells 
##                        117                         97 
##    CD8_Teff (unclassified)                        Vγ4 
##                         80                         72 
##                      T_eff 
##                         66
cat("\ncell_annotation × Tissue:\n")
## 
## cell_annotation × Tissue:
print(table(no_leuk_obj_anno$cell_annotation, no_leuk_obj_anno$Tissue))
##                             
##                                BM  CNS
##   Basophils                   523   32
##   CD8_Naive                  1639  129
##   CD8_Teff (unclassified)      49   31
##   CD8_Tex                     183  101
##   CD8_Tpex                    113   30
##   CD8+ NKT-like cells         173  115
##   Effector CD4+ T cells       631  183
##   Macrophages                1127  904
##   Microglia-like macrophages   64 1476
##   Myeloid Dendritic cells      76   41
##   Naive B cells                94    3
##   Natural killer  cells       887   92
##   Neutrophils                1103   42
##   NKT-like                    219  169
##   Non-classical monocytes     356  250
##   Other γδ                    253  252
##   Plasma B cells              754   90
##   Pre-B cells                2893  205
##   Progenitor cells            709    5
##   Stroma                      719  723
##   T_eff                        32   34
##   Vγ4                          40   32
##   Vγ6Vδ4                        2  121

Step 5b — Revised γδ T Cell Pro-/Anti-Tumour Scoring

New pro- and anti-tumour signatures computed directly on all γδ T cells in the base object using AddModuleScore. This replaces the barcode-transferred scores from the subset object (which covered only 357 of ~1,088 γδ T cells).

Pro-tumorigenic (IL-17A programme): Vγ4/Vγ6 biology — tissue-resident, OXPHOS-dependent, IL-1β/IL-23-responsive, PD-1+. Informed by Edwards et al. (2023), Lopes et al. (2021).

Anti-tumorigenic (IFNγ programme): Vγ1/Vγ4-CD27+/Vγ7 biology — glycolysis-dependent, CD27+, cytotoxic effectors. Informed by Reis et al. (2022), Wiesheu et al. (2024).

# ── Define signatures ──
protumor_sig <- c(
  # Effector cytokines / growth factors
  "Il17a", "Il17f", "Areg",
  # Master transcription factors
  "Rorc", "Maf",
  # Cytokine receptor responsiveness (niche signals)
  "Il1r1", "Il23r",
  # Checkpoint / surface markers
  "Pdcd1",             # PD-1 — constitutive on Vγ6
  # Tissue homing / residency
  "Ccr6", "Cxcr6",
  # γδ lineage
  "Sox13", "Blk",
  # IL-17+ γδ-associated genes
  "Tmem176a", "Tmem176b",
  # OXPHOS / lipid metabolism (Lopes et al. 2021)
  "Dgat1", "Acadl"
)

antitumor_sig <- c(
  # Effector cytokines
  "Ifng", "Tnf",
  # Subset marker
  "Cd27",
  # Cytotoxic TFs
  "Tbx21", "Eomes",
  # Cytotoxic effector machinery
  "Gzma", "Gzmb", "Prf1", "Nkg7", "Fasl",
  # Innate-like recognition
  "Klrk1",             # NKG2D
  # Terminal differentiation (Wiesheu et al. 2024 EMBO J)
  "Ly6c2",
  # Glycolysis dependence (Reis et al. 2022)
  "Slc2a1",            # GLUT1
  # Homing / cytokine responsiveness
  "Cxcr3", "Il2rb"
)

# ── Check gene availability ──
pro_found  <- protumor_sig[protumor_sig %in% rownames(no_leuk_obj_anno)]
anti_found <- antitumor_sig[antitumor_sig %in% rownames(no_leuk_obj_anno)]

cat("Pro-tumorigenic genes found:", length(pro_found), "of",
    length(protumor_sig), "\n")
## Pro-tumorigenic genes found: 16 of 16
cat(" ", paste(pro_found, collapse = ", "), "\n")
##   Il17a, Il17f, Areg, Rorc, Maf, Il1r1, Il23r, Pdcd1, Ccr6, Cxcr6, Sox13, Blk, Tmem176a, Tmem176b, Dgat1, Acadl
pro_missing <- setdiff(protumor_sig, pro_found)
if (length(pro_missing) > 0) {
  cat("  Missing:", paste(pro_missing, collapse = ", "), "\n")
}

cat("\nAnti-tumorigenic genes found:", length(anti_found), "of",
    length(antitumor_sig), "\n")
## 
## Anti-tumorigenic genes found: 15 of 15
cat(" ", paste(anti_found, collapse = ", "), "\n")
##   Ifng, Tnf, Cd27, Tbx21, Eomes, Gzma, Gzmb, Prf1, Nkg7, Fasl, Klrk1, Ly6c2, Slc2a1, Cxcr3, Il2rb
anti_missing <- setdiff(antitumor_sig, anti_found)
if (length(anti_missing) > 0) {
  cat("  Missing:", paste(anti_missing, collapse = ", "), "\n")
}

# ── Compute module scores on ALL cells ──
no_leuk_obj_anno <- AddModuleScore(no_leuk_obj_anno,
                                    features = list(pro_found),
                                    name     = "ProTumor_gdt_v2",
                                    seed     = 42)

no_leuk_obj_anno <- AddModuleScore(no_leuk_obj_anno,
                                    features = list(anti_found),
                                    name     = "AntiTumor_gdt_v2",
                                    seed     = 42)

# ── Bias score: pro minus anti ──
no_leuk_obj_anno$gdt_bias_v2 <-
  no_leuk_obj_anno$ProTumor_gdt_v21 - no_leuk_obj_anno$AntiTumor_gdt_v21

# ── Summary table ──
gdt_labels <- c("Vγ6Vδ4", "Vγ4", "Other γδ")

gdt_score_summary <- no_leuk_obj_anno@meta.data %>%
  filter(cell_annotation %in% gdt_labels) %>%
  group_by(cell_annotation, Tissue) %>%
  summarise(
    n                = n(),
    mean_pro         = round(mean(ProTumor_gdt_v21),  3),
    mean_anti        = round(mean(AntiTumor_gdt_v21), 3),
    mean_bias        = round(mean(gdt_bias_v2),       3),
    pct_pro_dominant = round(mean(gdt_bias_v2 > 0) * 100, 1),
    .groups = "drop"
  ) %>%
  arrange(cell_annotation, Tissue)

kable(gdt_score_summary,
      caption = "Revised γδ T cell functional bias — by subset and tissue")
Revised γδ T cell functional bias — by subset and tissue
cell_annotation Tissue n mean_pro mean_anti mean_bias pct_pro_dominant
Other γδ BM 253 0.289 -0.060 0.349 66.0
Other γδ CNS 252 0.429 -0.084 0.513 79.4
Vγ4 BM 40 0.951 0.046 0.905 90.0
Vγ4 CNS 32 0.934 -0.015 0.948 84.4
Vγ6Vδ4 BM 2 0.346 0.075 0.271 100.0
Vγ6Vδ4 CNS 121 1.563 -0.068 1.631 100.0
# Pro- and anti-tumour scores side by side
no_leuk_obj_anno@meta.data %>%
  filter(cell_annotation %in% gdt_labels) %>%
  pivot_longer(c(ProTumor_gdt_v21, AntiTumor_gdt_v21),
               names_to  = "score_type",
               values_to = "score") %>%
  mutate(score_type = recode(score_type,
                              "ProTumor_gdt_v21"  = "Pro-tumour (IL-17A programme)",
                              "AntiTumor_gdt_v21" = "Anti-tumour (IFNγ programme)")) %>%
  ggplot(aes(x = cell_annotation, y = score, fill = Tissue)) +
  geom_violin(trim = FALSE, alpha = 0.7, scale = "width") +
  geom_boxplot(width = 0.12, position = position_dodge(0.9),
               fill = "white", outlier.size = 0.3) +
  scale_fill_manual(values = tissue_cols) +
  facet_wrap(~score_type) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  theme_classic(base_size = 11) +
  labs(x = NULL, y = "Module score",
       title = "γδ T cell functional bias — revised signatures",
       subtitle = paste0("Pro-tumour: ", length(pro_found), " genes | ",
                         "Anti-tumour: ", length(anti_found), " genes")) +
  theme(axis.text.x    = element_text(angle = 20, hjust = 1),
        legend.position = "bottom",
        strip.text      = element_text(face = "bold"))

# Net bias score (pro minus anti)
no_leuk_obj_anno@meta.data %>%
  filter(cell_annotation %in% gdt_labels) %>%
  ggplot(aes(x = cell_annotation, y = gdt_bias_v2, fill = Tissue)) +
  geom_violin(trim = FALSE, alpha = 0.7, scale = "width") +
  geom_boxplot(width = 0.12, position = position_dodge(0.9),
               fill = "white", outlier.size = 0.3) +
  scale_fill_manual(values = tissue_cols) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey40") +
  theme_classic(base_size = 11) +
  labs(x = NULL, y = "Bias score (pro − anti)",
       title = "Net pro-tumorigenic bias — γδ T cell subtypes",
       subtitle = "Positive = pro-tumour dominant | Negative = anti-tumour dominant") +
  theme(axis.text.x    = element_text(angle = 20, hjust = 1),
        legend.position = "bottom")

# Per-gene dot plots
gdt_obj <- subset(no_leuk_obj_anno,
                   subset = cell_annotation %in% gdt_labels)
Idents(gdt_obj) <- "cell_annotation"

DotPlot(gdt_obj,
        features  = pro_found,
        cols      = c("lightgrey", "#e74c3c"),
        split.by  = "Tissue",
        dot.scale = 6) +
  RotatedAxis() +
  ggtitle("Pro-tumorigenic signature genes — γδ subtypes × tissue") +
  theme(axis.text.x = element_text(size = 8))
## Warning: Removed 6 rows containing missing values or values outside the scale range
## (`geom_point()`).

DotPlot(gdt_obj,
        features  = anti_found,
        cols      = c("lightgrey", "#2980b9"),
        split.by  = "Tissue",
        dot.scale = 6) +
  RotatedAxis() +
  ggtitle("Anti-tumorigenic signature genes — γδ subtypes × tissue") +
  theme(axis.text.x = element_text(size = 8))

# ── Functional classification ──
no_leuk_obj_anno$gdt_functional_class <- case_when(
  !(no_leuk_obj_anno$cell_annotation %in% gdt_labels) ~ NA_character_,
  no_leuk_obj_anno$gdt_bias_v2 >  0.1  ~ "Pro-tumour biased",
  no_leuk_obj_anno$gdt_bias_v2 < -0.1  ~ "Anti-tumour biased",
  TRUE                                  ~ "Intermediate"
)

no_leuk_obj_anno@meta.data %>%
  filter(cell_annotation %in% gdt_labels) %>%
  group_by(cell_annotation, Tissue, gdt_functional_class) %>%
  summarise(n = n(), .groups = "drop") %>%
  pivot_wider(names_from  = gdt_functional_class,
              values_from = n,
              values_fill = 0) %>%
  arrange(cell_annotation, Tissue) %>%
  kable(caption = "γδ T cell functional classification by subset and tissue")
γδ T cell functional classification by subset and tissue
cell_annotation Tissue Anti-tumour biased Intermediate Pro-tumour biased
Other γδ BM 69 30 154
Other γδ CNS 40 26 186
Vγ4 BM 3 2 35
Vγ4 CNS 4 1 27
Vγ6Vδ4 BM 0 0 2
Vγ6Vδ4 CNS 0 0 121
# ── Wilcoxon tests: BM vs CNS bias within each γδ subset ──
cat("── Wilcoxon tests: BM vs CNS bias within each γδ subset ──\n")
## ── Wilcoxon tests: BM vs CNS bias within each γδ subset ──
for (lbl in gdt_labels) {
  sub_df <- no_leuk_obj_anno@meta.data %>%
    filter(cell_annotation == lbl)

  bm_vals  <- sub_df$gdt_bias_v2[sub_df$Tissue == "BM"]
  cns_vals <- sub_df$gdt_bias_v2[sub_df$Tissue == "CNS"]

  if (length(bm_vals) >= 3 & length(cns_vals) >= 3) {
    wt <- wilcox.test(bm_vals, cns_vals)
    cat(sprintf("\n%s: BM (n=%d, median=%.3f) vs CNS (n=%d, median=%.3f) — p=%.4g\n",
                lbl,
                length(bm_vals),  median(bm_vals),
                length(cns_vals), median(cns_vals),
                wt$p.value))
  } else {
    cat(sprintf("\n%s: insufficient cells for BM vs CNS comparison\n", lbl))
  }
}
## 
## Vγ6Vδ4: insufficient cells for BM vs CNS comparison
## 
## Vγ4: BM (n=40, median=1.061) vs CNS (n=32, median=1.238) — p=0.4534
## 
## Other γδ: BM (n=253, median=0.339) vs CNS (n=252, median=0.459) — p=0.004268
# ── Compare old vs new scores (for cells that have both) ──
both_scored <- no_leuk_obj_anno@meta.data %>%
  filter(cell_annotation %in% gdt_labels,
         !is.na(ProTumorScore1))

if (nrow(both_scored) > 10) {
  cat("\n── Correlation: old vs new scores (",
      nrow(both_scored), " cells with both) ──\n")

  cor_pro  <- cor(both_scored$ProTumorScore1,
                  both_scored$ProTumor_gdt_v21,
                  method = "spearman")
  cor_anti <- cor(both_scored$AntiTumorScore1,
                  both_scored$AntiTumor_gdt_v21,
                  method = "spearman")

  cat(sprintf("  Pro-tumour:  Spearman rho = %.3f\n", cor_pro))
  cat(sprintf("  Anti-tumour: Spearman rho = %.3f\n", cor_anti))

  p1 <- ggplot(both_scored,
               aes(x = ProTumorScore1, y = ProTumor_gdt_v21,
                   colour = Tissue)) +
    geom_point(alpha = 0.6, size = 1) +
    scale_colour_manual(values = tissue_cols) +
    geom_smooth(method = "lm", se = FALSE, colour = "black",
                linetype = "dashed", linewidth = 0.5) +
    theme_classic(base_size = 10) +
    labs(x = "Old pro-tumour score", y = "New pro-tumour score",
         title = sprintf("Pro-tumour (ρ=%.2f)", cor_pro))

  p2 <- ggplot(both_scored,
               aes(x = AntiTumorScore1, y = AntiTumor_gdt_v21,
                   colour = Tissue)) +
    geom_point(alpha = 0.6, size = 1) +
    scale_colour_manual(values = tissue_cols) +
    geom_smooth(method = "lm", se = FALSE, colour = "black",
                linetype = "dashed", linewidth = 0.5) +
    theme_classic(base_size = 10) +
    labs(x = "Old anti-tumour score", y = "New anti-tumour score",
         title = sprintf("Anti-tumour (ρ=%.2f)", cor_anti))

  print(p1 + p2 +
    plot_annotation(title = "Old vs revised bias scores — barcode-matched cells"))
} else {
  cat("Fewer than 10 cells with old scores — skipping comparison\n")
}
## 
## ── Correlation: old vs new scores ( 286  cells with both) ──
##   Pro-tumour:  Spearman rho = 0.739
##   Anti-tumour: Spearman rho = 0.494
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

Step 6 — Standard PCA, UMAP and Clustering

IMPORTANT: This step caches the entire Seurat object including metadata. If upstream annotations change (e.g. NKT-like reclassification), delete the cached .qs file to force recomputation. Otherwise the cached object will overwrite all annotation work from Steps 1–5b.

standard_path <- "/exports/eddie/scratch/aduguid3/harmony_clustering/11_immune_standard_v3.qs"

if (!file.exists(standard_path)) {

  # Wipe any existing reductions from the 05_ object
  no_leuk_obj_anno@reductions <- list()

  no_leuk_obj_anno <- NormalizeData(no_leuk_obj_anno, verbose = FALSE)
  no_leuk_obj_anno <- FindVariableFeatures(no_leuk_obj_anno,
                                            selection.method = "vst",
                                            nfeatures        = 3000,
                                            verbose          = FALSE)
  no_leuk_obj_anno <- ScaleData(no_leuk_obj_anno, verbose = FALSE)
  no_leuk_obj_anno <- RunPCA(no_leuk_obj_anno, verbose = FALSE)

  ElbowPlot(no_leuk_obj_anno, ndims = 50) +
    ggtitle("PCA elbow plot — immune cells (standard)")

  no_leuk_obj_anno <- RunUMAP(no_leuk_obj_anno,
                               reduction = "pca",
                               dims      = 1:35)

  no_leuk_obj_anno <- FindNeighbors(no_leuk_obj_anno,
                                     reduction = "pca",
                                     dims      = 1:35)

  no_leuk_obj_anno <- FindClusters(no_leuk_obj_anno,
                                    resolution = 0.3)

  qs_save(no_leuk_obj_anno, standard_path)
  cat("Standard integration complete — saved\n")

} else {
  no_leuk_obj_anno <- qs_read(standard_path)
  cat("Loaded existing standard object\n")
}
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 13:08:18 UMAP embedding parameters a = 0.9922 b = 1.112
## 13:08:18 Read 17699 rows and found 35 numeric columns
## 13:08:18 Using Annoy for neighbor search, n_neighbors = 30
## 13:08:18 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 13:08:21 Writing NN index file to temp file /tmp/RtmpcIMkc9/file180f64281922be
## 13:08:21 Searching Annoy index using 1 thread, search_k = 3000
## 13:08:26 Annoy recall = 100%
## 13:08:26 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 13:08:28 Initializing from normalized Laplacian + noise (using RSpectra)
## 13:08:32 Commencing optimization for 200 epochs, with 761782 positive edges
## 13:08:32 Using rng type: pcg
## 13:08:40 Optimization finished
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 17699
## Number of edges: 692517
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9703
## Number of communities: 23
## Elapsed time: 1 seconds
## Standard integration complete — saved
cat("Cells:", ncol(no_leuk_obj_anno), "\n")
## Cells: 17699

Step 7 — Assess Mouse_ID Effects

p_tissue <- DimPlot(no_leuk_obj_anno,
                     group.by = "Tissue",
                     cols     = tissue_cols,
                     alpha    = 0.5) +
  ggtitle("by tissue")

p_mouse <- DimPlot(no_leuk_obj_anno,
                    group.by = "Mouse_ID",
                    cols     = mouse_cols,
                    alpha    = 0.5) +
  ggtitle("by Mouse_ID")

p_tissue + p_mouse

n_types   <- length(unique(no_leuk_obj_anno$cell_annotation))
anno_cols <- setNames(
  colorRampPalette(c("#1a3a6b","#e67e22","#27ae60","#8b0000",
                     "#9b59b6","#2980b9","#e74c3c","#1abc9c",
                     "#f39c12","#7f8c8d","#d35400","#2ecc71",
                     "#8e44ad","#16a085","#c0392b"))(n_types),
  sort(unique(no_leuk_obj_anno$cell_annotation))
)

p <- DimPlot(no_leuk_obj_anno,
              group.by = "cell_annotation",
              cols     = anno_cols,
              label    = FALSE,
              pt.size  = 0.3) +
  ggtitle("Immune cells — standard UMAP") +
  theme(legend.position = "right",
        legend.text      = element_text(size = 7)) +
  guides(colour = guide_legend(override.aes = list(size = 3)))

LabelClusters(p,
               id           = "cell_annotation",
               fontface     = "bold",
               size         = 3,
               repel        = TRUE,
               box          = FALSE,
               force        = 15,
               max.overlaps = 30)

Step 8 — Optional Harmony Correction by Mouse_ID

# Set apply_harmony <- TRUE if Mouse_ID effects are clearly visible in Step 7
apply_harmony <- FALSE  # UPDATE AFTER INSPECTING STEP 7 UMAP

harmony_path <- "/exports/eddie/scratch/aduguid3/harmony_clustering/11_immune_harmony_v3.qs"

if (apply_harmony) {

  if (!file.exists(harmony_path)) {

    no_leuk_obj_anno@reductions[["umap"]]    <- NULL
    no_leuk_obj_anno@reductions[["harmony"]] <- NULL

    no_leuk_obj_anno <- harmony::RunHarmony(no_leuk_obj_anno,
                                             group.by.vars    = "Mouse_ID",
                                             reduction        = "pca",
                                             reduction.save   = "harmony",
                                             max.iter.harmony = 20,
                                             theta            = 2,
                                             verbose          = TRUE)

    no_leuk_obj_anno <- RunUMAP(no_leuk_obj_anno,
                                 reduction = "harmony",
                                 dims      = 1:35)

    no_leuk_obj_anno <- FindNeighbors(no_leuk_obj_anno,
                                       reduction = "harmony",
                                       dims      = 1:35)

    no_leuk_obj_anno <- FindClusters(no_leuk_obj_anno,
                                      resolution = 0.3)

    qs_save(no_leuk_obj_anno, harmony_path)
    cat("Harmony (Mouse_ID) integration complete — saved\n")

  } else {
    no_leuk_obj_anno <- qs_read(harmony_path)
    cat("Loaded existing Harmony object\n")
  }

  p_mouse_post <- DimPlot(no_leuk_obj_anno,
                           group.by = "Mouse_ID",
                           cols     = mouse_cols,
                           alpha    = 0.5) +
    ggtitle("by Mouse_ID — post Harmony")

  p_tissue_post <- DimPlot(no_leuk_obj_anno,
                            group.by = "Tissue",
                            cols     = tissue_cols,
                            alpha    = 0.5) +
    ggtitle("by tissue — post Harmony")

  p_tissue_post + p_mouse_post

} else {
  cat("Harmony correction not applied — using standard PCA/UMAP\n")
  cat("Set apply_harmony <- TRUE in this chunk if Mouse_ID effects\n")
  cat("are clearly visible in the Step 7 UMAP\n")
}
## Harmony correction not applied — using standard PCA/UMAP
## Set apply_harmony <- TRUE in this chunk if Mouse_ID effects
## are clearly visible in the Step 7 UMAP

Step 9 — Cell Type Composition by Tissue

comp <- no_leuk_obj_anno@meta.data %>%
  group_by(Tissue, cell_annotation) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(Tissue) %>%
  mutate(prop = n / sum(n))

ggplot(comp, aes(x = Tissue, y = prop, fill = cell_annotation)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = anno_cols) +
  scale_y_continuous(labels = scales::percent) +
  theme_classic(base_size = 12) +
  labs(x = NULL, y = "Proportion", fill = NULL,
       title = "Immune cell composition by tissue") +
  theme(legend.position = "right",
        legend.text      = element_text(size = 8))

comp %>%
  mutate(prop = round(prop, 3)) %>%
  pivot_wider(names_from  = Tissue,
              values_from = c(n, prop),
              values_fill = 0) %>%
  arrange(desc(n_CNS)) %>%
  kable(caption = "Immune cell composition by tissue — counts and proportions")
Immune cell composition by tissue — counts and proportions
cell_annotation n_BM n_CNS prop_BM prop_CNS
Microglia-like macrophages 64 1476 0.005 0.292
Macrophages 1127 904 0.089 0.179
Stroma 719 723 0.057 0.143
Other γδ 253 252 0.020 0.050
Non-classical monocytes 356 250 0.028 0.049
Pre-B cells 2893 205 0.229 0.041
Effector CD4+ T cells 631 183 0.050 0.036
NKT-like 219 169 0.017 0.033
CD8_Naive 1639 129 0.130 0.025
Vγ6Vδ4 2 121 0.000 0.024
CD8+ NKT-like cells 173 115 0.014 0.023
CD8_Tex 183 101 0.014 0.020
Natural killer cells 887 92 0.070 0.018
Plasma B cells 754 90 0.060 0.018
Neutrophils 1103 42 0.087 0.008
Myeloid Dendritic cells 76 41 0.006 0.008
T_eff 32 34 0.003 0.007
Basophils 523 32 0.041 0.006
Vγ4 40 32 0.003 0.006
CD8_Teff (unclassified) 49 31 0.004 0.006
CD8_Tpex 113 30 0.009 0.006
Progenitor cells 709 5 0.056 0.001
Naive B cells 94 3 0.007 0.001

Step 10 — γδ T Cell Annotations on UMAP

gdt_cols <- c(
  "Vγ6Vδ4"   = "#e74c3c",
  "Vγ4"      = "#e67e22",
  "Other γδ" = "#bdc3c7",
  "Other"    = "grey90"
)

no_leuk_obj_anno$gdt_highlight <- ifelse(
  no_leuk_obj_anno$cell_annotation %in% gdt_labels,
  no_leuk_obj_anno$cell_annotation,
  "Other"
)

DimPlot(no_leuk_obj_anno,
         group.by = "gdt_highlight",
         cols     = gdt_cols,
         order    = gdt_labels,
         pt.size  = 0.5) +
  ggtitle("γδ T cells — V gene populations") +
  theme(legend.position = "bottom",
        legend.text      = element_text(size = 8))

no_leuk_obj_anno$vg6vd4_highlight <- ifelse(
  no_leuk_obj_anno$cell_annotation == "Vγ6Vδ4",
  "Vγ6Vδ4", "Other"
)

DimPlot(no_leuk_obj_anno,
         group.by = "vg6vd4_highlight",
         cols     = c("Vγ6Vδ4" = "#e74c3c", "Other" = "grey90"),
         order    = "Vγ6Vδ4",
         split.by = "Tissue",
         pt.size  = 0.8) +
  ggtitle("Vγ6Vδ4 T cells — BM vs CNS") +
  theme(legend.position = "bottom")

no_leuk_obj_anno@meta.data %>%
  filter(cell_annotation %in% gdt_labels) %>%
  group_by(cell_annotation, Tissue) %>%
  summarise(n = n(), .groups = "drop") %>%
  pivot_wider(names_from = Tissue, values_from = n, values_fill = 0) %>%
  mutate(total   = BM + CNS,
         pct_CNS = round(CNS / total * 100, 1)) %>%
  arrange(desc(pct_CNS)) %>%
  kable(caption = "γδ T cell populations by tissue")
γδ T cell populations by tissue
cell_annotation BM CNS total pct_CNS
Vγ6Vδ4 2 121 123 98.4
Other γδ 253 252 505 49.9
Vγ4 40 32 72 44.4
# UMAP coloured by bias score
gdt_barcodes <- rownames(no_leuk_obj_anno@meta.data)[
  no_leuk_obj_anno$cell_annotation %in% gdt_labels]

gdt_for_plot <- subset(no_leuk_obj_anno, cells = gdt_barcodes)

FeaturePlot(gdt_for_plot,
            features = "gdt_bias_v2",
            cols     = c("#2980b9", "grey90", "#e74c3c"),
            pt.size  = 1.5) +
  ggtitle("γδ T cells — pro-tumour (red) vs anti-tumour (blue) bias") +
  labs(colour = "Bias\n(pro−anti)")

FeaturePlot(gdt_for_plot,
            features = "gdt_bias_v2",
            split.by = "Tissue",
            cols     = c("#2980b9", "grey90", "#e74c3c"),
            pt.size  = 1.5) +
  ggtitle("γδ T cell bias — BM vs CNS")

Step 11 — CD8 T Cell Annotations on UMAP

cd8_labels <- c("CD8_Tex","CD8_Tpex","T_eff",
                 "CD8_Teff (unclassified)","CD8_Naive")

cd8_cols <- c(
  "CD8_Tex"                 = "#8b0000",
  "CD8_Tpex"               = "#e67e22",
  "T_eff"                  = "#27ae60",
  "CD8_Teff (unclassified)"= "#f39c12",
  "CD8_Naive"              = "#2980b9",
  "Other"                  = "grey90"
)

no_leuk_obj_anno$cd8_highlight <- ifelse(
  no_leuk_obj_anno$cell_annotation %in% cd8_labels,
  no_leuk_obj_anno$cell_annotation,
  "Other"
)

DimPlot(no_leuk_obj_anno,
         group.by = "cd8_highlight",
         cols     = cd8_cols,
         order    = cd8_labels,
         pt.size  = 0.5) +
  ggtitle("CD8 T cell subtypes") +
  theme(legend.position = "bottom",
        legend.text      = element_text(size = 8))

no_leuk_obj_anno@meta.data %>%
  filter(cell_annotation %in% cd8_labels) %>%
  group_by(cell_annotation, Tissue) %>%
  summarise(n = n(), .groups = "drop") %>%
  pivot_wider(names_from = Tissue, values_from = n, values_fill = 0) %>%
  mutate(total   = BM + CNS,
         pct_CNS = round(CNS / total * 100, 1)) %>%
  arrange(desc(pct_CNS)) %>%
  kable(caption = "CD8 T cell subtypes by tissue")
CD8 T cell subtypes by tissue
cell_annotation BM CNS total pct_CNS
T_eff 32 34 66 51.5
CD8_Teff (unclassified) 49 31 80 38.8
CD8_Tex 183 101 284 35.6
CD8_Tpex 113 30 143 21.0
CD8_Naive 1639 129 1768 7.3

Step 12 — IL-17 Ligand and Receptor Expression

Leukaemia cells express the non-canonical IL-17 receptor heterodimer Il17ra/Il17rb at ~47% but produce no IL-17 ligands themselves (script 06). Here all IL-17 ligands and receptors — including IL-17RE (receptor for IL-17D) and IL-17RD (co-receptor for IL-17A) — are assessed across immune cell types to identify candidate ligand sources.

il17_ligands   <- c("Il17a", "Il17b", "Il17c", "Il17d", "Il17f", "Il25")
il17_receptors <- c("Il17ra", "Il17rb", "Il17rc", "Il17rd", "Il17re")

il17_detected <- c(il17_ligands, il17_receptors)[
  c(il17_ligands, il17_receptors) %in% rownames(no_leuk_obj_anno)]

cat("IL-17 genes detected:", paste(il17_detected, collapse = ", "), "\n")
## IL-17 genes detected: Il17a, Il17b, Il17c, Il17d, Il17f, Il25, Il17ra, Il17rb, Il17rc, Il17rd, Il17re
Idents(no_leuk_obj_anno) <- "cell_annotation"

DotPlot(no_leuk_obj_anno,
         features  = il17_detected,
         cols      = c("lightgrey", "firebrick"),
         dot.scale = 7) +
  RotatedAxis() +
  ggtitle("IL-17 ligands and receptors — immune cell types") +
  theme(axis.text.x = element_text(size = 9),
        axis.text.y = element_text(size = 8))

gdt_obj <- subset(no_leuk_obj_anno,
                   subset = cell_annotation %in% gdt_labels)
Idents(gdt_obj) <- "cell_annotation"

DotPlot(gdt_obj,
         features  = il17_detected,
         cols      = c("lightgrey", "#e74c3c"),
         dot.scale = 8) +
  RotatedAxis() +
  ggtitle("IL-17 expression — γδ T cell populations") +
  theme(axis.text.x = element_text(size = 9))
## Warning: Scaling data with a low number of groups may produce misleading
## results

il17l_detected <- il17_ligands[il17_ligands %in% rownames(no_leuk_obj_anno)]

DotPlot(no_leuk_obj_anno,
         features  = il17l_detected,
         cols      = c("lightgrey", "firebrick"),
         split.by  = "Tissue",
         dot.scale = 5) +
  RotatedAxis() +
  ggtitle("IL-17 ligands — BM vs CNS across immune cell types") +
  theme(axis.text.x = element_text(size = 8),
        axis.text.y = element_text(size = 7))

Step 12b — Stromal IL-17D / IL-17RE Axis

IL-17D is the most divergent IL-17 family member, expressed by non-immune cells including stroma. IL-17RE is its receptor and can also heterodimerise with IL-17RA. Stromal IL-17D may create a feed-forward loop with Vγ6Vδ4-derived IL-17A signalling.

stroma_il17 <- subset(no_leuk_obj_anno,
                       subset = cell_annotation == "Stroma")

il17d_axis <- c("Il17d", "Il17re", "Il17ra", "Il17rd")
il17d_found <- il17d_axis[il17d_axis %in% rownames(stroma_il17)]

if (length(il17d_found) > 0) {
  VlnPlot(stroma_il17,
          features  = il17d_found,
          split.by  = "Tissue",
          cols      = tissue_cols,
          pt.size   = 0) +
    plot_annotation(title = "IL-17D axis — Stromal cells by tissue")
}
## The default behaviour of split.by has changed.
## Separate violin plots are now plotted side-by-side.
## To restore the old behaviour of a single split violin,
## set split.plot = TRUE.
##       
## This message will be shown once per session.

Step 13 — IL-17 Receptor Expression on Niche Populations

Do niche-resident cells express IL-17 receptors? This determines whether Vγ6Vδ4-derived IL-17A has direct effects on niche infrastructure.

niche_types <- c("Stroma", "Macrophages", "Microglia-like macrophages",
                  "Non-classical monocytes", "Myeloid Dendritic cells",
                  "Neutrophils", "Pre-B cells")

niche_obj <- subset(no_leuk_obj_anno,
                     subset = cell_annotation %in% niche_types)
Idents(niche_obj) <- "cell_annotation"

il17r_genes <- c("Il17ra", "Il17rb", "Il17rc", "Il17rd", "Il17re")
il17r_found <- il17r_genes[il17r_genes %in% rownames(niche_obj)]

DotPlot(niche_obj,
        features  = il17r_found,
        cols      = c("lightgrey", "firebrick"),
        split.by  = "Tissue",
        dot.scale = 6) +
  RotatedAxis() +
  ggtitle("IL-17 receptors on niche populations — BM vs CNS")

Step 14 — Myeloid Immunosuppressive Scoring

If Vγ6Vδ4-derived IL-17A drives immunosuppression, the myeloid compartment should show pro-tumorigenic polarisation in CNS. IL-17A → G-CSF → neutrophil/MDSC expansion is the established axis (Edwards et al. 2023).

immunosupp_genes <- c("Arg1", "Nos2", "Mrc1", "Cd274",
                       "Csf3", "Tgfb1", "Il10",
                       "Vegfa", "Ccl2", "Ccl22")

immunosupp_found <- immunosupp_genes[immunosupp_genes %in%
                                       rownames(no_leuk_obj_anno)]

myeloid_types <- c("Macrophages", "Microglia-like macrophages",
                    "Non-classical monocytes", "Neutrophils",
                    "Myeloid Dendritic cells")

myeloid_obj <- subset(no_leuk_obj_anno,
                       subset = cell_annotation %in% myeloid_types)
Idents(myeloid_obj) <- "cell_annotation"

DotPlot(myeloid_obj,
        features  = immunosupp_found,
        cols      = c("lightgrey", "darkred"),
        split.by  = "Tissue",
        dot.scale = 6) +
  RotatedAxis() +
  ggtitle("Immunosuppressive markers on myeloid cells — BM vs CNS")
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_point()`).

no_leuk_obj_anno <- AddModuleScore(no_leuk_obj_anno,
                                    features = list(immunosupp_found),
                                    name     = "ImmunoSupp_sig",
                                    seed     = 42)

no_leuk_obj_anno@meta.data %>%
  filter(cell_annotation %in% myeloid_types) %>%
  ggplot(aes(x = cell_annotation, y = ImmunoSupp_sig1,
             fill = Tissue)) +
  geom_violin(trim = FALSE, alpha = 0.7, scale = "width") +
  geom_boxplot(width = 0.15, position = position_dodge(0.9),
               fill = "white", outlier.size = 0.3) +
  scale_fill_manual(values = tissue_cols) +
  theme_classic(base_size = 11) +
  labs(y = "Immunosuppressive module score",
       title = "Myeloid immunosuppressive polarisation — BM vs CNS") +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

Step 15 — CD8 T Cell Exhaustion Scoring

The hypothesis predicts CD8 dysfunction in the CNS niche.

exhaust_genes <- c("Pdcd1", "Lag3", "Havcr2", "Tox", "Tox2",
                    "Entpd1", "Tigit", "Ctla4", "Cd244a",
                    "Eomes", "Tbx21")

exhaust_found <- exhaust_genes[exhaust_genes %in%
                                 rownames(no_leuk_obj_anno)]

no_leuk_obj_anno <- AddModuleScore(no_leuk_obj_anno,
                                    features = list(exhaust_found),
                                    name     = "CD8_exhaust",
                                    seed     = 42)

no_leuk_obj_anno@meta.data %>%
  filter(cell_annotation %in% cd8_labels) %>%
  ggplot(aes(x = cell_annotation, y = CD8_exhaust1,
             fill = Tissue)) +
  geom_violin(trim = FALSE, alpha = 0.7, scale = "width") +
  geom_boxplot(width = 0.15, position = position_dodge(0.9),
               fill = "white", outlier.size = 0.3) +
  scale_fill_manual(values = tissue_cols) +
  theme_classic(base_size = 11) +
  labs(y = "Exhaustion module score",
       title = "CD8 T cell exhaustion — BM vs CNS") +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

cd8_obj <- subset(no_leuk_obj_anno,
                   subset = cell_annotation %in% cd8_labels)
Idents(cd8_obj) <- "cell_annotation"

DotPlot(cd8_obj,
        features  = exhaust_found,
        cols      = c("lightgrey", "#8b0000"),
        split.by  = "Tissue",
        dot.scale = 6) +
  RotatedAxis() +
  ggtitle("CD8 exhaustion markers — BM vs CNS")

Step 16 — AREG / EGFR Axis

Wiesheu & Coffelt (2024) highlight that Vγ4/Vγ6 can co-express IL-17A and AREG (amphiregulin), an EGFR ligand that directly promotes cancer cell proliferation.

areg_genes <- c("Areg", "Ereg", "Hbegf", "Egfr")
areg_found <- areg_genes[areg_genes %in% rownames(no_leuk_obj_anno)]
cat("AREG/EGFR genes found:", paste(areg_found, collapse = ", "), "\n")
## AREG/EGFR genes found: Areg, Hbegf, Egfr
if (length(areg_found) > 0) {
  gdt_obj <- subset(no_leuk_obj_anno,
                     subset = cell_annotation %in% gdt_labels)
  Idents(gdt_obj) <- "cell_annotation"

  DotPlot(gdt_obj,
          features  = areg_found,
          cols      = c("lightgrey", "#e74c3c"),
          split.by  = "Tissue",
          dot.scale = 8) +
    RotatedAxis() +
    ggtitle("AREG/EGFR ligands — γδ T cell populations")

  Idents(no_leuk_obj_anno) <- "cell_annotation"
  DotPlot(no_leuk_obj_anno,
          features  = areg_found,
          cols      = c("lightgrey", "firebrick"),
          dot.scale = 6) +
    RotatedAxis() +
    ggtitle("AREG/EGFR axis — all immune cell types")
}

Step 17 — CNS Niche Support Ligands

The Posner et al. (2022) review shows that the dural niche supports developing B cells via CXCL12 (from fibroblast-like cells) and IL-7 (from endothelium). B-ALL cells are malignant developing B cells — the CNS niche may actively support leukaemia survival through the same signals.

niche_ligands <- c("Cxcl12", "Il7", "Kitl", "Flt3l", "Vcam1",
                    "Cxcl13", "Ccl19", "Ccl21a",
                    "Igf1", "Wnt5a", "Rspo3")

niche_found <- niche_ligands[niche_ligands %in%
                               rownames(no_leuk_obj_anno)]

stroma_macro <- subset(no_leuk_obj_anno,
                        subset = cell_annotation %in% c(
                          "Stroma", "Macrophages",
                          "Microglia-like macrophages"))
Idents(stroma_macro) <- "cell_annotation"

DotPlot(stroma_macro,
        features  = niche_found,
        cols      = c("lightgrey", "steelblue"),
        split.by  = "Tissue",
        dot.scale = 6) +
  RotatedAxis() +
  ggtitle("B cell niche support ligands — Stroma & Macrophages")

niche_receptors <- c("Cxcr4", "Il7r", "Kit", "Flt3",
                      "Itga4", "Cxcr5", "Ccr7")
niche_r_found <- niche_receptors[niche_receptors %in%
                                   rownames(no_leuk_obj_anno)]

preb_obj <- subset(no_leuk_obj_anno,
                    subset = cell_annotation == "Pre-B cells")
Idents(preb_obj) <- "Tissue"

DotPlot(preb_obj,
        features  = niche_r_found,
        cols      = c("lightgrey", "steelblue"),
        dot.scale = 8) +
  RotatedAxis() +
  ggtitle("Niche receptors on Pre-B cells — BM vs CNS")
## Warning: Scaling data with a low number of groups may produce misleading
## results

Step 18 — Checkpoint Ligand / Receptor Landscape

Vγ6 cells constitutively express PD-1, and anti-PD-1 paradoxically expands them (Edwards et al. 2023). This has therapeutic implications — anti-PD-1 in B-ALL could expand pro-tumorigenic Vγ6Vδ4.

checkpoint_genes <- c(
  # Ligands
  "Cd274", "Pdcd1lg2", "Cd80", "Cd86",
  "Lgals9", "Pvr", "Nectin2",
  # Receptors
  "Pdcd1", "Ctla4", "Havcr2", "Tigit",
  "Lag3", "Cd28", "Cd226"
)

ckpt_found <- checkpoint_genes[checkpoint_genes %in%
                                 rownames(no_leuk_obj_anno)]

ckpt_types <- c("Vγ6Vδ4", "Vγ4", "Other γδ",
                 "CD8_Tex", "CD8_Tpex", "T_eff",
                 "Macrophages", "Microglia-like macrophages",
                 "Neutrophils")

ckpt_obj <- subset(no_leuk_obj_anno,
                    subset = cell_annotation %in% ckpt_types)
Idents(ckpt_obj) <- "cell_annotation"

DotPlot(ckpt_obj,
        features  = ckpt_found,
        cols      = c("lightgrey", "navy"),
        dot.scale = 5) +
  RotatedAxis() +
  ggtitle("Checkpoint ligands & receptors — key populations") +
  theme(axis.text.x = element_text(size = 8),
        axis.text.y = element_text(size = 8))

Step 19 — Meningeal Immunity Markers

The Posner et al. (2022) review describes IgA+ plasma cells at CNS borders, gut-derived and microbiome-dependent. Checking Ig isotype and gut-homing markers on the Plasma B cells.

ig_isotype_genes <- c("Igha", "Ighg1", "Ighg2b", "Ighg2c",
                       "Ighm", "Ighd", "Ighe", "Jchain")

ig_found <- ig_isotype_genes[ig_isotype_genes %in%
                               rownames(no_leuk_obj_anno)]

plasma_obj <- subset(no_leuk_obj_anno,
                      subset = cell_annotation == "Plasma B cells")

if (ncol(plasma_obj) > 10) {
  Idents(plasma_obj) <- "Tissue"

  DotPlot(plasma_obj,
          features  = ig_found,
          cols      = c("lightgrey", "purple"),
          dot.scale = 8) +
    RotatedAxis() +
    ggtitle("Ig isotype expression — Plasma B cells BM vs CNS")

  homing_genes <- c("Itga4", "Ccr9", "Ccr6", "Ccr10",
                     "Sdc1", "Cd38", "Xbp1", "Prdm1")
  homing_found <- homing_genes[homing_genes %in% rownames(plasma_obj)]

  DotPlot(plasma_obj,
          features  = homing_found,
          cols      = c("lightgrey", "purple"),
          dot.scale = 8) +
    RotatedAxis() +
    ggtitle("Tissue homing & plasma markers — BM vs CNS")
} else {
  cat("Insufficient Plasma B cells for analysis\n")
}
## Warning: Scaling data with a low number of groups may produce misleading
## results
## Warning: Scaling data with a low number of groups may produce misleading
## results

Step 20 — Save Object

output_path <- "/exports/eddie/scratch/aduguid3/harmony_clustering/11_immune_annotated_v3.qs"

qs_save(no_leuk_obj_anno, output_path)
cat("Saved:", output_path, "\n")
## Saved: /exports/eddie/scratch/aduguid3/harmony_clustering/11_immune_annotated_v3.qs
expected_cols <- c("Tissue", "Mouse_ID",
                   "cell_annotation", "annotations",
                   "scType_classification",
                   "vgene_label", "gdt_qc_group",
                   "gdt_annotation", "gdt_bias", "gdt_class",
                   "AntiTumorScore1", "ProTumorScore1",
                   "ProTumor_gdt_v21", "AntiTumor_gdt_v21",
                   "gdt_bias_v2", "gdt_functional_class",
                   "cd8_annotation",
                   "macro_refined", "Microglia_sig1", "BMMacro_sig1",
                   "ImmunoSupp_sig1", "CD8_exhaust1",
                   "gdt_highlight", "vg6vd4_highlight", "cd8_highlight")

cat("\nKey metadata columns:\n")
## 
## Key metadata columns:
for (col in expected_cols) {
  cat(" ", col, ":", col %in% colnames(no_leuk_obj_anno@meta.data), "\n")
}
##   Tissue : TRUE 
##   Mouse_ID : TRUE 
##   cell_annotation : TRUE 
##   annotations : TRUE 
##   scType_classification : TRUE 
##   vgene_label : TRUE 
##   gdt_qc_group : TRUE 
##   gdt_annotation : TRUE 
##   gdt_bias : TRUE 
##   gdt_class : TRUE 
##   AntiTumorScore1 : TRUE 
##   ProTumorScore1 : TRUE 
##   ProTumor_gdt_v21 : TRUE 
##   AntiTumor_gdt_v21 : TRUE 
##   gdt_bias_v2 : TRUE 
##   gdt_functional_class : TRUE 
##   cd8_annotation : TRUE 
##   macro_refined : TRUE 
##   Microglia_sig1 : TRUE 
##   BMMacro_sig1 : TRUE 
##   ImmunoSupp_sig1 : TRUE 
##   CD8_exhaust1 : TRUE 
##   gdt_highlight : TRUE 
##   vg6vd4_highlight : TRUE 
##   cd8_highlight : TRUE
cat("\nFinal cell count:", ncol(no_leuk_obj_anno), "\n")
## 
## Final cell count: 17699
print(table(no_leuk_obj_anno$Tissue))
## 
##    BM   CNS 
## 12639  5060

Session Information

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Rocky Linux 9.5 (Blue Onyx)
## 
## Matrix products: default
## BLAS/LAPACK: /opt/intel/oneapi/mkl/2024.0/lib/libmkl_gf_lp64.so.2;  LAPACK version 3.10.1
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Europe/London
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggrepel_0.9.6      harmony_1.2.4      Rcpp_1.0.12        scales_1.4.0      
##  [5] tibble_3.3.1       tidyr_1.3.1        viridis_0.6.5      viridisLite_0.4.2 
##  [9] qs2_0.1.7          knitr_1.51         dplyr_1.1.4        patchwork_1.3.2   
## [13] ggplot2_4.0.2      Seurat_5.4.0       SeuratObject_5.3.0 sp_2.1-4          
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3     rstudioapi_0.16.0      jsonlite_2.0.0        
##   [4] magrittr_2.0.3         spatstat.utils_3.2-1   ggbeeswarm_0.7.3      
##   [7] farver_2.1.2           rmarkdown_2.30         vctrs_0.6.5           
##  [10] ROCR_1.0-12            spatstat.explore_3.7-0 htmltools_0.5.8.1     
##  [13] sass_0.4.9             sctransform_0.4.3      parallelly_1.37.1     
##  [16] KernSmooth_2.23-24     bslib_0.7.0            htmlwidgets_1.6.4     
##  [19] ica_1.0-3              plyr_1.8.9             plotly_4.12.0         
##  [22] zoo_1.8-15             cachem_1.1.0           igraph_2.2.2          
##  [25] mime_0.12              lifecycle_1.0.5        pkgconfig_2.0.3       
##  [28] Matrix_1.7-0           R6_2.6.1               fastmap_1.2.0         
##  [31] fitdistrplus_1.2-6     future_1.33.2          shiny_1.12.1          
##  [34] digest_0.6.35          tensor_1.5.1           RSpectra_0.16-2       
##  [37] irlba_2.3.7            labeling_0.4.3         progressr_0.18.0      
##  [40] fansi_1.0.6            spatstat.sparse_3.1-0  httr_1.4.7            
##  [43] polyclip_1.10-7        abind_1.4-5            mgcv_1.9-1            
##  [46] compiler_4.4.1         withr_3.0.2            S7_0.2.1              
##  [49] fastDummies_1.7.5      MASS_7.3-60.2          tools_4.4.1           
##  [52] vipor_0.4.7            lmtest_0.9-40          otel_0.2.0            
##  [55] beeswarm_0.4.0         httpuv_1.6.16          future.apply_1.11.2   
##  [58] goftest_1.2-3          glue_1.8.0             nlme_3.1-164          
##  [61] promises_1.5.0         grid_4.4.1             Rtsne_0.17            
##  [64] cluster_2.1.6          reshape2_1.4.4         generics_0.1.3        
##  [67] gtable_0.3.6           spatstat.data_3.1-9    data.table_1.15.4     
##  [70] stringfish_0.18.0      utf8_1.2.4             spatstat.geom_3.7-0   
##  [73] RcppAnnoy_0.0.23       RANN_2.6.2             pillar_1.9.0          
##  [76] stringr_1.5.1          spam_2.11-3            RcppHNSW_0.6.0        
##  [79] later_1.4.6            splines_4.4.1          lattice_0.22-6        
##  [82] survival_3.6-4         deldir_2.0-4           tidyselect_1.2.1      
##  [85] miniUI_0.1.2           pbapply_1.7-4          gridExtra_2.3         
##  [88] scattermore_1.2        xfun_0.56              matrixStats_1.5.0     
##  [91] stringi_1.8.4          lazyeval_0.2.2         yaml_2.3.12           
##  [94] evaluate_1.0.5         codetools_0.2-20       cli_3.6.5             
##  [97] uwot_0.2.4             RcppParallel_5.1.8     xtable_1.8-4          
## [100] reticulate_1.45.0      jquerylib_0.1.4        dichromat_2.0-0.1     
## [103] globals_0.16.3         spatstat.random_3.4-4  png_0.1-8             
## [106] ggrastr_1.0.2          spatstat.univar_3.1-6  parallel_4.4.1        
## [109] dotCall64_1.2          listenv_0.9.1          ggridges_0.5.6        
## [112] purrr_1.0.2            rlang_1.1.7            cowplot_1.2.0