1 Introduction

Single-cell sequencing is an emerging technology in the field of immunology and oncology that allows researchers to couple RNA quantification and other modalities, like immune cell receptor profiling at the level of an individual cell. A number of workflows and software packages have been created to process and analyze single-cell transcriptomic data. These packages allow users to take the vast dimensionality of the data generated in single-cell-based experiments and distill the data into novel insights. Unlike the transcriptomic field, there is a lack of options for software that allow for single-cell immune receptor profiling. Enabling users to easily combine RNA and immune profiling, the scRepertoire framework supports use of 10x, single-cell clonal formats and interaction with popular R-based single-cell data pipelines.

scRepertoire is designed to take filter contig outputs from the 10x Genomics Cell Ranger pipeline, process that data to assign clonotype based on two TCR or Ig chains and analyze the clonotype dynamics. The latter can be separated into 1) clonotype-only analysis functions, such as unique clonotypes or clonal space quantification, and 2) interaction with mRNA expression data using Seurat or SingleCellExperiment packages.

References:

Prerequisite: Ensure the All_samples_Merged Seurat object is loaded into your R environment before running the chunks below.

1.1 Load libraries

1.2 Load Seurat Object


#Load Seurat Object merged from cell lines and a control(PBMC) after filtration
All_samples_Merged <- readRDS("../../../0-Seurat_RDS_Final_OBJECT/All_samples_Merged_with_Renamed_Clusters_Cell_state-03-12-2025.rds")

All_samples_Merged
An object of class Seurat 
62900 features across 49305 samples within 6 assays 
Active assay: RNA (36601 features, 0 variable features)
 2 layers present: data, counts
 5 other assays present: ADT, prediction.score.celltype.l1, prediction.score.celltype.l2, prediction.score.celltype.l3, SCT
 5 dimensional reductions calculated: integrated_dr, ref.umap, pca, umap, harmony

1.3 Load contigs

scRepertoire functions using the filtered_contig_annotations.csv output from the 10x Genomics Cell Ranger. This file is located in the ./outs/ directory of the VDJ alignment folder. To generate a list of contigs to use for scRepertoire:

- load the filtered_contig_annotations.csv for each of the samples.

- make a list in the R environment.

2 Combining Contigs into Clones

There are varying definitions of clones in the literature. For the purposes of scRepertoire, we define a clone as cells with shared/trackable complementarity-determining region 3 (CDR3) sequences. Within this definition, one might use amino acid (aa) sequences of one or both chains to define a clone. Alternatively, we could use nucleotide (nt) or the V(D)JC genes (genes) to define a clone. The latter, genes, would be a more permissive definition of “clones,” as multiple amino acid or nucleotide sequences can result from the same gene combination. Another option to define a clone is the use of the V(D)JC and nucleotide sequence (strict). scRepertoire allows for the use of all these definitions of clones and enables users to select both or individual chains to examine.

2.1 Combining Contigs into Clones


combined.TCR <- combineTCR(contig_list, 
                          samples = samples,
                          removeNA = FALSE, 
                          removeMulti = FALSE, 
                          filterMulti = FALSE)

# Export raw data
combined_TCR_df <- do.call(rbind, combined.TCR) %>% mutate(sample = samples[as.numeric(sample)])
write.csv(combined_TCR_df, "combined_TCR_clean.csv", row.names = FALSE)
head(combined_TCR_df[,1:10])
NA

2.2 Prepare Seurat Object (SINGLE CALL)

TCR <- All_samples_Merged
DefaultAssay(TCR) <- "SCT"
TCR <- quietTCRgenes(TCR)

# YOUR ORIGINAL WORKING PARAMETERS (from script)
TCR <- combineExpression(combined.TCR, TCR, 
                        cloneCall = "gene",
                        group.by = "sample",     # ← FIXED: Use "sample" (exists in combined.TCR)
                        proportion = FALSE,
                        cloneSize = c(Single = 1, Small = 5, Medium = 20, Large = 100, Hyperexpanded = 500))

print("✅ SUCCESS - Clone metadata added")
[1] "✅ SUCCESS - Clone metadata added"
DimPlot(TCR, group.by = "cloneSize", reduction = "umap") + 
  scale_color_manual(values = rev(colorblind_vector[1:5]))

3 Combining Clones and Single-Cell Objects

DefaultAssay(TCR) <- "SCT"

# If All_samples_Merged is already loaded in the environment:
TCR <- All_samples_Merged

# Define color palette
colorblind_vector <- hcl.colors(n=7, palette = "inferno", fixup = TRUE)

# Combine expression with filtering of NA clonotype cells
TCR <- combineExpression(
    combined.TCR,
    TCR,
    cloneCall = "gene",
    group.by = "sample",
    proportion = TRUE,
    filterNA = TRUE  # This will exclude cells without clonotype info
)

# You no longer need the manual barcode matching or NA replacements



# Plot UMAP colored by cloneSize
DimPlot(TCR, group.by = "cloneSize", reduction = "umap") +
    scale_color_manual(values = rev(colorblind_vector[c(1, 3, 4, 5, 7)]))




DimPlot(TCR, group.by = "cloneSize", reduction = "umap")


DimPlot(TCR, group.by = "cloneSize", reduction = "umap") +
    scale_color_manual(values=rev(colorblind_vector[c(1,3,4,5,6)]))




#Define color palette 
colorblind_vector <- hcl.colors(n=9, palette = "inferno", fixup = TRUE)

Seurat::DimPlot(TCR, group.by = "cloneSize", reduction = "umap") +
    scale_color_manual(values=rev(colorblind_vector[c(1,3,4,5,7)]))

TCR <- combineExpression(combined.TCR, 
                                   TCR, 
                                   cloneCall="gene", 
                                   group.by = "sample", 
                                   proportion = FALSE, 
                                   cloneSize=c(Single=1, Small=5, Medium=20, Large=100, Hyperexpanded=500))

Seurat::DimPlot(TCR, group.by = "cloneSize", reduction = "umap") +
    scale_color_manual(values=rev(colorblind_vector[c(1,3,4,5,7)]))

4 Visualizations for Single-Cell Objects



clonalOverlay(TCR, 
              reduction = "umap", 
              cutpoint = 1, 
              bins = 10, 
              facet.by = "orig.ident") + 
              guides(color = "none")


#clonalNetwork
#ggraph needs to be loaded due to issues with ggplot
library(ggraph)

clonalNetwork(TCR, 
              reduction = "umap", 
              group.by = "seurat_clusters",
              filter.clones = NULL,
              filter.identity = NULL,
              cloneCall = "aa")



#Examining Cluster 3 only
clonalNetwork(TCR, 
              reduction = "umap", 
              group.by = "seurat_clusters",
              filter.identity = 8,
              cloneCall = "aa")



shared.clones <- clonalNetwork(TCR, 
                               reduction = "umap", 
                               group.by = "seurat_clusters",
                               cloneCall = "aa", 
                               exportClones = TRUE)
head(shared.clones)

#ggraph needs to be loaded due to issues with ggplot
library(ggraph)

#No Identity filter
clonalNetwork(TCR, 
              reduction = "umap", 
              group.by = "seurat_clusters",
              filter.clones = NULL,
              filter.identity = NULL,
              cloneCall = "aa")

# clonalOccupy
#clonalOccupy
clonalOccupy(TCR, 
              x.axis = "seurat_clusters")


clonalOccupy(TCR, 
              x.axis = "orig.ident")

# clonalOccupy
clonalOccupy(TCR, 
              x.axis = "orig.ident")


clonalOccupy(TCR, 
                     x.axis = "orig.ident", 
                     proportion = TRUE, 
                     label = FALSE)


# getCirclize

library(circlize)
library(scales)

circles <- getCirclize(TCR, 
                       group.by = "seurat_clusters")

#Just assigning the normal colors to each cluster
grid.cols <- hue_pal()(length(unique(TCR$seurat_clusters)))
names(grid.cols) <- unique(TCR$seurat_clusters)

#Graphing the chord diagram
chordDiagram(circles, self.link = 1, grid.col = grid.cols)



circles <- getCirclize(TCR, group.by = "orig.ident")

grid.cols <- scales::hue_pal()(length(unique(TCR@active.ident)))
names(grid.cols) <- levels(TCR@active.ident)

chordDiagram(circles, 
             self.link = 1, 
             grid.col = grid.cols)

5 Quantifying Clonal Bias

# # StartracDiversity # From the excellent work by Lei Zhang, et al., the authors introduce new methods for looking at clones by cellular origins and cluster identification. Their STARTRAC software has been adapted to work with scRepertoire and please read and cite their excellent work. # # In order to use the StartracDiversity() function, you will need to include the product of the combinedExpression() function. The second requirement is a column header in the meta data of the Seurat object that has tissue of origin. In the example data, type corresponds to the column “Type”, which includes the “P” and “T” classifiers. The indices can be subsetted for a specific patient or examined overall using the by variable. Importantly, the function uses only the strict definition of a clone of the VDJC genes and the CDR3 nucleotide sequence. # # The indices output includes: # # expa - Clonal Expansion # migr - Cross-tissue Migration # tran - State Transition


Idents(TCR) <- "seurat_clusters"

StartracDiversity(TCR, 
                  type = "orig.ident",
                  group.by = "orig.ident")

NA
NA

6 clonalBias


clonalBias(TCR, 
           cloneCall = "aa", 
           split.by = "orig.ident", 
           group.by = "seurat_clusters",
           n.boots = 10, 
           min.expand =5)

7 TRBV CLONALITY

8 COMPLETE ANALYSIS: TRBV + TRAV (BOTH CHAINS)


library(dplyr)
library(tidyr)
library(stringr)

# COMPLETE FIXED ANALYSIS: TRBV (TCR2) + TRAV (TCR1)
vgene_v_all_counts <- data.frame()

for(i in seq_along(combined.TCR)) {
    contigs <- combined.TCR[[i]]
    sample_name <- names(combined.TCR)[i]
    
    # BETA CHAIN: TRBV from TCR2 (TRBC+ cells)
    beta_counts <- contigs %>%
        filter(grepl("TRBC", TCR2)) %>%
        mutate(v_gene = str_extract(TCR2, "TRBV[^.]+"),
               chain = "TRBV") %>%
        group_by(v_gene, chain) %>%
        summarise(nCells = n(), sample = sample_name, .groups = "drop") %>%
        filter(!is.na(v_gene))
    
    # ALPHA CHAIN: TRAV from TCR1 (TRAV+ cells)
    alpha_counts <- contigs %>%
        filter(grepl("TRAV", TCR1)) %>%
        mutate(v_gene = str_extract(TCR1, "TRAV[^.]+"),
               chain = "TRAV") %>%
        group_by(v_gene, chain) %>%
        summarise(nCells = n(), sample = sample_name, .groups = "drop") %>%
        filter(!is.na(v_gene))
    
    vgene_v_all_counts <- bind_rows(vgene_v_all_counts, beta_counts, alpha_counts)
}

# L4_B low-frequency genes (<0.1% = <7 cells)
l4b_low_freq <- vgene_v_all_counts %>%
    filter(sample == "L4_B", nCells < 7) %>%
    arrange(chain, desc(nCells))

# Wide format (samples × V-genes)
vgene_wide <- vgene_v_all_counts %>%
    select(-chain) %>%
    pivot_wider(names_from = sample, values_from = nCells, values_fill = 0)

# Summary statistics
total_summary <- vgene_v_all_counts %>%
    group_by(sample, chain) %>%
    summarise(total_cells = sum(nCells), .groups = "drop")

# Percentages (% usage per sample/chain)
vgene_percent <- vgene_v_all_counts %>%
    group_by(sample, chain) %>%
    mutate(percent = round(nCells / sum(nCells) * 100, 2)) %>%
    ungroup() %>%
    select(sample, chain, v_gene, nCells, percent)

## SAVE 5 PUBLICATION-READY FILES
write.csv(vgene_wide, "01_Vgene_counts_wide.csv", row.names = FALSE)
write.csv(l4b_low_freq, "02_L4B_lowfreq_Vgenes.csv", row.names = FALSE)
write.csv(total_summary, "03_Vgene_totals_summary.csv", row.names = FALSE)
write.csv(vgene_percent, "04_Vgene_percentages.csv", row.names = FALSE)
write.csv(vgene_v_all_counts, "05_Vgene_long_format.csv", row.names = FALSE)

## COMPREHENSIVE PREVIEW
cat("📊 TOTAL CELLS SUMMARY:\n")
📊 TOTAL CELLS SUMMARY:
print(total_summary)
# A tibble: 16 × 3
   sample chain total_cells
   <chr>  <chr>       <int>
 1 L1     TRAV         5965
 2 L1     TRBV         5997
 3 L2     TRAV         5975
 4 L2     TRBV         5986
 5 L3_B   TRAV         5413
 6 L3_B   TRBV         6075
 7 L4_B   TRAV         6098
 8 L4_B   TRBV         6109
 9 L5     TRAV         5694
10 L5     TRBV         5542
11 L6     TRAV         5015
12 L6     TRBV         5021
13 L7     TRAV         5783
14 L7     TRBV         5809
15 PBMC   TRAV         5257
16 PBMC   TRBV         6031
cat(sprintf("\n🎯 L4_B low-freq genes (<0.1%%): %d\n", nrow(l4b_low_freq)))

🎯 L4_B low-freq genes (<0.1%): 25
print("Top 10 L4_B low-freq:")
[1] "Top 10 L4_B low-freq:"
print(head(l4b_low_freq, 10))
       v_gene chain nCells sample
1    TRAV12-2  TRAV      2   L4_B
2      TRAV21  TRAV      2   L4_B
3    TRAV26-2  TRAV      2   L4_B
4      TRAV10  TRAV      1   L4_B
5    TRAV13-1  TRAV      1   L4_B
6    TRAV13-2  TRAV      1   L4_B
7      TRAV16  TRAV      1   L4_B
8      TRAV17  TRAV      1   L4_B
9      TRAV27  TRAV      1   L4_B
10 TRAV29/DV5  TRAV      1   L4_B
cat("\n🔥 TOP 3 V-GENES PER SAMPLE/CHAIN:\n")

🔥 TOP 3 V-GENES PER SAMPLE/CHAIN:
vgene_v_all_counts %>%
    group_by(sample, chain) %>%
    slice_max(nCells, n = 3) %>%
    ungroup() %>%
    arrange(sample, chain, desc(nCells)) %>%
    mutate(percent = round(nCells / total_summary$total_cells[match(sample, total_summary$sample)], 1)) %>%
    select(sample, chain, v_gene, nCells, percent) %>%
    print(n = Inf)
# A tibble: 28 × 5
   sample chain v_gene       nCells percent
   <chr>  <chr> <chr>         <int>   <dbl>
 1 L1     TRAV  TRAV38-2/DV8   5965     1  
 2 L1     TRBV  TRBV20-1       5997     1  
 3 L2     TRAV  TRAV38-2/DV8   5975     1  
 4 L2     TRBV  TRBV20-1       5986     1  
 5 L3_B   TRAV  TRAV4          5413     1  
 6 L3_B   TRBV  TRBV20-1       6075     1.1
 7 L4_B   TRAV  TRAV4          6083     1  
 8 L4_B   TRAV  TRAV12-2          2     0  
 9 L4_B   TRAV  TRAV21            2     0  
10 L4_B   TRAV  TRAV26-2          2     0  
11 L4_B   TRBV  TRBV20-1       6083     1  
12 L4_B   TRBV  TRBV24-1         12     0  
13 L4_B   TRBV  TRBV9             2     0  
14 L5     TRAV  TRAV17         5382     0.9
15 L5     TRAV  TRAV9-2         312     0.1
16 L5     TRBV  TRBV20-1       5542     1  
17 L6     TRAV  TRAV17         4887     1  
18 L6     TRAV  TRAV9-2         128     0  
19 L6     TRBV  TRBV20-1       5021     1  
20 L7     TRAV  TRAV17         5555     1  
21 L7     TRAV  TRAV9-2         228     0  
22 L7     TRBV  TRBV20-1       5809     1  
23 PBMC   TRAV  TRAV29/DV5      406     0.1
24 PBMC   TRAV  TRAV13-1        308     0.1
25 PBMC   TRAV  TRAV9-2         257     0  
26 PBMC   TRBV  TRBV20-1        530     0.1
27 PBMC   TRBV  TRBV5-1         529     0.1
28 PBMC   TRBV  TRBV2           348     0.1
cat("\n✅ 5 FILES SAVED FOR THESIS:\n")

✅ 5 FILES SAVED FOR THESIS:
cat("- 01_Vgene_counts_wide.csv (heatmap input)\n")
- 01_Vgene_counts_wide.csv (heatmap input)
cat("- 04_Vgene_percentages.csv (Fig Table)\n")
- 04_Vgene_percentages.csv (Fig Table)
cat("- 02_L4B_lowfreq_Vgenes.csv (heterogeneity markers)\n")
- 02_L4B_lowfreq_Vgenes.csv (heterogeneity markers)

9 Vgene-heatmap-visualization


library(reshape2)
library(ggplot2)
library(dplyr)
library(stringr)

# FIXED: Clean sample names + TRBV24-1 contrast
vgene_percent <- read.csv("04_Vgene_percentages.csv")

# Sample name mapping (display-friendly)
sample_mapping <- c("L1" = "L1", "L2" = "L2", "L3_B" = "L3", "L4_B" = "L4", 
                    "L5" = "L5", "L6" = "L6", "L7" = "L7")
cell_lines <- names(sample_mapping)

# FILTER: TRBV20-1 + TRBV24-1 (cell lines only)
target_genes <- c("TRBV20-1", "TRBV24-1")
gene_table_filtered <- vgene_percent %>%
    filter(v_gene %in% target_genes,
           sample %in% cell_lines) %>%
    mutate(sample_display = sample_mapping[sample]) %>%
    select(sample_display, v_gene, percent) %>%
    pivot_wider(names_from = sample_display, values_from = percent, values_fill = 0) %>%
    column_to_rownames("v_gene") %>%
    as.matrix()

cat("Filtered genes:", nrow(gene_table_filtered), "\n")
Filtered genes: 2 
print(rownames(gene_table_filtered))
[1] "TRBV20-1" "TRBV24-1"
# Melt with display names
mat_melt <- melt(as.matrix(gene_table_filtered))
colnames(mat_melt) <- c("Gene", "Sample", "Percent")

# FIXED HEATMAP: Better contrast for TRBV24-1
p1 <- ggplot(mat_melt, aes(x = Sample, y = Gene, fill = Percent)) +
    geom_tile(lwd = 0.6, color = "white", height = 0.85, width = 0.85) +
    scale_fill_gradientn(name = "% of T-cells",
                         colors = c("navy", "gold", "darkred"),
                         limits = c(0, 100),
                         breaks = c(0, 25, 50, 75, 100),
                         na.value = "grey90") +
    theme_classic(base_size = 13) +
    theme(axis.text.x  = element_text(angle = 45, vjust = 1, hjust = 1, size = 12, face = "bold"),
          axis.text.y  = element_text(size = 13, face = "bold"),
          axis.title   = element_blank(),
          legend.position = "right",
          legend.title.align = 0.5,
          legend.title = element_text(size = 12, face = "bold"),
          plot.title = element_text(size = 15, face = "bold", hjust = 0.5)) +
    labs(title = "TRBV Oligoclonality in Sézary Cell Lines\nTRBV20-1 dominates (≥99%); TRBV24-1: L4-specific minor clone")

print(p1)

# HIGH-RES PUBLICATION OUTPUT
ggsave("Fig_TRBV_heatmap_L3_L4_clean.png", p1, width = 11, height = 4.5, dpi = 600, bg = "white")
ggsave("Fig_TRBV_heatmap_L3_L4_clean.pdf", p1, width = 11, height = 4.5)


cat("\n✅ FIXED HEATMAP SAVED (L3/L4 labels + TRBV24-1 visible!)\n")

✅ FIXED HEATMAP SAVED (L3/L4 labels + TRBV24-1 visible!)

10 Vgene-heatmap-visualization


library(reshape2)
library(ggplot2)
library(dplyr)
library(stringr)

vgene_percent <- read.csv("04_Vgene_percentages.csv")
sample_mapping <- c("L1" = "L1", "L2" = "L2", "L3_B" = "L3", "L4_B" = "L4", 
                    "L5" = "L5", "L6" = "L6", "L7" = "L7")
cell_lines <- names(sample_mapping)

# === PLOT 1: TRBV20-1 ONLY (Dominance) ===
trbv20_data <- vgene_percent %>%
    filter(v_gene == "TRBV20-1", sample %in% cell_lines) %>%
    mutate(sample_display = sample_mapping[sample])

p20 <- ggplot(trbv20_data, aes(x = reorder(sample_display, percent), y = percent)) +
    geom_col(fill = "darkred", alpha = 0.8, width = 0.7) +
    geom_text(aes(label = sprintf("%.1f%%", percent)), hjust = -0.1, size = 4) +
    coord_flip() +
    scale_y_continuous(limits = c(0, 102), expand = expansion(mult = c(0, 0.08))) +
    labs(title = "TRBV20-1 Dominance (≥99% in all Sézary lines)", 
         x = "Cell Line", y = "% of TRB+ T-cells") +
    theme_classic(base_size = 13) +
    theme(plot.title = element_text(size = 15, face = "bold"),
          axis.text.y = element_text(size = 12, face = "bold"),
          axis.title = element_text(size = 12))

# === PLOT 2: TRBV24-1 ONLY (L4-specific minor clone) ===
trbv24_data <- vgene_percent %>%
    filter(v_gene == "TRBV24-1", sample %in% cell_lines) %>%
    mutate(sample_display = sample_mapping[sample])

p24 <- ggplot(trbv24_data, aes(x = reorder(sample_display, percent), y = percent)) +
    geom_col(aes(fill = sample_display == "L4"), width = 0.7, alpha = 0.8) +
    scale_fill_manual(values = c("FALSE" = "steelblue", "TRUE" = "orange"), guide = "none") +
    geom_text(aes(label = sprintf("%.3f%%", percent)), hjust = -0.1, size = 4) +
    coord_flip() +
    scale_y_continuous(limits = c(0, 0.3), expand = expansion(mult = c(0, 0.15))) +
    labs(title = "TRBV24-1: L4-specific minor clone (0.196%)", 
         x = "Cell Line", y = "% of TRB+ T-cells") +
    theme_classic(base_size = 13) +
    theme(plot.title = element_text(size = 15, face = "bold"),
          axis.text.y = element_text(size = 12, face = "bold"),
          axis.title = element_text(size = 12))

# COMBINED FIGURE
p_combined <- p20 + p24 + plot_layout(ncol = 2) & theme(legend.position = "none")
print(p_combined)

# SAVE THESIS FIGURE
ggsave("Fig_TRBV20-24_separate_L4_highlight.png", p_combined, width = 14, height = 7, dpi = 600, bg = "white")
ggsave("Fig_TRBV20-24_separate_L4_highlight.pdf", p_combined, width = 14, height = 7)


cat("✅ SEPARATE PLOTS SAVED - L4 TRBV24-1 highlighted in ORANGE!\n")
✅ SEPARATE PLOTS SAVED - L4 TRBV24-1 highlighted in ORANGE!
library(ggplot2)
library(dplyr)
library(scales)
library(RColorBrewer)

sample_mapping <- c("L1" = "L1", "L2" = "L2", "L3_B" = "L3", "L4_B" = "L4", 
                    "L5" = "L5", "L6" = "L6", "L7" = "L7")
cell_lines <- c("L1", "L2", "L3_B", "L4_B", "L5", "L6", "L7")

percent_data <- vgene_percent %>%
    filter(chain == "TRBV") %>%
    mutate(sample_display = sample_mapping[sample]) %>%
    group_by(sample_display) %>%
    mutate(percent_scRepertoire = nCells / sum(nCells) * 100) %>%
    ungroup()

l4_trbv24 <- percent_data %>% filter(sample_display == "L4", v_gene == "TRBV24-1")
trbv20_data <- percent_data %>% filter(v_gene == "TRBV20-1", sample %in% cell_lines)
minor_data <- percent_data %>%
    filter(sample %in% cell_lines, v_gene != "TRBV20-1", v_gene != "TRBV24-1") %>%
    group_by(sample_display) %>%
    slice_max(percent_scRepertoire, n = 10) %>%
    ungroup()

minor_colors <- colorRampPalette(RColorBrewer::brewer.pal(9, "PuBuGn")[3:9])(length(unique(minor_data$v_gene)))

p_final <- ggplot() +
    geom_col(data = minor_data, 
             aes(x = sample_display, y = percent_scRepertoire/100, fill = v_gene),
             position = "stack", alpha = 0.7, width = 0.85, color = NA) +
    scale_fill_manual(values = minor_colors, guide = "none") +
    geom_col(data = trbv20_data, 
             aes(x = sample_display, y = percent_scRepertoire/100),
             fill = "#E63946", alpha = 0.95, width = 0.85, color = "white", linewidth = 0.5) +
    geom_col(data = l4_trbv24, 
             aes(x = sample_display, y = percent_scRepertoire/100),
             fill = "#F4A261", alpha = 1, width = 0.85, color = "black", linewidth = 0.8) +
    geom_text(data = trbv20_data,
              aes(x = sample_display, y = 0.85, label = sprintf("%.0f%%", percent_scRepertoire)),
              size = 4.5, fontface = "bold", color = "white") +
    geom_text(data = l4_trbv24,
              aes(x = sample_display, y = 0.15, 
                  label = sprintf("TRBV24-1\n0.20%%")),
              hjust = -0.1, size = 4.2, fontface = "bold", color = "black") +
    scale_y_continuous(labels = percent_format(accuracy = 1), expand = expansion(mult = c(0, 0.12))) +
    coord_flip() +
    labs(title = "TRBV Monoclonal Dominance in Sézary Cell Lines",
         subtitle = "TRBV20-1 (99-100%) + L4-specific TRBV24-1 (0.20%)") +
    theme_classic(base_size = 15) +
    theme(axis.text.y = element_text(size = 14, face = "bold"),
          axis.title.y = element_blank(),
          legend.position = "none",
          plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
          plot.subtitle = element_text(size = 14, hjust = 0.5))

print(p_final)


ggsave("Fig_TRBV_monoclonal_final.png", p_final, width = 12, height = 9, dpi = 600, bg = "white")
ggsave("Fig_TRBV_monoclonal_final.pdf", p_final, width = 12, height = 9, bg = "white")

cat("✅ MONOCLONAL title + L4 biclonal highlight!\n")
✅ MONOCLONAL title + L4 biclonal highlight!

```

---
title: "TCR Analysis-23-02-2026_Visualization"
author: Nasir Mahmood Abbasi
date: "`r Sys.Date()`"
output:
  html_notebook:
    css: style.css
    number_sections: true
    toc: true
    toc_float:
      collapsed: true
    theme: journal
---


# **Introduction**

Single-cell sequencing is an emerging technology in the field of immunology and oncology that allows researchers to couple RNA quantification and other modalities, like immune cell receptor profiling at the level of an individual cell. A number of workflows and software packages have been created to process and analyze single-cell transcriptomic data. These packages allow users to take the vast dimensionality of the data generated in single-cell-based experiments and distill the data into novel insights. Unlike the transcriptomic field, there is a lack of options for software that allow for single-cell immune receptor profiling. Enabling users to easily combine RNA and immune profiling, the scRepertoire framework supports use of 10x, single-cell clonal formats and interaction with popular R-based single-cell data pipelines.

scRepertoire is designed to take filter contig outputs from the 10x Genomics Cell Ranger pipeline, process that data to assign clonotype based on two TCR or Ig chains and analyze the clonotype dynamics. The latter can be separated into 1) clonotype-only analysis functions, such as unique clonotypes or clonal space quantification, and 2) interaction with mRNA expression data using Seurat or SingleCellExperiment packages.

**References:**

- [scRepertoire Vignette](https://www.bioconductor.org/packages/release/bioc/vignettes/scRepertoire/inst/doc/vignette.html)  
- [Borch.dev scRepertoire](https://www.borch.dev/uploads/screpertoire/)



**Prerequisite:** Ensure the `All_samples_Merged` Seurat object is loaded into your R environment before running the chunks below.

## Load libraries
```{r, include=FALSE}

# Core
library(Seurat)
library(SeuratObject)
library(SeuratData)
library(patchwork)
library(dplyr)
library(ggplot2)
library(tidyverse)
library(rmarkdown)
library(tinytex)
library(grid)
library(cowplot)
library(presto)

# TCR Analysis
library(scRepertoire)
library(SingleCellExperiment)
library(circlize)
library(scales)

# Tables/exports
library(gt)
library(writexl)

# Colors (consistent throughout)
colorblind_vector <- hcl.colors(9, "inferno", fixup = TRUE)
options(repr.plot.width=12, repr.plot.height=8)

```

## Load Seurat Object
```{r}

#Load Seurat Object merged from cell lines and a control(PBMC) after filtration
All_samples_Merged <- readRDS("../../../0-Seurat_RDS_Final_OBJECT/All_samples_Merged_with_Renamed_Clusters_Cell_state-03-12-2025.rds")

All_samples_Merged
```

## Load contigs

scRepertoire functions using the filtered_contig_annotations.csv output from the 10x Genomics Cell Ranger. This file is located in the ./outs/ directory of the VDJ alignment folder. To generate a list of contigs to use for scRepertoire:

**-   load the filtered_contig_annotations.csv for each of the samples.**

**-   make a list in the R environment.**


```{r TCR, include=FALSE}
# Contig files (ROBUST PATHS - replace with your actual paths)
contig_paths <- list(
  L1 = "/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L1/outs/per_sample_outs/L1/vdj_t/filtered_contig_annotations.csv",
  L2 = "/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L2/outs/per_sample_outs/L2/vdj_t/filtered_contig_annotations.csv",
  L3_B = "/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L3_CITE_B/outs/per_sample_outs/L3_CITE_B/vdj_t/filtered_contig_annotations.csv",
  L4_B = "/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L4_B/outs/per_sample_outs/L4_B/vdj_t/filtered_contig_annotations.csv",
  L5 = "/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L5/outs/per_sample_outs/L5/vdj_t/filtered_contig_annotations.csv",
  L6 = "/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L6_CITE/outs/per_sample_outs/L6_CITE/vdj_t/filtered_contig_annotations.csv",
  L7 = "/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L7/outs/per_sample_outs/L7/vdj_t/filtered_contig_annotations.csv",
  PBMC = "/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/PBMC/outs/per_sample_outs/PBMC/vdj_t/filtered_contig_annotations.csv"
)

# Load contigs with error checking
contig_list <- list()
samples <- names(contig_paths)
for(i in seq_along(contig_paths)) {
  if(file.exists(contig_paths[[i]])) {
    contig_list[[samples[i]]] <- read.csv(contig_paths[[i]])
  } else {
    warning("Missing: ", contig_paths[[i]])
  }
}
```

# **Combining Contigs into Clones**

There are varying definitions of clones in the literature. For the purposes of scRepertoire, we define a clone as cells with shared/trackable complementarity-determining region 3 (CDR3) sequences. Within this definition, one might use amino acid (aa) sequences of one or both chains to define a clone. Alternatively, we could use nucleotide (nt) or the V(D)JC genes (genes) to define a clone. The latter, genes, would be a more permissive definition of “clones,” as multiple amino acid or nucleotide sequences can result from the same gene combination. Another option to define a clone is the use of the V(D)JC and nucleotide sequence (strict). scRepertoire allows for the use of all these definitions of clones and enables users to select both or individual chains to examine.

## Combining Contigs into Clones
```{r combinedTCR, fig.height=4, fig.width=6}

combined.TCR <- combineTCR(contig_list, 
                          samples = samples,
                          removeNA = FALSE, 
                          removeMulti = FALSE, 
                          filterMulti = FALSE)

# Export raw data
combined_TCR_df <- do.call(rbind, combined.TCR) %>% mutate(sample = samples[as.numeric(sample)])
write.csv(combined_TCR_df, "combined_TCR_clean.csv", row.names = FALSE)
head(combined_TCR_df[,1:10])

```

## Prepare Seurat Object (SINGLE CALL)
```{r , fig.height=8, fig.width=12}
TCR <- All_samples_Merged
DefaultAssay(TCR) <- "SCT"
TCR <- quietTCRgenes(TCR)

# YOUR ORIGINAL WORKING PARAMETERS (from script)
TCR <- combineExpression(combined.TCR, TCR, 
                        cloneCall = "gene",
                        group.by = "sample",     # ← FIXED: Use "sample" (exists in combined.TCR)
                        proportion = FALSE,
                        cloneSize = c(Single = 1, Small = 5, Medium = 20, Large = 100, Hyperexpanded = 500))

print("✅ SUCCESS - Clone metadata added")
DimPlot(TCR, group.by = "cloneSize", reduction = "umap") + 
  scale_color_manual(values = rev(colorblind_vector[1:5]))
```

# **Combining Clones and Single-Cell Objects**
```{r , fig.height=4, fig.width=8}
DefaultAssay(TCR) <- "SCT"

# If All_samples_Merged is already loaded in the environment:
TCR <- All_samples_Merged

# Define color palette
colorblind_vector <- hcl.colors(n=7, palette = "inferno", fixup = TRUE)

# Combine expression with filtering of NA clonotype cells
TCR <- combineExpression(
    combined.TCR,
    TCR,
    cloneCall = "gene",
    group.by = "sample",
    proportion = TRUE,
    filterNA = TRUE  # This will exclude cells without clonotype info
)

# You no longer need the manual barcode matching or NA replacements



# Plot UMAP colored by cloneSize
DimPlot(TCR, group.by = "cloneSize", reduction = "umap") +
    scale_color_manual(values = rev(colorblind_vector[c(1, 3, 4, 5, 7)]))



DimPlot(TCR, group.by = "cloneSize", reduction = "umap")

DimPlot(TCR, group.by = "cloneSize", reduction = "umap") +
    scale_color_manual(values=rev(colorblind_vector[c(1,3,4,5,6)]))



#Define color palette 
colorblind_vector <- hcl.colors(n=9, palette = "inferno", fixup = TRUE)

Seurat::DimPlot(TCR, group.by = "cloneSize", reduction = "umap") +
    scale_color_manual(values=rev(colorblind_vector[c(1,3,4,5,7)]))
```




```{r , fig.height=4, fig.width=8}
TCR <- combineExpression(combined.TCR, 
                                   TCR, 
                                   cloneCall="gene", 
                                   group.by = "sample", 
                                   proportion = FALSE, 
                                   cloneSize=c(Single=1, Small=5, Medium=20, Large=100, Hyperexpanded=500))

Seurat::DimPlot(TCR, group.by = "cloneSize", reduction = "umap") +
    scale_color_manual(values=rev(colorblind_vector[c(1,3,4,5,7)]))
```

# **Visualizations for Single-Cell Objects**
```{r , fig.height=8, fig.width=12}


clonalOverlay(TCR, 
              reduction = "umap", 
              cutpoint = 1, 
              bins = 10, 
              facet.by = "orig.ident") + 
              guides(color = "none")

```


```{r , fig.height=14, fig.width=18}

#clonalNetwork
#ggraph needs to be loaded due to issues with ggplot
library(ggraph)

clonalNetwork(TCR, 
              reduction = "umap", 
              group.by = "seurat_clusters",
              filter.clones = NULL,
              filter.identity = NULL,
              cloneCall = "aa")


#Examining Cluster 3 only
clonalNetwork(TCR, 
              reduction = "umap", 
              group.by = "seurat_clusters",
              filter.identity = 8,
              cloneCall = "aa")


shared.clones <- clonalNetwork(TCR, 
                               reduction = "umap", 
                               group.by = "seurat_clusters",
                               cloneCall = "aa", 
                               exportClones = TRUE)
head(shared.clones)

#ggraph needs to be loaded due to issues with ggplot
library(ggraph)

#No Identity filter
clonalNetwork(TCR, 
              reduction = "umap", 
              group.by = "seurat_clusters",
              filter.clones = NULL,
              filter.identity = NULL,
              cloneCall = "aa")
```




```{r , fig.height=8, fig.width=16}
# clonalOccupy
#clonalOccupy
clonalOccupy(TCR, 
              x.axis = "seurat_clusters")

clonalOccupy(TCR, 
              x.axis = "orig.ident")

```


```{r , fig.height=20, fig.width=14}
# clonalOccupy
clonalOccupy(TCR, 
              x.axis = "orig.ident")

clonalOccupy(TCR, 
                     x.axis = "orig.ident", 
                     proportion = TRUE, 
                     label = FALSE)
```


```{r , fig.height=4, fig.width=6}

# getCirclize

library(circlize)
library(scales)

circles <- getCirclize(TCR, 
                       group.by = "seurat_clusters")

#Just assigning the normal colors to each cluster
grid.cols <- hue_pal()(length(unique(TCR$seurat_clusters)))
names(grid.cols) <- unique(TCR$seurat_clusters)

#Graphing the chord diagram
chordDiagram(circles, self.link = 1, grid.col = grid.cols)


circles <- getCirclize(TCR, group.by = "orig.ident")

grid.cols <- scales::hue_pal()(length(unique(TCR@active.ident)))
names(grid.cols) <- levels(TCR@active.ident)

chordDiagram(circles, 
             self.link = 1, 
             grid.col = grid.cols)
```


# **Quantifying Clonal Bias**

**# # StartracDiversity
# From the excellent work by Lei Zhang, et al., the authors introduce new methods for looking at clones by cellular origins and cluster identification. Their STARTRAC software has been adapted to work with scRepertoire and please read and cite their excellent work.
# 
# In order to use the StartracDiversity() function, you will need to include the product of the combinedExpression() function. The second requirement is a column header in the meta data of the Seurat object that has tissue of origin. In the example data, type corresponds to the column “Type”, which includes the “P” and “T” classifiers. The indices can be subsetted for a specific patient or examined overall using the by variable. Importantly, the function uses only the strict definition of a clone of the VDJC genes and the CDR3 nucleotide sequence.
# 
# The indices output includes:
# 
# expa - Clonal Expansion
# migr - Cross-tissue Migration
# tran - State Transition
**


```{r , fig.height=8, fig.width=12}

Idents(TCR) <- "seurat_clusters"

StartracDiversity(TCR, 
                  type = "orig.ident",
                  group.by = "orig.ident")


```

# **clonalBias**
```{r , fig.height=4, fig.width=10}

clonalBias(TCR, 
           cloneCall = "aa", 
           split.by = "orig.ident", 
           group.by = "seurat_clusters",
           n.boots = 10, 
           min.expand =5)

```


# **TRBV CLONALITY**

# COMPLETE ANALYSIS: TRBV + TRAV (BOTH CHAINS)
```{r}

library(dplyr)
library(tidyr)
library(stringr)

# COMPLETE FIXED ANALYSIS: TRBV (TCR2) + TRAV (TCR1)
vgene_v_all_counts <- data.frame()

for(i in seq_along(combined.TCR)) {
    contigs <- combined.TCR[[i]]
    sample_name <- names(combined.TCR)[i]
    
    # BETA CHAIN: TRBV from TCR2 (TRBC+ cells)
    beta_counts <- contigs %>%
        filter(grepl("TRBC", TCR2)) %>%
        mutate(v_gene = str_extract(TCR2, "TRBV[^.]+"),
               chain = "TRBV") %>%
        group_by(v_gene, chain) %>%
        summarise(nCells = n(), sample = sample_name, .groups = "drop") %>%
        filter(!is.na(v_gene))
    
    # ALPHA CHAIN: TRAV from TCR1 (TRAV+ cells)
    alpha_counts <- contigs %>%
        filter(grepl("TRAV", TCR1)) %>%
        mutate(v_gene = str_extract(TCR1, "TRAV[^.]+"),
               chain = "TRAV") %>%
        group_by(v_gene, chain) %>%
        summarise(nCells = n(), sample = sample_name, .groups = "drop") %>%
        filter(!is.na(v_gene))
    
    vgene_v_all_counts <- bind_rows(vgene_v_all_counts, beta_counts, alpha_counts)
}

# L4_B low-frequency genes (<0.1% = <7 cells)
l4b_low_freq <- vgene_v_all_counts %>%
    filter(sample == "L4_B", nCells < 7) %>%
    arrange(chain, desc(nCells))

# Wide format (samples × V-genes)
vgene_wide <- vgene_v_all_counts %>%
    select(-chain) %>%
    pivot_wider(names_from = sample, values_from = nCells, values_fill = 0)

# Summary statistics
total_summary <- vgene_v_all_counts %>%
    group_by(sample, chain) %>%
    summarise(total_cells = sum(nCells), .groups = "drop")

# Percentages (% usage per sample/chain)
vgene_percent <- vgene_v_all_counts %>%
    group_by(sample, chain) %>%
    mutate(percent = round(nCells / sum(nCells) * 100, 2)) %>%
    ungroup() %>%
    select(sample, chain, v_gene, nCells, percent)

## SAVE 5 PUBLICATION-READY FILES
write.csv(vgene_wide, "01_Vgene_counts_wide.csv", row.names = FALSE)
write.csv(l4b_low_freq, "02_L4B_lowfreq_Vgenes.csv", row.names = FALSE)
write.csv(total_summary, "03_Vgene_totals_summary.csv", row.names = FALSE)
write.csv(vgene_percent, "04_Vgene_percentages.csv", row.names = FALSE)
write.csv(vgene_v_all_counts, "05_Vgene_long_format.csv", row.names = FALSE)

## COMPREHENSIVE PREVIEW
cat("📊 TOTAL CELLS SUMMARY:\n")
print(total_summary)

cat(sprintf("\n🎯 L4_B low-freq genes (<0.1%%): %d\n", nrow(l4b_low_freq)))
print("Top 10 L4_B low-freq:")
print(head(l4b_low_freq, 10))

cat("\n🔥 TOP 3 V-GENES PER SAMPLE/CHAIN:\n")
vgene_v_all_counts %>%
    group_by(sample, chain) %>%
    slice_max(nCells, n = 3) %>%
    ungroup() %>%
    arrange(sample, chain, desc(nCells)) %>%
    mutate(percent = round(nCells / total_summary$total_cells[match(sample, total_summary$sample)], 1)) %>%
    select(sample, chain, v_gene, nCells, percent) %>%
    print(n = Inf)

cat("\n✅ 5 FILES SAVED FOR THESIS:\n")
cat("- 01_Vgene_counts_wide.csv (heatmap input)\n")
cat("- 04_Vgene_percentages.csv (Fig Table)\n")
cat("- 02_L4B_lowfreq_Vgenes.csv (heterogeneity markers)\n")

```

# Vgene-heatmap-visualization
```{r}

library(reshape2)
library(ggplot2)
library(dplyr)
library(stringr)

# FIXED: Clean sample names + TRBV24-1 contrast
vgene_percent <- read.csv("04_Vgene_percentages.csv")

# Sample name mapping (display-friendly)
sample_mapping <- c("L1" = "L1", "L2" = "L2", "L3_B" = "L3", "L4_B" = "L4", 
                    "L5" = "L5", "L6" = "L6", "L7" = "L7")
cell_lines <- names(sample_mapping)

# FILTER: TRBV20-1 + TRBV24-1 (cell lines only)
target_genes <- c("TRBV20-1", "TRBV24-1")
gene_table_filtered <- vgene_percent %>%
    filter(v_gene %in% target_genes,
           sample %in% cell_lines) %>%
    mutate(sample_display = sample_mapping[sample]) %>%
    select(sample_display, v_gene, percent) %>%
    pivot_wider(names_from = sample_display, values_from = percent, values_fill = 0) %>%
    column_to_rownames("v_gene") %>%
    as.matrix()

cat("Filtered genes:", nrow(gene_table_filtered), "\n")
print(rownames(gene_table_filtered))

# Melt with display names
mat_melt <- melt(as.matrix(gene_table_filtered))
colnames(mat_melt) <- c("Gene", "Sample", "Percent")

# FIXED HEATMAP: Better contrast for TRBV24-1
p1 <- ggplot(mat_melt, aes(x = Sample, y = Gene, fill = Percent)) +
    geom_tile(lwd = 0.6, color = "white", height = 0.85, width = 0.85) +
    scale_fill_gradientn(name = "% of T-cells",
                         colors = c("navy", "gold", "darkred"),
                         limits = c(0, 100),
                         breaks = c(0, 25, 50, 75, 100),
                         na.value = "grey90") +
    theme_classic(base_size = 13) +
    theme(axis.text.x  = element_text(angle = 45, vjust = 1, hjust = 1, size = 12, face = "bold"),
          axis.text.y  = element_text(size = 13, face = "bold"),
          axis.title   = element_blank(),
          legend.position = "right",
          legend.title.align = 0.5,
          legend.title = element_text(size = 12, face = "bold"),
          plot.title = element_text(size = 15, face = "bold", hjust = 0.5)) +
    labs(title = "TRBV Oligoclonality in Sézary Cell Lines\nTRBV20-1 dominates (≥99%); TRBV24-1: L4-specific minor clone")

print(p1)

# HIGH-RES PUBLICATION OUTPUT
ggsave("Fig_TRBV_heatmap_L3_L4_clean.png", p1, width = 11, height = 4.5, dpi = 600, bg = "white")
ggsave("Fig_TRBV_heatmap_L3_L4_clean.pdf", p1, width = 11, height = 4.5)

cat("\n✅ FIXED HEATMAP SAVED (L3/L4 labels + TRBV24-1 visible!)\n")

```



# Vgene-heatmap-visualization
```{r}

library(reshape2)
library(ggplot2)
library(dplyr)
library(stringr)

vgene_percent <- read.csv("04_Vgene_percentages.csv")
sample_mapping <- c("L1" = "L1", "L2" = "L2", "L3_B" = "L3", "L4_B" = "L4", 
                    "L5" = "L5", "L6" = "L6", "L7" = "L7")
cell_lines <- names(sample_mapping)

# === PLOT 1: TRBV20-1 ONLY (Dominance) ===
trbv20_data <- vgene_percent %>%
    filter(v_gene == "TRBV20-1", sample %in% cell_lines) %>%
    mutate(sample_display = sample_mapping[sample])

p20 <- ggplot(trbv20_data, aes(x = reorder(sample_display, percent), y = percent)) +
    geom_col(fill = "darkred", alpha = 0.8, width = 0.7) +
    geom_text(aes(label = sprintf("%.1f%%", percent)), hjust = -0.1, size = 4) +
    coord_flip() +
    scale_y_continuous(limits = c(0, 102), expand = expansion(mult = c(0, 0.08))) +
    labs(title = "TRBV20-1 Dominance (≥99% in all Sézary lines)", 
         x = "Cell Line", y = "% of TRB+ T-cells") +
    theme_classic(base_size = 13) +
    theme(plot.title = element_text(size = 15, face = "bold"),
          axis.text.y = element_text(size = 12, face = "bold"),
          axis.title = element_text(size = 12))

# === PLOT 2: TRBV24-1 ONLY (L4-specific minor clone) ===
trbv24_data <- vgene_percent %>%
    filter(v_gene == "TRBV24-1", sample %in% cell_lines) %>%
    mutate(sample_display = sample_mapping[sample])

p24 <- ggplot(trbv24_data, aes(x = reorder(sample_display, percent), y = percent)) +
    geom_col(aes(fill = sample_display == "L4"), width = 0.7, alpha = 0.8) +
    scale_fill_manual(values = c("FALSE" = "steelblue", "TRUE" = "orange"), guide = "none") +
    geom_text(aes(label = sprintf("%.3f%%", percent)), hjust = -0.1, size = 4) +
    coord_flip() +
    scale_y_continuous(limits = c(0, 0.3), expand = expansion(mult = c(0, 0.15))) +
    labs(title = "TRBV24-1: L4-specific minor clone (0.196%)", 
         x = "Cell Line", y = "% of TRB+ T-cells") +
    theme_classic(base_size = 13) +
    theme(plot.title = element_text(size = 15, face = "bold"),
          axis.text.y = element_text(size = 12, face = "bold"),
          axis.title = element_text(size = 12))

# COMBINED FIGURE
p_combined <- p20 + p24 + plot_layout(ncol = 2) & theme(legend.position = "none")
print(p_combined)

# SAVE THESIS FIGURE
ggsave("Fig_TRBV20-24_separate_L4_highlight.png", p_combined, width = 14, height = 7, dpi = 600, bg = "white")
ggsave("Fig_TRBV20-24_separate_L4_highlight.pdf", p_combined, width = 14, height = 7)

cat("✅ SEPARATE PLOTS SAVED - L4 TRBV24-1 highlighted in ORANGE!\n")

```

```{r, fig.height=8, fig.width=10}
library(ggplot2)
library(dplyr)
library(scales)
library(RColorBrewer)

sample_mapping <- c("L1" = "L1", "L2" = "L2", "L3_B" = "L3", "L4_B" = "L4", 
                    "L5" = "L5", "L6" = "L6", "L7" = "L7")
cell_lines <- c("L1", "L2", "L3_B", "L4_B", "L5", "L6", "L7")

percent_data <- vgene_percent %>%
    filter(chain == "TRBV") %>%
    mutate(sample_display = sample_mapping[sample]) %>%
    group_by(sample_display) %>%
    mutate(percent_scRepertoire = nCells / sum(nCells) * 100) %>%
    ungroup()

l4_trbv24 <- percent_data %>% filter(sample_display == "L4", v_gene == "TRBV24-1")
trbv20_data <- percent_data %>% filter(v_gene == "TRBV20-1", sample %in% cell_lines)
minor_data <- percent_data %>%
    filter(sample %in% cell_lines, v_gene != "TRBV20-1", v_gene != "TRBV24-1") %>%
    group_by(sample_display) %>%
    slice_max(percent_scRepertoire, n = 10) %>%
    ungroup()

minor_colors <- colorRampPalette(RColorBrewer::brewer.pal(9, "PuBuGn")[3:9])(length(unique(minor_data$v_gene)))

p_final <- ggplot() +
    geom_col(data = minor_data, 
             aes(x = sample_display, y = percent_scRepertoire/100, fill = v_gene),
             position = "stack", alpha = 0.7, width = 0.85, color = NA) +
    scale_fill_manual(values = minor_colors, guide = "none") +
    geom_col(data = trbv20_data, 
             aes(x = sample_display, y = percent_scRepertoire/100),
             fill = "#E63946", alpha = 0.95, width = 0.85, color = "white", linewidth = 0.5) +
    geom_col(data = l4_trbv24, 
             aes(x = sample_display, y = percent_scRepertoire/100),
             fill = "#F4A261", alpha = 1, width = 0.85, color = "black", linewidth = 0.8) +
    geom_text(data = trbv20_data,
              aes(x = sample_display, y = 0.85, label = sprintf("%.0f%%", percent_scRepertoire)),
              size = 4.5, fontface = "bold", color = "white") +
    geom_text(data = l4_trbv24,
              aes(x = sample_display, y = 0.15, 
                  label = sprintf("TRBV24-1\n0.20%%")),
              hjust = -0.1, size = 4.2, fontface = "bold", color = "black") +
    scale_y_continuous(labels = percent_format(accuracy = 1), expand = expansion(mult = c(0, 0.12))) +
    coord_flip() +
    labs(title = "TRBV Monoclonal Dominance in Sézary Cell Lines",
         subtitle = "TRBV20-1 (99-100%) + L4-specific TRBV24-1 (0.20%)") +
    theme_classic(base_size = 15) +
    theme(axis.text.y = element_text(size = 14, face = "bold"),
          axis.title.y = element_blank(),
          legend.position = "none",
          plot.title = element_text(size = 18, face = "bold", hjust = 0.5),
          plot.subtitle = element_text(size = 14, hjust = 0.5))

print(p_final)

ggsave("Fig_TRBV_monoclonal_final.png", p_final, width = 12, height = 9, dpi = 600, bg = "white")
ggsave("Fig_TRBV_monoclonal_final.pdf", p_final, width = 12, height = 9, bg = "white")

cat("✅ MONOCLONAL title + L4 biclonal highlight!\n")
```





```

