1. Introduction

Single-cell sequencing is an emerging technology in the field of immunology and oncology that allows researchers to couple RNA quantification and other modalities, like immune cell receptor profiling at the level of an individual cell. A number of workflows and software packages have been created to process and analyze single-cell transcriptomic data. These packages allow users to take the vast dimensionality of the data generated in single-cell-based experiments and distill the data into novel insights. Unlike the transcriptomic field, there is a lack of options for software that allow for single-cell immune receptor profiling. Enabling users to easily combine RNA and immune profiling, the scRepertoire framework supports use of 10x, single-cell clonal formats and interaction with popular R-based single-cell data pipelines.

scRepertoire is designed to take filter contig outputs from the 10x Genomics Cell Ranger pipeline, process that data to assign clonotype based on two TCR or Ig chains and analyze the clonotype dynamics. The latter can be separated into 1) clonotype-only analysis functions, such as unique clonotypes or clonal space quantification, and 2) interaction with mRNA expression data using Seurat or SingleCellExperiment packages.

References:

Prerequisite: Ensure the All_samples_Merged Seurat object is loaded into your R environment before running the chunks below.

1.1 Load libraries

library(Seurat)
library(SeuratObject)
library(SeuratObject)
library(SeuratData)
library(patchwork)
library(dplyr)
library(ggplot2)
library(tidyverse)
library(rmarkdown)
library(tinytex)
library(grid)
library(cowplot)
library(presto)

#TCR Analysis
library(scRepertoire)
library(SingleCellExperiment)
library(circlize)
library(scales)

1.2 Load Seurat Object

All_samples_Merged
An object of class Seurat 
62900 features across 49305 samples within 6 assays 
Active assay: RNA (36601 features, 0 variable features)
 2 layers present: data, counts
 5 other assays present: ADT, prediction.score.celltype.l1, prediction.score.celltype.l2, prediction.score.celltype.l3, SCT
 5 dimensional reductions calculated: integrated_dr, ref.umap, pca, umap, harmony

1.3. Load contigs

scRepertoire functions using the filtered_contig_annotations.csv output from the 10x Genomics Cell Ranger. This file is located in the ./outs/ directory of the VDJ alignment folder. To generate a list of contigs to use for scRepertoire:

- load the filtered_contig_annotations.csv for each of the samples.

- make a list in the R environment.

2. Combining Contigs into Clones

There are varying definitions of clones in the literature. For the purposes of scRepertoire, we define a clone as cells with shared/trackable complementarity-determining region 3 (CDR3) sequences. Within this definition, one might use amino acid (aa) sequences of one or both chains to define a clone. Alternatively, we could use nucleotide (nt) or the V(D)JC genes (genes) to define a clone. The latter, genes, would be a more permissive definition of “clones,” as multiple amino acid or nucleotide sequences can result from the same gene combination. Another option to define a clone is the use of the V(D)JC and nucleotide sequence (strict). scRepertoire allows for the use of all these definitions of clones and enables users to select both or individual chains to examine.

2.1 Combining Contigs into Clones

exportClones(combined.TCR,
              write.file = TRUE,
              dir = "TCR_analysis-7-11-2025/",
             file.name = "clones.csv")

exportClones(combined.TCR_with_PBMC,
             write.file = TRUE,
             dir = "TCR_analysis-7-11-2025/",
             file.name = "clones_with_PBMC.csv")

head(combined.TCR[[1]])

head(combined.TCR_with_PBMC[[1]])

# Combine all data frames in the list into a single data frame
combined_TCR_df <- do.call(rbind, combined.TCR)

# Write the combined data frame to a CSV file
write.csv(combined_TCR_df, file = "TCR_analysis-7-11-2025/combined_TCR.csv", row.names = FALSE)


# Combine all data frames in the list into a single data frame
combined_TCR_with_PBMC_df <- do.call(rbind, combined.TCR_with_PBMC)

# Write the combined data frame to a CSV file
write.csv(combined_TCR_with_PBMC_df, file = "TCR_analysis-7-11-2025/combined_TCR_with_PBMC.csv", row.names = FALSE)

2.2 Write the fasta file for phylogeny


# Define the FASTA file for alpha chains
alpha_fasta_file <- "TCR_analysis-7-11-2025/Phylogeny/cdr3_alpha_sequences.fasta"

# Open a connection to the file
f <- file(alpha_fasta_file, open = "w")

# Loop through the data frame
for (i in seq_len(nrow(combined_TCR_with_PBMC_df))) {
  if (!is.na(combined_TCR_with_PBMC_df$cdr3_nt1[i])) {
    # Write the header and sequence
    writeLines(paste0(">", combined_TCR_df$sample[i], "_alpha"), f)
    writeLines(combined_TCR_df$cdr3_nt1[i], f)
  }
}

# Close the connection
close(f)

cat("Alpha chain FASTA file created at:", alpha_fasta_file, "\n")
Alpha chain FASTA file created at: TCR_analysis-7-11-2025/Phylogeny/cdr3_alpha_sequences.fasta 
# Define the FASTA file for beta chains
beta_fasta_file <- "TCR_analysis-7-11-2025/Phylogeny/cdr3_beta_sequences.fasta"

# Open a connection to the file
f <- file(beta_fasta_file, open = "w")

# Loop through the data frame
for (i in seq_len(nrow(combined_TCR_with_PBMC_df))) {
  if (!is.na(combined_TCR_with_PBMC_df$cdr3_nt2[i])) {
    # Write the header and sequence
    writeLines(paste0(">", combined_TCR_with_PBMC_df$sample[i], "_beta"), f)
    writeLines(combined_TCR_df$cdr3_nt2[i], f)
  }
}

# Close the connection
close(f)

cat("Beta chain FASTA file created at:", beta_fasta_file, "\n")
Beta chain FASTA file created at: TCR_analysis-7-11-2025/Phylogeny/cdr3_beta_sequences.fasta 

3. Basic Clonal Visualizations

3.1. clonalQuant: Quantifying Unique Clones

#clonalQuant
 clonalQuant(combined.TCR, 
            cloneCall="strict", 
            chain = "both", 
            scale = TRUE)



clonalQuant(combined.TCR_with_PBMC, 
            cloneCall="strict", 
            chain = "both", 
            scale = TRUE)

NA
NA
NA

3.2 clonalAbundance: Distribution of Clones by Size

#clonalAbundance

clonalAbundance(combined.TCR_with_PBMC, 
                cloneCall = "gene",  palette = "Zissou 1",
                scale = FALSE)

  
clonalAbundance(combined.TCR_with_PBMC, 
                cloneCall = "gene", palette = "Zissou 1",
                scale = TRUE)




clonalAbundance(combined.TCR, 
                cloneCall = "gene",  
                scale = FALSE)

 

clonalAbundance(combined.TCR, 
                cloneCall = "gene", 
                scale = TRUE)

3.3 clonalLength: Distribution of Sequence Lengths

#clonalLength
clonalLength(combined.TCR_with_PBMC, 
             cloneCall="nt", 
             chain = "both") 


clonalLength(combined.TCR_with_PBMC, 
             cloneCall="aa", 
             chain = "both") 


clonalLength(combined.TCR, 
             cloneCall="nt", 
             chain = "both") 


clonalLength(combined.TCR, 
             cloneCall="aa", 
             chain = "both") 




#TRA
clonalLength(combined.TCR_with_PBMC, 
             cloneCall="nt", 
             chain = "TRA", 
             scale = TRUE) 


clonalLength(combined.TCR_with_PBMC, 
             cloneCall="aa", 
             chain = "TRA", 
             scale = TRUE) 


clonalLength(combined.TCR, 
             cloneCall="nt", 
             chain = "TRA",  
             scale = TRUE) 


clonalLength(combined.TCR, 
             cloneCall="aa", 
             chain = "TRA", 
             scale = TRUE) 



#TRB
clonalLength(combined.TCR_with_PBMC, 
             cloneCall="nt", 
             chain = "TRB", 
             scale = TRUE) 


clonalLength(combined.TCR_with_PBMC, 
             cloneCall="aa", 
             chain = "TRB", 
             scale = TRUE) 


clonalLength(combined.TCR, 
             cloneCall="nt", 
             chain = "TRB",  
             scale = TRUE) 


clonalLength(combined.TCR, 
             cloneCall="aa", 
             chain = "TRB", 
             scale = TRUE) 

NA
NA

3.5 clonalCompare: Clonal Dynamics Between Categorical Variables

# clonalCompare
clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L1", "L2"), 
                  cloneCall="aa", 
                  graph = "alluvial")


clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L3", "L4"), 
                  cloneCall="aa", 
                  graph = "alluvial")


clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L5", "L6","L7"), 
                  cloneCall="aa", 
                  graph = "alluvial")


clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L5","L6"), 
                  cloneCall="aa", 
                  graph = "alluvial")


clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L5","L7"), 
                  cloneCall="aa", 
                  graph = "alluvial")


clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L6","L7"), 
                  cloneCall="aa", 
                  graph = "alluvial")


clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L1", "L2", "L3", "L4", "L5", "L6", "L7", "PBMC"), 
                  cloneCall="aa", 
                  graph = "alluvial")



clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L1", "L2", "L3", "L4", "L5", "L6", "L7"), 
                  cloneCall="aa", 
                  graph = "alluvial")

3.6 clonalScatter: Scatterplot of Two Variables

#clonalScatter

clonalScatter(combined.TCR, 
              cloneCall ="gene", 
              x.axis = "L1", 
              y.axis = "L2",
              dot.size = "total",
              graph = "proportion")



clonalScatter(combined.TCR, 
              cloneCall ="gene", 
              x.axis = "L3", 
              y.axis = "L4",
              dot.size = "total",
              graph = "proportion")


clonalScatter(combined.TCR, 
              cloneCall ="gene", 
              x.axis = "L5", 
              y.axis = "L6",
              dot.size = "total",
              graph = "proportion")


clonalScatter(combined.TCR, 
              cloneCall ="gene", 
              x.axis = "L5", 
              y.axis = "L7",
              dot.size = "total",
              graph = "proportion")


clonalScatter(combined.TCR_with_PBMC, 
              cloneCall ="gene", 
              x.axis = "L6", 
              y.axis = "L7",
              dot.size = "total",
              graph = "proportion")

4. Visualizing Clonal Dynamics

4.1 clonalHomeostasis: Examining Clonal Space

# Visualizing Clonal Dynamics

# clonalHomeostasis

clonalHomeostasis(combined.TCR, 
                  cloneCall = "gene")



clonalHomeostasis(combined.TCR, 
                  cloneCall = "gene",
                  cloneSize = c(Rare = 0.001, Small = 0.01, Medium = 0.1, Large = 0.3, Hyperexpanded =1))



clonalHomeostasis(combined.TCR, 
                  cloneCall = "gene",
                  cloneSize = c(Rare = 0.1, Small = 1, Medium = 10, Large = 30, Hyperexpanded =100))

4.2 clonalProportion: Examining Space Occupied by Ranks of Clones

# clonalProportion

clonalProportion(combined.TCR, 
                 cloneCall = "gene") 


clonalProportion(combined.TCR, 
                 cloneCall = "nt",
                 clonalSplit = c(1, 5, 10, 100, 1000, 10000)) 



clonalProportion(combined.TCR, 
                 cloneCall = "nt",
                 clonalSplit = c(10, 100, 1000, 10000, 30000, 100000)) 

NA
NA

5. Summarizing Repertoires

# Summarizing Repertoires

# positionalEntropy

positionalEntropy(combined.TCR, 
                  chain = "TRB", 
                  aa.length = 20)


positionalEntropy(combined.TCR_with_PBMC, 
                  chain = "TRB", 
                  aa.length = 20)




# positionalProperty

positionalProperty(combined.TCR[c(1,2)], 
                  chain = "TRB", 
                  aa.length = 20, 
                  method = "atchleyFactors") + 
  scale_color_manual(values = hcl.colors(5, "inferno")[c(2,4)])



# percentGeneUsage
vizGenes(combined.TCR, 
         x.axis = "TRBV",
         y.axis = NULL,
         plot = "barplot",  
         summary.fun = "proportion")


vizGenes(combined.TCR_with_PBMC, 
         x.axis = "TRBV",
         y.axis = NULL,
         plot = "barplot",  
         summary.fun = "proportion")



vizGenes(combined.TCR, 
        x.axis = "TRAV",
        y.axis = NULL,
        plot = "barplot",  
        summary.fun = "proportion")


vizGenes(combined.TCR_with_PBMC, 
        x.axis = "TRAV",
        y.axis = NULL,
        plot = "barplot",  
        summary.fun = "proportion")


vizGenes(combined.TCR[c(1,2,3,4,5,6,7)], 
         x.axis = "TRBV",
         y.axis = "TRBJ",
         plot = "heatmap",  
          summary.fun = "percent")


vizGenes(combined.TCR[c(1,2,3,4,5,6,7)], 
         x.axis = "TRAV",
         y.axis = "TRAJ",
         plot = "heatmap",  
          summary.fun = "percent")


# For the P17 patient samples, what if we are interested in chain pairings, we can look at TRBV and TRAV at the same time using them as inputs to x.axis and y.axis.
vizGenes(combined.TCR[c(1,2)], 
         x.axis = "TRBV",
         y.axis = "TRAV",
         plot = "heatmap", 
          summary.fun = "percent")


# percentGenes: Quantifying Single Gene Usage
percentGenes(combined.TCR_with_PBMC, 
             chain = "TRB", 
             gene = "Vgene",
              summary.fun = "percent")


percentGenes(combined.TCR_with_PBMC, 
             chain = "TRA", 
             gene = "Vgene",
              summary.fun = "percent")


percentGenes(combined.TCR_with_PBMC, 
             chain = "TRB", 
             gene = "Jgene")


percentGenes(combined.TCR_with_PBMC, 
             chain = "TRA", 
             gene = "Jgene",
              summary.fun = "percent")



# percentVJ
# Quantify the proportion of V and J gene usage with percentVJ(). Like percentGenes(), this function will quantify the percentage of V and J paired together across individual repertoires. The output can be visualized using a heatmap or as input for further dimensional reduction.

percentVJ(combined.TCR_with_PBMC, 
          chain = "TRB",
           summary.fun = "percent")


percentVJ(combined.TCR_with_PBMC, 
          chain = "TRA",
           summary.fun = "percent")


# percentKmer

percentKmer(combined.TCR_with_PBMC, 
            cloneCall = "aa",
            chain = "TRB", 
            motif.length = 3, 
            top.motifs = 25)


percentKmer(combined.TCR_with_PBMC, 
            cloneCall = "aa",
            chain = "TRA", 
            motif.length = 3, 
            top.motifs = 25)


percentKmer(combined.TCR_with_PBMC, 
            cloneCall = "nt",
            chain = "TRB", 
            motif.length = 3, 
            top.motifs = 25)


percentKmer(combined.TCR_with_PBMC, 
            cloneCall = "nt",
            chain = "TRA", 
            motif.length = 3, 
            top.motifs = 25)

6. Comparing Clonal Diversity and Overlap

# clonalDiversity


clonalDiversity(combined.TCR_with_PBMC, 
                cloneCall = "gene")


clonalDiversity(combined.TCR, 
                cloneCall = "gene")


#clonalSizeDistribution


clonalSizeDistribution(combined.TCR_with_PBMC, 
                       cloneCall = "aa", 
                       method= "ward.D2")


clonalSizeDistribution(combined.TCR_with_PBMC, 
                       cloneCall = "nt", 
                       method= "ward.D2")


#clonalOverlap

clonalOverlap(combined.TCR_with_PBMC, 
              cloneCall = "strict", 
              method = "morisita")



clonalOverlap(combined.TCR_with_PBMC, 
              cloneCall = "strict", 
              method = "raw")

7. Combining Clones and Single-Cell Objects


# Combining Clones and Single-Cell Objects

#Getting a sample of a Seurat object
TCR <- get(data("All_samples_Merged"))


#Define color palette 
colorblind_vector <- hcl.colors(n=7, palette = "inferno", fixup = TRUE)


TCR <- combineExpression(combined.TCR_with_PBMC, 
                                   TCR, 
                                   cloneCall="gene", 
                                   group.by = "sample", 
                                   proportion = FALSE, 
                                   cloneSize=c(Single=1, Small=5, Medium=20, Large=100,Hyperexpanded=500))



DimPlot(subset(TCR, !is.na(cloneSize)), 
        group.by = "cloneSize", 
        reduction = "umap")


DimPlot(TCR, group.by = "cloneSize", reduction = "umap") +
    scale_color_manual(values=rev(colorblind_vector[c(1,3,4,5,6)]))
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 49305, 32343

8. Visualizations for Single-Cell Objects


# Visualizations for Single-Cell Objects
# clonalOverlay

#Adding patient information
scRep_example$Patient_origin <- substr(scRep_example$orig.ident, 1,3)

clonalOverlay(scRep_example, 
              reduction = "umap", 
              cutpoint = 1, 
              bins = 10, 
              facet.by = "Patient_origin") + 
              guides(color = "none")
Error in clonalOverlay(scRep_example, reduction = "umap", cutpoint = 1,  : 
  If filtering the data using a cutpoint, ensure the cut.category correspond to a variable in the meta data.

9. Quantifying Clonal Bias

# # StartracDiversity # From the excellent work by Lei Zhang, et al., the authors introduce new methods for looking at clones by cellular origins and cluster identification. Their STARTRAC software has been adapted to work with scRepertoire and please read and cite their excellent work. # # In order to use the StartracDiversity() function, you will need to include the product of the combinedExpression() function. The second requirement is a column header in the meta data of the Seurat object that has tissue of origin. In the example data, type corresponds to the column “Type”, which includes the “P” and “T” classifiers. The indices can be subsetted for a specific patient or examined overall using the by variable. Importantly, the function uses only the strict definition of a clone of the VDJC genes and the CDR3 nucleotide sequence. # # The indices output includes: # # expa - Clonal Expansion # migr - Cross-tissue Migration # tran - State Transition



StartracDiversity(scRep_example, 
                  type = "Patient_origin",
                  group.by = "cell_line")


StartracDiversity(scRep_example, 
                  type = "cell_line",
                  group.by = "cell_line")

10. clonalBias



clonalBias(scRep_example, 
           cloneCall = "aa", 
           split.by = "cell_line", 
           group.by = "seurat_clusters",
           n.boots = 10, 
           min.expand =5)
---
title: "TCR Analysis-7-11-2025"
author: Nasir Mahmood Abbasi
date: "`r Sys.Date()`"
output:
  # pdf_document: default
  # word_document: default
  # html_document: default
  #rmdformats::readthedown
  html_notebook:
    toc: true
    toc_float: true
    toc_collapsed: true
---


# **1. Introduction**

Single-cell sequencing is an emerging technology in the field of immunology and oncology that allows researchers to couple RNA quantification and other modalities, like immune cell receptor profiling at the level of an individual cell. A number of workflows and software packages have been created to process and analyze single-cell transcriptomic data. These packages allow users to take the vast dimensionality of the data generated in single-cell-based experiments and distill the data into novel insights. Unlike the transcriptomic field, there is a lack of options for software that allow for single-cell immune receptor profiling. Enabling users to easily combine RNA and immune profiling, the scRepertoire framework supports use of 10x, single-cell clonal formats and interaction with popular R-based single-cell data pipelines.

scRepertoire is designed to take filter contig outputs from the 10x Genomics Cell Ranger pipeline, process that data to assign clonotype based on two TCR or Ig chains and analyze the clonotype dynamics. The latter can be separated into 1) clonotype-only analysis functions, such as unique clonotypes or clonal space quantification, and 2) interaction with mRNA expression data using Seurat or SingleCellExperiment packages.

**References:**

- [scRepertoire Vignette](https://www.bioconductor.org/packages/release/bioc/vignettes/scRepertoire/inst/doc/vignette.html)  
- [Borch.dev scRepertoire](https://www.borch.dev/uploads/screpertoire/)



**Prerequisite:** Ensure the `All_samples_Merged` Seurat object is loaded into your R environment before running the chunks below.

## 1.1 Load libraries
```{r}
library(Seurat)
library(SeuratObject)
library(SeuratObject)
library(SeuratData)
library(patchwork)
library(dplyr)
library(ggplot2)
library(tidyverse)
library(rmarkdown)
library(tinytex)
library(grid)
library(cowplot)
library(presto)

#TCR Analysis
library(scRepertoire)
library(SingleCellExperiment)
library(circlize)
library(scales)
```

## 1.2  Load Seurat Object
```{r}

#Load Seurat Object merged from cell lines and a control(PBMC) after filtration
All_samples_Merged <- readRDS("../0-Seurat_RDS_OBJECT_FINAL/All_samples_Merged_with_Renamed_Clusters_final-26-10-2025.rds")

All_samples_Merged
```

## 1.3. Load contigs

scRepertoire functions using the filtered_contig_annotations.csv output from the 10x Genomics Cell Ranger. This file is located in the ./outs/ directory of the VDJ alignment folder. To generate a list of contigs to use for scRepertoire:

**-   load the filtered_contig_annotations.csv for each of the samples.**

**-   make a list in the R environment.**


```{r TCR, include=FALSE}

L1 <- read.csv("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L1/outs/per_sample_outs/L1/vdj_t/filtered_contig_annotations.csv")
L2 <- read.csv("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L2/outs/per_sample_outs/L2/vdj_t/filtered_contig_annotations.csv")
L3 <- read.csv("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L3_CITE_B//outs/per_sample_outs/L3_CITE_B//vdj_t/filtered_contig_annotations.csv")
L4 <- read.csv("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L4_B/outs/per_sample_outs/L4_B//vdj_t/filtered_contig_annotations.csv")
L5 <- read.csv("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L5/outs/per_sample_outs/L5/vdj_t/filtered_contig_annotations.csv")
L6 <- read.csv("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L6_CITE//outs/per_sample_outs/L6_CITE//vdj_t/filtered_contig_annotations.csv")
L7 <- read.csv("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/L7/outs/per_sample_outs/L7/vdj_t/filtered_contig_annotations.csv")
PBMC <- read.csv("/run/user/1000/gvfs/smb-share:server=10.144.142.131,share=commun/Nasir/All_Data_SS/Audrey_Gros/CellRanger/PBMC/outs/per_sample_outs/PBMC/vdj_t/filtered_contig_annotations.csv")

```

# **2. Combining Contigs into Clones**

There are varying definitions of clones in the literature. For the purposes of scRepertoire, we define a clone as cells with shared/trackable complementarity-determining region 3 (CDR3) sequences. Within this definition, one might use amino acid (aa) sequences of one or both chains to define a clone. Alternatively, we could use nucleotide (nt) or the V(D)JC genes (genes) to define a clone. The latter, genes, would be a more permissive definition of “clones,” as multiple amino acid or nucleotide sequences can result from the same gene combination. Another option to define a clone is the use of the V(D)JC and nucleotide sequence (strict). scRepertoire allows for the use of all these definitions of clones and enables users to select both or individual chains to examine.

## 2.1 Combining Contigs into Clones
```{r combinedTCR, fig.height=4, fig.width=6}

contig_list <- list(L1, L2, L3, L4,L5,L6, L7)
contig_list_with_PBMC <- list(L1, L2, L3, L4,L5,L6, L7, PBMC)

combined.TCR <- combineTCR(contig_list,
                           samples = c("L1", "L2", "L3", "L4", "L5", "L6", "L7"),
                           removeNA = FALSE,
                           removeMulti = FALSE,
                           filterMulti = FALSE)

combined.TCR_with_PBMC <- combineTCR(contig_list_with_PBMC, 
                           samples = c("L1", "L2", "L3", "L4", "L5", "L6", "L7", 
                                       "PBMC"),
                           removeNA = FALSE, 
                           removeMulti = FALSE, 
                           filterMulti = FALSE)

 exportClones(combined.TCR,
              write.file = TRUE,
              dir = "TCR_analysis-7-11-2025/",
             file.name = "clones.csv")

exportClones(combined.TCR_with_PBMC,
             write.file = TRUE,
             dir = "TCR_analysis-7-11-2025/",
             file.name = "clones_with_PBMC.csv")

head(combined.TCR[[1]])
head(combined.TCR_with_PBMC[[1]])

# Combine all data frames in the list into a single data frame
combined_TCR_df <- do.call(rbind, combined.TCR)

# Write the combined data frame to a CSV file
write.csv(combined_TCR_df, file = "TCR_analysis-7-11-2025/combined_TCR.csv", row.names = FALSE)


# Combine all data frames in the list into a single data frame
combined_TCR_with_PBMC_df <- do.call(rbind, combined.TCR_with_PBMC)

# Write the combined data frame to a CSV file
write.csv(combined_TCR_with_PBMC_df, file = "TCR_analysis-7-11-2025/combined_TCR_with_PBMC.csv", row.names = FALSE)
```

## 2.2 Write the fasta file for phylogeny
```{r phylogeneyFiles, fig.height=4, fig.width=6}

# # Define the FASTA file for alpha chains
# alpha_fasta_file <- "TCR_analysis-7-11-2025/Phylogeny/cdr3_alpha_sequences.fasta"
# 
# # Open a connection to the file
# f <- file(alpha_fasta_file, open = "w")
# 
# # Loop through the data frame
# for (i in seq_len(nrow(combined_TCR_with_PBMC_df))) {
#   if (!is.na(combined_TCR_with_PBMC_df$cdr3_nt1[i])) {
#     # Write the header and sequence
#     writeLines(paste0(">", combined_TCR_df$sample[i], "_alpha"), f)
#     writeLines(combined_TCR_df$cdr3_nt1[i], f)
#   }
# }
# 
# # Close the connection
# close(f)
# 
# cat("Alpha chain FASTA file created at:", alpha_fasta_file, "\n")
# 
# 
# # Define the FASTA file for beta chains
# beta_fasta_file <- "TCR_analysis-7-11-2025/Phylogeny/cdr3_beta_sequences.fasta"
# 
# # Open a connection to the file
# f <- file(beta_fasta_file, open = "w")
# 
# # Loop through the data frame
# for (i in seq_len(nrow(combined_TCR_with_PBMC_df))) {
#   if (!is.na(combined_TCR_with_PBMC_df$cdr3_nt2[i])) {
#     # Write the header and sequence
#     writeLines(paste0(">", combined_TCR_with_PBMC_df$sample[i], "_beta"), f)
#     writeLines(combined_TCR_df$cdr3_nt2[i], f)
#   }
# }
# 
# # Close the connection
# close(f)
# 
# cat("Beta chain FASTA file created at:", beta_fasta_file, "\n")
```



# **3. Basic Clonal Visualizations**

## 3.1. clonalQuant: Quantifying Unique Clones
```{r Clonal-Visualization, fig.height=4, fig.width=8}
#clonalQuant
 clonalQuant(combined.TCR, 
            cloneCall="strict", 
            chain = "both", 
            scale = TRUE)


clonalQuant(combined.TCR_with_PBMC, 
            cloneCall="strict", 
            chain = "both", 
            scale = TRUE)
```

## 3.2 clonalAbundance: Distribution of Clones by Size
```{r V2, fig.height=4, fig.width=8}
#clonalAbundance

clonalAbundance(combined.TCR_with_PBMC, 
                cloneCall = "gene",  palette = "Zissou 1",
                scale = FALSE)
  
clonalAbundance(combined.TCR_with_PBMC, 
                cloneCall = "gene", palette = "Zissou 1",
                scale = TRUE)



clonalAbundance(combined.TCR, 
                cloneCall = "gene",  
                scale = FALSE)
 

clonalAbundance(combined.TCR, 
                cloneCall = "gene", 
                scale = TRUE)
```
## 3.3 clonalLength: Distribution of Sequence Lengths
```{r V3, fig.height=8, fig.width=12}
#clonalLength
clonalLength(combined.TCR_with_PBMC, 
             cloneCall="nt", 
             chain = "both") 

clonalLength(combined.TCR_with_PBMC, 
             cloneCall="aa", 
             chain = "both") 

clonalLength(combined.TCR, 
             cloneCall="nt", 
             chain = "both") 

clonalLength(combined.TCR, 
             cloneCall="aa", 
             chain = "both") 



#TRA
clonalLength(combined.TCR_with_PBMC, 
             cloneCall="nt", 
             chain = "TRA", 
             scale = TRUE) 

clonalLength(combined.TCR_with_PBMC, 
             cloneCall="aa", 
             chain = "TRA", 
             scale = TRUE) 

clonalLength(combined.TCR, 
             cloneCall="nt", 
             chain = "TRA",  
             scale = TRUE) 

clonalLength(combined.TCR, 
             cloneCall="aa", 
             chain = "TRA", 
             scale = TRUE) 


#TRB
clonalLength(combined.TCR_with_PBMC, 
             cloneCall="nt", 
             chain = "TRB", 
             scale = TRUE) 

clonalLength(combined.TCR_with_PBMC, 
             cloneCall="aa", 
             chain = "TRB", 
             scale = TRUE) 

clonalLength(combined.TCR, 
             cloneCall="nt", 
             chain = "TRB",  
             scale = TRUE) 

clonalLength(combined.TCR, 
             cloneCall="aa", 
             chain = "TRB", 
             scale = TRUE) 
```
## 3.5 clonalCompare: Clonal Dynamics Between Categorical Variables
```{r V4, fig.height=8, fig.width=12}
# clonalCompare
clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L1", "L2"), 
                  cloneCall="aa", 
                  graph = "alluvial")

clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L3", "L4"), 
                  cloneCall="aa", 
                  graph = "alluvial")

clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L5", "L6","L7"), 
                  cloneCall="aa", 
                  graph = "alluvial")

clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L5","L6"), 
                  cloneCall="aa", 
                  graph = "alluvial")

clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L5","L7"), 
                  cloneCall="aa", 
                  graph = "alluvial")

clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L6","L7"), 
                  cloneCall="aa", 
                  graph = "alluvial")

clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L1", "L2", "L3", "L4", "L5", "L6", "L7", "PBMC"), 
                  cloneCall="aa", 
                  graph = "alluvial")


clonalCompare(combined.TCR, 
                  top.clones = 10, 
                  samples = c("L1", "L2", "L3", "L4", "L5", "L6", "L7"), 
                  cloneCall="aa", 
                  graph = "alluvial")
```
## 3.6 clonalScatter: Scatterplot of Two Variables
```{r V5, fig.height=4, fig.width=6}
#clonalScatter

clonalScatter(combined.TCR, 
              cloneCall ="gene", 
              x.axis = "L1", 
              y.axis = "L2",
              dot.size = "total",
              graph = "proportion")


clonalScatter(combined.TCR, 
              cloneCall ="gene", 
              x.axis = "L3", 
              y.axis = "L4",
              dot.size = "total",
              graph = "proportion")

clonalScatter(combined.TCR, 
              cloneCall ="gene", 
              x.axis = "L5", 
              y.axis = "L6",
              dot.size = "total",
              graph = "proportion")

clonalScatter(combined.TCR, 
              cloneCall ="gene", 
              x.axis = "L5", 
              y.axis = "L7",
              dot.size = "total",
              graph = "proportion")

clonalScatter(combined.TCR_with_PBMC, 
              cloneCall ="gene", 
              x.axis = "L6", 
              y.axis = "L7",
              dot.size = "total",
              graph = "proportion")
```
# **4. Visualizing Clonal Dynamics**

## 4.1 clonalHomeostasis: Examining Clonal Space
```{r V6, fig.height=4, fig.width=6}
# Visualizing Clonal Dynamics

# clonalHomeostasis

clonalHomeostasis(combined.TCR, 
                  cloneCall = "gene")


clonalHomeostasis(combined.TCR, 
                  cloneCall = "gene",
                  cloneSize = c(Rare = 0.001, Small = 0.01, Medium = 0.1, Large = 0.3, Hyperexpanded =1))


clonalHomeostasis(combined.TCR, 
                  cloneCall = "gene",
                  cloneSize = c(Rare = 0.1, Small = 1, Medium = 10, Large = 30, Hyperexpanded =100))

```
## 4.2 clonalProportion: Examining Space Occupied by Ranks of Clones
```{r Stats, fig.height=8, fig.width=21}
# clonalProportion

clonalProportion(combined.TCR, 
                 cloneCall = "gene") 

clonalProportion(combined.TCR, 
                 cloneCall = "nt",
                 clonalSplit = c(1, 5, 10, 100, 1000, 10000)) 


clonalProportion(combined.TCR, 
                 cloneCall = "nt",
                 clonalSplit = c(10, 100, 1000, 10000, 30000, 100000)) 


```
# **5. Summarizing Repertoires**
```{r DE, fig.height=18, fig.width=22}
# Summarizing Repertoires

# positionalEntropy

positionalEntropy(combined.TCR, 
                  chain = "TRB", 
                  aa.length = 20)

positionalEntropy(combined.TCR_with_PBMC, 
                  chain = "TRB", 
                  aa.length = 20)



# positionalProperty

positionalProperty(combined.TCR[c(1,2)], 
                  chain = "TRB", 
                  aa.length = 20, 
                  method = "atchleyFactors") + 
  scale_color_manual(values = hcl.colors(5, "inferno")[c(2,4)])


# percentGeneUsage
vizGenes(combined.TCR, 
         x.axis = "TRBV",
         y.axis = NULL,
         plot = "barplot",  
         summary.fun = "proportion")

vizGenes(combined.TCR_with_PBMC, 
         x.axis = "TRBV",
         y.axis = NULL,
         plot = "barplot",  
         summary.fun = "proportion")


vizGenes(combined.TCR, 
        x.axis = "TRAV",
        y.axis = NULL,
        plot = "barplot",  
        summary.fun = "proportion")

vizGenes(combined.TCR_with_PBMC, 
        x.axis = "TRAV",
        y.axis = NULL,
        plot = "barplot",  
        summary.fun = "proportion")

vizGenes(combined.TCR[c(1,2,3,4,5,6,7)], 
         x.axis = "TRBV",
         y.axis = "TRBJ",
         plot = "heatmap",  
          summary.fun = "percent")

vizGenes(combined.TCR[c(1,2,3,4,5,6,7)], 
         x.axis = "TRAV",
         y.axis = "TRAJ",
         plot = "heatmap",  
          summary.fun = "percent")

# For the P17 patient samples, what if we are interested in chain pairings, we can look at TRBV and TRAV at the same time using them as inputs to x.axis and y.axis.
vizGenes(combined.TCR[c(1,2)], 
         x.axis = "TRBV",
         y.axis = "TRAV",
         plot = "heatmap", 
          summary.fun = "percent")

# percentGenes: Quantifying Single Gene Usage
percentGenes(combined.TCR_with_PBMC, 
             chain = "TRB", 
             gene = "Vgene",
              summary.fun = "percent")

percentGenes(combined.TCR_with_PBMC, 
             chain = "TRA", 
             gene = "Vgene",
              summary.fun = "percent")

percentGenes(combined.TCR_with_PBMC, 
             chain = "TRB", 
             gene = "Jgene")

percentGenes(combined.TCR_with_PBMC, 
             chain = "TRA", 
             gene = "Jgene",
              summary.fun = "percent")


# percentVJ
# Quantify the proportion of V and J gene usage with percentVJ(). Like percentGenes(), this function will quantify the percentage of V and J paired together across individual repertoires. The output can be visualized using a heatmap or as input for further dimensional reduction.

percentVJ(combined.TCR_with_PBMC, 
          chain = "TRB",
           summary.fun = "percent")

percentVJ(combined.TCR_with_PBMC, 
          chain = "TRA",
           summary.fun = "percent")

# percentKmer

percentKmer(combined.TCR_with_PBMC, 
            cloneCall = "aa",
            chain = "TRB", 
            motif.length = 3, 
            top.motifs = 25)

percentKmer(combined.TCR_with_PBMC, 
            cloneCall = "aa",
            chain = "TRA", 
            motif.length = 3, 
            top.motifs = 25)

percentKmer(combined.TCR_with_PBMC, 
            cloneCall = "nt",
            chain = "TRB", 
            motif.length = 3, 
            top.motifs = 25)

percentKmer(combined.TCR_with_PBMC, 
            cloneCall = "nt",
            chain = "TRA", 
            motif.length = 3, 
            top.motifs = 25)
```
# **6. Comparing Clonal Diversity and Overlap**
```{r clonalDiversity, fig.height=4, fig.width=6}
# clonalDiversity


clonalDiversity(combined.TCR_with_PBMC, 
                cloneCall = "gene")

clonalDiversity(combined.TCR, 
                cloneCall = "gene")

#clonalSizeDistribution


clonalSizeDistribution(combined.TCR_with_PBMC, 
                       cloneCall = "aa", 
                       method= "ward.D2")

clonalSizeDistribution(combined.TCR_with_PBMC, 
                       cloneCall = "nt", 
                       method= "ward.D2")

#clonalOverlap

clonalOverlap(combined.TCR_with_PBMC, 
              cloneCall = "strict", 
              method = "morisita")


clonalOverlap(combined.TCR_with_PBMC, 
              cloneCall = "strict", 
              method = "raw")
```

# **7. Combining Clones and Single-Cell Objects**
```{r SC, fig.height=8, fig.width=12}

# Combining Clones and Single-Cell Objects

#Getting a sample of a Seurat object
TCR <- get(data("All_samples_Merged"))


#Define color palette 
colorblind_vector <- hcl.colors(n=7, palette = "inferno", fixup = TRUE)


TCR <- combineExpression(combined.TCR_with_PBMC, 
                                   TCR, 
                                   cloneCall="gene", 
                                   group.by = "sample", 
                                   proportion = FALSE, 
                                   cloneSize=c(Single=1, Small=5, Medium=20, Large=100,Hyperexpanded=500))



DimPlot(subset(TCR, !is.na(cloneSize)), 
        group.by = "cloneSize", 
        reduction = "umap")

# DimPlot(TCR, group.by = "cloneSize", reduction = "umap") +
#     scale_color_manual(values=rev(colorblind_vector[c(1,3,4,5,6)]))
# 
# TCR$CTgene
# 
# 
# #Define color palette 
# colorblind_vector <- hcl.colors(n=9, palette = "inferno", fixup = TRUE)
# 
# Seurat::DimPlot(TCR, group.by = "cloneSize", reduction = "umap") +
#     scale_color_manual(values=rev(colorblind_vector[c(1,3,4,5,7)]))
```
# **8. Visualizations for Single-Cell Objects**
```{r V_sc, fig.height=14, fig.width=18}

# Visualizations for Single-Cell Objects
# clonalOverlay

#Adding patient information
scRep_example$Patient_origin <- substr(scRep_example$orig.ident, 1,3)

clonalOverlay(scRep_example, 
              reduction = "umap", 
              cutpoint = 1, 
              bins = 10, 
              facet.by = "Patient_origin") + 
              guides(color = "none")



clonalOverlay(scRep_example, 
              reduction = "umap", 
              cutpoint = 1, 
              bins = 10, 
              facet.by = "cell_line") + 
              guides(color = "none")

#clonalNetwork
#ggraph needs to be loaded due to issues with ggplot
library(ggraph)

clonalNetwork(scRep_example, 
              reduction = "umap", 
              group.by = "seurat_clusters",
              filter.clones = NULL,
              filter.identity = NULL,
              cloneCall = "aa")


#Examining Cluster 3 only
clonalNetwork(scRep_example, 
              reduction = "umap", 
              group.by = "seurat_clusters",
              filter.identity = 8,
              cloneCall = "aa")


shared.clones <- clonalNetwork(scRep_example, 
                               reduction = "umap", 
                               group.by = "seurat_clusters",
                               cloneCall = "aa", 
                               exportClones = TRUE)
head(shared.clones)


scRep_example <- highlightClones(scRep_example, 
                    cloneCall= "aa", 
                    sequence = c("CATGPNGSSNTGKLIF;CALSNNARLMF_CSATTGFYGYTF", 
                                 "CAYSESGGSNYKLTF_CSALAGGYTDTQYF",
                                 "CLVGETGRRALTF_CSARGDRGQPQHF",
                                 "CLVGETGRRALTF_CSARGDRGQPQHF;CATSDFKQVSNQPQHF",
                                 "CATGPNGSSNTGKLIF_CSATTGFYGYTF",
                                 "NA_CSARGDRGQPQHF"))

Seurat::DimPlot(scRep_example, group.by = "highlight", reduction = "umap") + 
  ggplot2::theme(plot.title = element_blank())

#clonalOccupy
clonalOccupy(scRep_example, 
              x.axis = "seurat_clusters")

clonalOccupy(scRep_example, 
                     x.axis = "ident", 
                     proportion = TRUE, 
                     label = FALSE)
# alluvialClones

# scRep_example$Patient_origin <- substr(scRep_example$orig.ident, 8,8)
# 
# alluvialClones(scRep_example, 
#                cloneCall = "aa", 
#                y.axes = c("Patient_origin", "orig.ident", "Cell_line_Immunophenotype"), 
#                color = c("CATGPNGSSNTGKLIF;CALSNNARLMF_CSATTGFYGYTF", 
#                                  "CAYSESGGSNYKLTF_CSALAGGYTDTQYF",
#                                  "CLVGETGRRALTF_CSARGDRGQPQHF",
#                                  "CLVGETGRRALTF_CSARGDRGQPQHF;CATSDFKQVSNQPQHF",
#                                  "CATGPNGSSNTGKLIF_CSATTGFYGYTF",
#                                  "NA_CSARGDRGQPQHF")) + 
#     scale_fill_manual(values = c("grey", colorblind_vector[3]))
# 
# 
# 
# alluvialClones(scRep_example, 
#                    cloneCall = "gene", 
#                    y.axes = c("Patient_origin", "orig.ident", "Cell_line_Immunophenotype"), 
#                    color = "ident") 



library(circlize)
library(scales)

circles <- getCirclize(scRep_example, 
                       group.by = "seurat_clusters")

#Just assigning the normal colors to each cluster
grid.cols <- hue_pal()(length(unique(scRep_example$seurat_clusters)))
names(grid.cols) <- unique(scRep_example$seurat_clusters)

#Graphing the chord diagram
chordDiagram(circles, self.link = 1, grid.col = grid.cols)


circles <- getCirclize(scRep_example, group.by = "cell_line")

grid.cols <- scales::hue_pal()(length(unique(scRep_example@active.ident)))
names(grid.cols) <- levels(scRep_example@active.ident)

chordDiagram(circles, 
             self.link = 1, 
             grid.col = grid.cols)

```



# **9. Quantifying Clonal Bias**

**# # StartracDiversity
# From the excellent work by Lei Zhang, et al., the authors introduce new methods for looking at clones by cellular origins and cluster identification. Their STARTRAC software has been adapted to work with scRepertoire and please read and cite their excellent work.
# 
# In order to use the StartracDiversity() function, you will need to include the product of the combinedExpression() function. The second requirement is a column header in the meta data of the Seurat object that has tissue of origin. In the example data, type corresponds to the column “Type”, which includes the “P” and “T” classifiers. The indices can be subsetted for a specific patient or examined overall using the by variable. Importantly, the function uses only the strict definition of a clone of the VDJC genes and the CDR3 nucleotide sequence.
# 
# The indices output includes:
# 
# expa - Clonal Expansion
# migr - Cross-tissue Migration
# tran - State Transition
**


```{r Quatify, fig.height=8, fig.width=12}


StartracDiversity(scRep_example, 
                  type = "Patient_origin",
                  group.by = "cell_line")


StartracDiversity(scRep_example, 
                  type = "cell_line",
                  group.by = "cell_line")


```
# **10. clonalBias**
```{r Quatify2, fig.height=8, fig.width=12}


clonalBias(scRep_example, 
           cloneCall = "aa", 
           split.by = "cell_line", 
           group.by = "seurat_clusters",
           n.boots = 10, 
           min.expand =5)

```