2. Load Seurat Object
All_samples_Merged <- readRDS("../../0-Seurat_RDS_OBJECT_FINAL/All_samples_Merged_Harmony_integrated_Cell_line_renamed_03-07-2025.rds")
QC
allsamplesquality <- SCassist_analyze_quality("All_samples_Merged", percent_mt = "percent.mt", percent_ribo = "percent.rb", api_key_file = api_key_file, llm_server="google")
Based on the provided data summary, below are my recommendations for the quality filtering of the single-cell RNA sequencing data. These recommendations aim to balance sensitivity (avoiding removal of good cells) with specificity (removing low-quality cells). The researcher should test a range of values around these recommendations to optimize filtering for their specific dataset and downstream analysis.
**nCount_RNA:**
* **Lower Cutoff:** 3500. This value is slightly below the 5th percentile (3690.2) to account for potential low-expressing cells that might still be biologically relevant, while still removing very low-quality cells.
* **Upper Cutoff:** 30000. This value is set slightly above the 95th percentile (27505) to allow for some variability in highly expressing cells while still removing high-count outliers, which may represent cell doublets or other artifacts.
**nFeature_RNA:**
* **Lower Cutoff:** 1500. This value is set slightly below the 5th percentile (1595) to remove cells with very few detected genes.
* **Upper Cutoff:** 6000. This value is slightly above the 95th percentile (5532) to remove cells with an unusually high number of detected genes, which may indicate doublets or other artifacts.
**percent.mt:**
* **Upper Cutoff:** 3.5%. This value is chosen to be below the 95th percentile (4.31%) to remove cells with high mitochondrial gene expression, a common indicator of poor-quality cells. The value is set slightly lower to be more stringent.
**percent.rb:**
* **Upper Cutoff:** 40%. This value is chosen to be below the 95th percentile (43.76%) to remove cells with high ribosomal gene expression, another indicator of poor-quality cells. The value is set slightly lower to be more stringent.
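The cutoffs above amount to a simple per-cell filter on the metadata. The sketch below shows one way such a filter could be written, using a small made-up metadata table (`toy_meta` and the helper `passes_qc` are hypothetical and not part of SCassist or Seurat); the column names follow the metadata fields named in the recommendation.

```r
# Toy per-cell metadata; values are invented for illustration only.
toy_meta <- data.frame(
  nCount_RNA   = c(2000, 8000, 12000, 45000, 9000),
  nFeature_RNA = c(900, 2500, 3500, 7000, 2800),
  percent.mt   = c(1.0, 2.0, 5.0, 2.5, 3.0),
  percent.rb   = c(20, 25, 30, 35, 45),
  row.names    = paste0("cell", 1:5)
)

# Recommended cutoffs from the SCassist output above.
qc_cutoffs <- list(
  nCount_min = 3500, nCount_max = 30000,
  nFeature_min = 1500, nFeature_max = 6000,
  mt_max = 3.5, rb_max = 40
)

# Hypothetical helper: TRUE for each cell that passes every cutoff.
passes_qc <- function(meta, cut) {
  meta$nCount_RNA   > cut$nCount_min   & meta$nCount_RNA   < cut$nCount_max &
  meta$nFeature_RNA > cut$nFeature_min & meta$nFeature_RNA < cut$nFeature_max &
  meta$percent.mt   < cut$mt_max &
  meta$percent.rb   < cut$rb_max
}

keep <- passes_qc(toy_meta, qc_cutoffs)
rownames(toy_meta)[keep]
```

On a Seurat object the same logical conditions can be passed to `subset()`, for example `subset(All_samples_Merged, subset = nCount_RNA > 3500 & nCount_RNA < 30000 & percent.mt < 3.5)`. As the recommendation notes, the thresholds are starting points to be tested, not fixed values.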
Normalization
# Ask SCassist to recommend normalization method
normalization_recommendation<-SCassist_recommend_normalization("All_samples_Merged", llm_server="google", api_key_file = api_key_file)
Given the characteristics of the single-cell RNA-seq dataset:
* **Large number of cells (49305):** This allows for robust statistical analysis.
* **High standard deviation (11.71) relative to the mean (3.83):** This indicates a high degree of variability in gene expression across cells, typical of scRNA-seq data. The high coefficient of variation (CV) further supports this.
* **Coefficient of variation of library sizes (0.56):** This suggests moderate variability in sequencing depth across cells.
Considering these factors, **SCTransform** is the most appropriate normalization method from the Seurat options provided (LogNormalize, CLR, RC, SCTransform).
**Reasons for choosing SCTransform:**
* **Handles library size variation effectively:** SCTransform uses a regularized negative binomial model to account for both technical and biological variability in gene expression. This is crucial given the moderate library size variation (CV = 0.56). Simple methods like LogNormalize only address library size differences and don't explicitly model the technical noise inherent in scRNA-seq.
* **Accounts for cell-specific biases:** The model in SCTransform uses sequencing depth as a covariate and works with the resulting residuals, which reduces depth-related technical bias and leads to more accurate normalization.
* **Improved downstream analysis:** The resulting normalized data is often better suited for downstream analyses like dimensionality reduction (PCA, UMAP) and clustering, as it reduces the impact of technical noise and highlights biological differences between cells.
**Why other methods are less suitable:**
* **LogNormalize:** While simple and fast, it only corrects for library size differences. It doesn't account for other technical biases and might not be optimal for datasets with high variability like this one.
* **CLR (Centered Log Ratio):** CLR is suitable for compositional data where the sum of values is constant. While it can handle some technical variation, it's less robust than SCTransform for scRNA-seq data with varying library sizes and complex technical noise.
* **RC (Relative Counts):** RC divides each count by the cell's total counts and multiplies by a scale factor, without a log transformation. Like LogNormalize, it only corrects for library size, so it is less comprehensive than SCTransform in handling the technical complexities of scRNA-seq data.
**Potential Alternatives (though not listed):**
* **sctransform (from the sctransform package):** Seurat's `SCTransform()` is a wrapper around this package, so calling it directly applies the same model with finer control over its parameters rather than a different method. The choice between the two often comes down to personal preference and integration with other analysis pipelines.
* **scran (from the scran package):** scran offers a different approach to normalization, often considered very robust. It uses a sophisticated model to account for various sources of variation. However, it requires a bit more manual parameter tuning.
In summary, while other methods might offer acceptable results, SCTransform provides the most comprehensive and robust normalization for this dataset, effectively addressing the observed library size variation and the high variability in gene expression, leading to improved downstream analysis. The large number of cells further supports the use of a more sophisticated method like SCTransform.
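The recommendation leans on the coefficient of variation (CV) of library sizes. How SCassist computes that statistic is not shown here; the sketch below assumes the usual definition (standard deviation divided by mean of per-cell total counts) and uses an invented toy count matrix.

```r
# Toy gene-by-cell count matrix (3 genes x 4 cells); values are invented.
# matrix() fills column by column, so each line below is one cell.
toy_counts <- matrix(
  c(10, 0, 5,
    20, 4, 6,
    40, 8, 12,
    5,  1, 4),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Library size = total counts per cell (column sums).
lib_sizes <- colSums(toy_counts)

# Coefficient of variation = standard deviation / mean.
lib_cv <- sd(lib_sizes) / mean(lib_sizes)
lib_cv
```

A CV near 0 means cells were sequenced to similar depth; larger values mean depth differs more between cells, which is the situation depth-aware normalization is meant to handle.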
Variable features

Analyze Variable Features
# Experimental design statement
experimental_design = "We have 7 malignant CD4 T cell lines and normal CD4 T cells from 2 healthy donors"
# Ask SCassist to analyze variable features
variable_feature_analysis <- SCassist_analyze_variable_features("All_samples_Merged", experimental_design = experimental_design, api_key_file = api_key_file, llm_server="google")
The provided gene list strongly suggests enrichment in immune response pathways, specifically those related to **T cell activation, cytotoxicity, and inflammation**.
Several genes point to **T cell activation and proliferation**: `IL7R` (interleukin-7 receptor, crucial for T cell development and survival), `CD7` (T cell surface marker), `TRBV7-2`, `TRBV5-1`, `TRBV7-9`, and `TRBV3-1` (T cell receptor beta variable genes, indicating T cell receptor diversity and potentially clonal expansion). `FOS` is a proto-oncogene involved in cell growth and differentiation, often activated in immune responses. `CSF2` (granulocyte-macrophage colony-stimulating factor) also contributes to immune cell proliferation.
Many genes are associated with **cytotoxicity**: `GZMB` and `GZMA` (granzymes, serine proteases mediating apoptosis in target cells), `GNLY` (granulysin, another cytotoxic molecule), and `CCL3`, `CCL4`, `CCL1`, `CCL17`, `XCL1` (chemokines involved in recruiting cytotoxic cells to sites of inflammation).
The presence of multiple **HLA genes** (`HLA-DRB1`, `HLA-DQA1`, `HLA-DRA`) indicates major histocompatibility complex involvement, crucial for antigen presentation and T cell recognition. `CD74` is an MHC class II-associated invariant chain, further supporting this.
**Inflammation** is highlighted by chemokines (`CCL3`, `CCL4`, `CCL1`, `CCL17`, `XCL1`, `CXCL10`) and other inflammatory mediators like `S100A8` and `SERPINE1`. `OASL`, `IFIT2`, and `MT2A` suggest an interferon response, a key component of the innate immune response often triggered during inflammation.
The presence of `RPS4Y1` (ribosomal protein on the Y chromosome) might reflect the male sex of some samples, but is not a major pathway contributor. `KRT1` (keratin) and `PPBP` (pro-platelet basic protein, also known as CXCL7) have less direct roles in the core immune processes described above. `MALAT1` is a long non-coding RNA, and its role in this context would require further investigation.
**Relevance to the experimental design:** The enriched pathways are highly relevant to comparing malignant CD4 T cells with healthy CD4 T cells. The differences in expression of these genes likely reflect the altered activation state, proliferative capacity, and cytotoxic potential of the malignant cells compared to their healthy counterparts. The observed variations in chemokines and HLA genes could indicate differences in their ability to interact with other immune cells and respond to antigens.
PCA
# Print genes in the top 5 pcs
print(allsamplesgood[["pca"]], dims = 1:5, nfeatures = 5)
PC_ 1
Positive: NPM1, SEC11C, YBX3, VDAC1, MTHFD2
Negative: CD7, PRKCH, KIR3DL1, SEPTIN9, PTPRC
PC_ 2
Positive: PAGE5, RPL35A, RBPMS, CD74, TENM3
Negative: RPS17, CYBA, C12orf75, LY6E, SCCPDH
PC_ 3
Positive: RPL30, RPL39, RPS27, RPS4Y1, ETS1
Negative: PFN1, KIR3DL2, RPS15, NME2, ATP5F1D
PC_ 4
Positive: HSPE1, EIF5A, RPL34, RPS4Y1, MT-ND3
Negative: RPS4X, GAS5, KRT1, EGLN3, LINC02752
PC_ 5
Positive: TMSB4X, LGALS1, TMSB10, S100A4, S100A11
Negative: CCL17, MIR155HG, MAP4K4, LRBA, PRKCA
Choosing appropriate number of PCs
pc_recommendation=SCassist_recommend_pcs("allsamplesgood", experimental_design = experimental_design, llm_server="google", api_key_file = api_key_file)
5
The scree plot (variance explained vs. PC number) would show a relatively steep drop in variance explained from PC1 to PC5, followed by a more gradual decline. While there isn't a sharply defined "elbow," the cumulative variance explained by the first 5 PCs (17% + 13.16% + 11.97% + 6.46% + 5.57% = 54.16%) represents a substantial portion of the total variance. Including more PCs would yield diminishing returns in terms of variance explained, significantly increasing the complexity of downstream analyses without a commensurate gain in information. Therefore, 5 PCs offer a good balance between capturing a significant amount of variance and maintaining manageable complexity.
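The cumulative-variance arithmetic in the recommendation can be checked directly. The sketch below uses the five percentages quoted above; the helper `n_pcs_for` is hypothetical and only illustrates how a variance threshold could be turned into a PC count.

```r
# Percent variance explained by PCs 1-5, as quoted in the recommendation above.
pct_var <- c(PC1 = 17, PC2 = 13.16, PC3 = 11.97, PC4 = 6.46, PC5 = 5.57)

# Running total of variance explained.
cum_var <- cumsum(pct_var)
cum_var

# Hypothetical helper: smallest number of PCs whose cumulative variance
# reaches a chosen threshold (NA if the threshold is never reached).
n_pcs_for <- function(cum_var, threshold) {
  hit <- which(cum_var >= threshold)
  if (length(hit) == 0) NA_integer_ else unname(hit[1])
}

n_pcs_for(cum_var, 50)
```

For a Seurat object, one common convention is to derive per-PC percentages from the reduction's standard deviations, for example `sdev <- Stdev(allsamplesgood, reduction = "pca"); 100 * sdev^2 / sum(sdev^2)`. Percentages computed this way are relative to the PCs that were calculated, not to the total variance in the data, and it is an assumption that the figures above were obtained this way.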
# Identify number of PCs that explains majority of variations
ElbowPlot(allsamplesgood, ndims=50, reduction = "pca")

# Visualize the genes in the first PC
VizDimLoadings(allsamplesgood, dims = 1, ncol = 1) + theme_minimal(base_size = 8)

# Visualize the genes in the second PC
VizDimLoadings(allsamplesgood, dims = 2, ncol = 1) + theme_minimal(base_size = 8)

# Plot heatmaps for PCs 1-15 using the 500 cells with the most extreme scores at both ends of each PC
DimHeatmap(object = allsamplesgood, dims = 1:15, cells = 500, balanced = TRUE)

SCassist analyze PCs
pc_analyzed=SCassist_analyze_pcs("allsamplesgood", num_pcs = 5, experimental_design = experimental_design, llm_server="google", api_key_file = api_key_file)
**PC1 Summary:**
PC1's top contributing genes (NPM1, SEC11C, CD7, PRKCH, YBX3, etc.) reveal a strong signature related to T cell activation and proliferation, alongside markers of cell structure and metabolism. Many genes (CD7, CD3G, CD6, CD52, LCK, KIR family members) are known T cell surface markers, while others (IL2RA, CCND2, GZMM) are involved in T cell signaling and effector functions (e.g., cytokine signaling and granule release). The presence of genes associated with cell growth and metabolism (NPM1, MTHFD2, CCT8) suggests a high metabolic activity characteristic of proliferating cells. Therefore, the primary biological process driving the variation captured by PC1 is likely the differential activation and proliferation status of the CD4+ T cells, with malignant cells exhibiting higher levels of activation and proliferation compared to healthy controls.
**PC2 Summary:**
The top contributing genes to PC2 include several ribosomal proteins (RPL35A, RPS17, RPL22L1, RPS3A, RPL11, RPL27A), indicating strong influence of translational machinery. Other prominent genes are involved in mitochondrial function (NDUFV2, ATP5MC1), immune response (CD74, TNFRSF4, TIGIT, STAT1), and stress response (GPX4, PARK7). The presence of genes like CDKN2A (cell cycle regulator) and several chaperones (e.g., HSP90 family member) suggests potential cellular stress and altered cell cycle regulation. Therefore, PC2 likely captures variations driven by differences in protein synthesis, mitochondrial activity, immune cell activation, and cellular stress responses, potentially reflecting the malignant nature of the CD4+ T cell lines compared to the healthy controls. The observed variations could be a consequence of the malignant transformation process itself or reflect differences in culture conditions between the cell lines and primary cells.
**PC3 Summary:**
PC3's top contributing genes (RPL30, PFN1, RPL39, RPS27, RPS15, etc.) are predominantly ribosomal proteins (RPL, RPS) and genes involved in mitochondrial function (ATP5F1D, MT-ND3, MT-CO2, NDUFA4, NDUFS6, COX6A1). Other notable genes include those related to cytoskeletal structure (PFN1, ACTB), translation (EIF4A1, EIF5A), and immune function (KIR3DL2, KIR2DL3, TIGIT, SELL). The strong representation of ribosomal and mitochondrial genes suggests that PC3 primarily captures variations in cellular metabolism and protein synthesis. The inclusion of immune-related genes, particularly KIR family members and TIGIT, hints at a potential confounding effect of immune cell activation or differentiation state, which could be influencing the observed differences between the malignant and healthy CD4 T cell populations. Therefore, PC3 likely reflects a combination of differences in cellular metabolic activity and immune cell characteristics.
**PC4 Summary:**
PC4's top contributing genes include several ribosomal proteins (RPS4X, RPLP1, RPL13, RPLP0, RPL34), indicating potential differences in translational activity. The presence of genes associated with inflammation and immune response (IL32, IFNGR1, ICAM3, IL4, NKG7, SOCS1) suggests a strong immune component. Furthermore, genes linked to cell growth and proliferation (HMGA2, CCNI2, CEBPD) and stress response (HSPE1, HSPB1) are also prominent. Therefore, PC4 likely captures variations driven by differences in translational capacity, immune activation state, and cellular proliferation/stress response, potentially reflecting the malignant nature of the CD4+ T cell lines compared to the healthy controls. The presence of keratin genes (KRT1) might indicate differences in cell morphology or differentiation.
**PC5 Summary:**
The top contributing genes to PC5 include several markers strongly associated with inflammation and malignancy in T cells. Genes like *CCL17* (chemokine attracting Th2 cells), *S100A4*, *S100A6*, *S100A11* (calcium-binding proteins implicated in inflammation and cancer), *MIR155HG* (oncomir), and *LTA* (tumor necrosis factor ligand) suggest a strong inflammatory and potentially oncogenic signature. The presence of genes involved in cell growth and proliferation (*RUNX1*, *PRKCA*) further supports this. Therefore, PC5 likely captures variations driven by the differential expression of genes associated with inflammation, cell proliferation, and potentially malignant transformation, distinguishing the malignant CD4+ T cell lines from the healthy controls. The enrichment of these genes in the malignant samples suggests that this PC is separating the malignant and healthy cell populations based on their distinct inflammatory and proliferative states.
The five principal components (PCs) reveal distinct aspects of the differences between malignant and healthy CD4+ T cells. PC1 primarily reflects differences in T cell activation and proliferation, with malignant cells showing higher levels. PC2 highlights variations in protein synthesis, mitochondrial activity, immune response, and cellular stress, potentially reflecting the consequences of malignant transformation or differing culture conditions. PC3 focuses on cellular metabolism and protein synthesis, with a potential confounding influence from immune cell activation. PC4 captures variations in translational capacity, immune activation, cellular proliferation, and stress response, again potentially linked to malignancy. Finally, PC5 strongly emphasizes an inflammatory and potentially oncogenic signature, clearly distinguishing malignant from healthy cells based on the expression of inflammation-related genes and cell proliferation markers. In summary, the PCs collectively illustrate a multifaceted picture of malignant transformation in CD4+ T cells, encompassing changes in cell activation, metabolism, protein synthesis, immune response, and inflammatory signaling. The strong representation of ribosomal proteins across multiple PCs suggests a significant alteration in translational machinery in the malignant cells.
FindNeighbors
# Run SCassist_recommend_k
recommended_k <- SCassist_recommend_k("allsamplesgood", num_pcs = 10, llm_server="google", api_key_file = api_key_file)
Recommended K: 15-50
Reasoning:
The optimal `k.param` value for `FindNeighbors()` in Seurat depends on the dataset's complexity and the desired granularity of clustering. With approximately 49,305 cells and 10 principal components capturing the major variance, a relatively high `k` value is likely needed to capture the neighborhood relationships between cells belonging to different, yet potentially closely related, cell populations. A lower `k` might lead to overly fragmented clusters, while an excessively high `k` could blur the boundaries between distinct cell types.
Therefore, exploring a range from 15 to 50 allows for a balance between capturing local neighborhood structure and avoiding overly broad clusters. Starting at 15 ensures consideration of a reasonable number of nearest neighbors, while extending to 50 allows for the identification of more distant, yet potentially biologically relevant, relationships between cells. Visual inspection of the resulting clustering at different `k` values, along with biological validation, will be crucial for selecting the most appropriate value.
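To make the role of `k` concrete, the sketch below builds k-nearest-neighbor sets on a small random embedding in base R and measures how much two cells' neighborhoods overlap (Jaccard index), the kind of overlap a shared-nearest-neighbor graph is based on. The helpers `knn_indices` and `jaccard` are hypothetical illustrations, not Seurat's implementation, and the toy data stand in for PCA coordinates.

```r
set.seed(1)
# Toy embedding: 20 cells x 2 dimensions, standing in for PCA coordinates.
emb <- matrix(rnorm(40), ncol = 2)

# Hypothetical helper: indices of the k nearest neighbors of every cell
# (Euclidean distance, the cell itself excluded). Returns an n x k matrix.
knn_indices <- function(emb, k) {
  d <- as.matrix(dist(emb))
  diag(d) <- Inf
  nn <- lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)])
  matrix(unlist(nn), ncol = k, byrow = TRUE)
}

# Jaccard overlap between the neighbor sets of two cells.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Larger k means larger neighborhoods for every cell.
for (k in c(3, 5, 10)) {
  nn <- knn_indices(emb, k)
  cat("k =", k, " Jaccard(cell 1, cell 2) =", jaccard(nn[1, ], nn[2, ]), "\n")
}
```

Small `k` keeps neighborhoods local, which tends to favor many small clusters; large `k` links more distant cells, which tends to merge populations. Comparing results across the suggested 15 to 50 range is the practical way to choose.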
Clustering
allsamplesgood <- FindNeighbors(allsamplesgood, dims = 1:15, k.param = 30, return.neighbor = TRUE)
Computing nearest neighbors
Only one graph name supplied, storing nearest-neighbor graph only
# Run SCassist_recommend_res
recommended_res <- SCassist_recommend_res("allsamplesgood", llm_server="google", api_key_file = api_key_file)
Warning: sparse->dense coercion: allocating vector of size 13.4 GiB
Based on the data characteristics, I recommend the following.
Recommended Resolution: seq(0.2, 1.2, 0.2)
Reasoning:
The mean expression variability (5.53) and median neighbor distance (5.50) are relatively high, suggesting a complex and potentially heterogeneous cell population. A narrow resolution range is likely insufficient to capture the subtle differences between cell subpopulations. Starting at a lower resolution (e.g., 0.2) allows for the identification of broader cell clusters, while gradually increasing the resolution (e.g., increments of 0.2) allows for finer-grained separation of subpopulations. A resolution of 1.2 might be a reasonable upper limit, as going much higher risks over-splitting the data into overly granular and potentially biologically meaningless clusters. The suggested range allows for exploration of different clustering granularities to find the optimal balance between distinct and biologically relevant clusters. Experimentation within this range is crucial to determine the most appropriate resolution for the specific dataset and biological question.
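The suggested range can be expanded into a concrete grid and swept. The sketch below is guarded so it only touches Seurat when the package and the object are available; it assumes a shared-nearest-neighbor graph is already stored in the object (that is, `FindNeighbors()` was run with its default `return.neighbor = FALSE`), which `FindClusters()` needs.

```r
# Resolution grid suggested in the recommendation above.
resolutions <- seq(0.2, 1.2, by = 0.2)
resolutions

# Guarded sketch: only runs when Seurat is installed and the object exists.
# Assumes an SNN graph is already present in the object.
if (requireNamespace("Seurat", quietly = TRUE) && exists("allsamplesgood")) {
  for (res in resolutions) {
    allsamplesgood <- Seurat::FindClusters(allsamplesgood, resolution = res)
    # seurat_clusters holds the result of the most recent FindClusters() call.
    cat("resolution", res, "->",
        nlevels(allsamplesgood$seurat_clusters), "clusters\n")
  }
}
```

Tracking the number of clusters per resolution gives a quick first view of where the partition stabilizes before inspecting markers for each candidate.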
Find all markers
markersall=FindAllMarkers(allsamplesgood)
Calculating cluster 0
Calculating cluster 1
Calculating cluster 2
Calculating cluster 3
Calculating cluster 4
Calculating cluster 5
Annotation with SCassist
Idents(allsamplesgood) <- "seurat_clusters"
# Run SCassist_analyze_and_annotate
sca_annotation <- SCassist_analyze_and_annotate(markersall, top_genes = 50, seurat_object_name = "allsamplesgood", api_key_file = api_key_file, llm_server="google")
[1] "The output of Analyze and Annotate is saved as a tab delimited text in your current working directory. If you provided the name of the seurat object, the annotation is also added to that object in the SCassist_annotation column"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 0"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 1"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 2"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 3"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 4"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 5"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 6"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 7"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 8"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 9"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 10"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 11"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 12"
[1] "SCassistant is analyzing markers to predict cell types for cluster : 13"
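The `top_genes = 50` argument implies that only the strongest markers of each cluster are sent for annotation. How SCassist ranks them internally is not shown in this notebook; the sketch below assumes ranking by `avg_log2FC` and uses an invented table shaped like `FindAllMarkers()` output. The helper `top_markers` is hypothetical.

```r
# Toy marker table shaped like FindAllMarkers() output; values are invented.
toy_markers <- data.frame(
  gene       = c("G1", "G2", "G3", "G4", "G5", "G6"),
  cluster    = factor(c(0, 0, 0, 1, 1, 1)),
  avg_log2FC = c(2.5, 0.8, 1.9, 3.1, 1.2, 2.2),
  p_val_adj  = c(1e-10, 1e-3, 1e-8, 1e-12, 1e-4, 1e-9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top n genes per cluster, ranked by avg_log2FC
# (an assumed ranking, not necessarily the one SCassist uses).
top_markers <- function(markers, n) {
  per_cluster <- split(markers, markers$cluster)
  lapply(per_cluster, function(df) {
    head(df$gene[order(df$avg_log2FC, decreasing = TRUE)], n)
  })
}

top_markers(toy_markers, n = 2)
```

Inspecting such per-cluster lists before annotation makes it easier to judge whether a predicted cell type is supported by recognizable markers.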
Pathway Enrichment Analysis
Idents(allsamplesgood) <- "cell_line"
# Differential expression between cell lines L1 and L2
markersingroup <- FindMarkers(allsamplesgood, ident.1 = "L1", ident.2 = "L2", group.by = "cell_line")
# Summarize KEGG and GO enrichment of the significant genes with SCassist
scassist_pathway_summary <- SCassist_analyze_enrichment(markers = markersingroup, experimental_design = experimental_design, pvalue = 0.05, log2FC = 1, api_key_file = api_key_file, llm_server = "google")
The KEGG analysis is presented below.
-------------------------------------------
The ClusterProfiler KEGG pathways result is saved as a text file in the current working directory.
The list of significantly enriched KEGG pathways is also saved as a text file in the current working directory.
KEGG Enrichment Insights:
**1. Significant Pathways:**
The KEGG pathway enrichment analysis reveals significant dysregulation in pathways associated with cancer development and progression in malignant CD4 T cells compared to healthy controls. Multiple cancer-related pathways are enriched, including bladder cancer, non-small cell lung cancer, pancreatic cancer, glioma, and basal cell carcinoma. These findings suggest a strong oncogenic signature in the malignant CD4 T cell lines.
Furthermore, pathways related to cellular metabolism (e.g., oxidative phosphorylation, fatty acid metabolism, and the TCA cycle) show significant enrichment, indicating alterations in energy production and metabolic processes in malignant cells. Immune response pathways (e.g., T cell receptor signaling, Th1/Th2 differentiation, and natural killer cell mediated cytotoxicity) are also affected, reflecting the malignant transformation's impact on immune function. The enrichment of apoptosis pathways suggests potential mechanisms of immune evasion or cell survival in the malignant cells.
**2. Regulators:**
Several transcription factors are present in the provided gene lists, potentially playing a role in the observed pathway enrichments. For example, GATA3 is involved in T cell differentiation and is found in the "Parathyroid hormone synthesis, secretion and action" pathway. RUNX family members (RUNX1, RUNX2, RUNX3) are implicated in various cancers and are present in multiple pathways. FOXO1, a transcription factor involved in cellular senescence and apoptosis, is also identified. These transcription factors could be key regulators of the malignant phenotype in the CD4 T cells.
**3. Key Genes or Potential Targets:**
Several genes stand out as potential key players or therapeutic targets. *HRAS*, a proto-oncogene frequently mutated in cancers, appears in numerous cancer-related pathways, making it a prime candidate for further investigation. *AKT3*, a serine/threonine kinase involved in cell survival and proliferation, is also highly enriched across multiple pathways. Genes involved in the cell cycle (*CDKN1A*, *CCND1*, *CDK6*) are consistently enriched, highlighting the dysregulation of cell cycle control in malignant CD4 T cells. *CDKN2A*, a tumor suppressor gene, is frequently downregulated in cancer and its presence in multiple pathways suggests a potential mechanism of malignant transformation. Genes involved in oxidative phosphorylation (e.g., *NDUFA8*, *SDHA*, *SDHB*) are significantly enriched, suggesting potential vulnerabilities in the energy metabolism of malignant cells. Finally, genes involved in immune response pathways (e.g., *PRF1*, *GZMB*, *FASLG*) could be important for understanding immune evasion mechanisms.
The GO analysis is provided below.
----------------------------------------
The ClusterProfiler GO enrichment result is saved as a text file in the current working directory.
The list of significantly enriched GO terms is also saved as a text file in the current working directory.
GO Enrichment Insights:
**1. Significant Concepts:**
The GO enrichment analysis reveals significant dysregulation of several key processes in malignant CD4 T cells compared to healthy controls. Malignant cells show strong enrichment for pathways related to protein folding and energy metabolism (ATP, ADP), suggesting alterations in cellular bioenergetics and proteostasis. These changes are likely crucial for the survival and proliferation of the malignant cells.
Furthermore, there's substantial enrichment in pathways related to DNA metabolism and cell cycle regulation (DNA replication, G2/M transition), indicating increased genomic instability and uncontrolled proliferation in malignant CD4 T cells. Immune-related processes (T cell activation, B cell activation) are also affected, reflecting the malignant transformation's impact on immune function.
**2. Regulators:**
Several transcription factors appear involved in the observed changes. FOXP3, a key regulator of Treg cell differentiation, shows altered expression, potentially contributing to immune dysregulation. GATA3, crucial for Th2 cell differentiation, and TBX21 (T-bet), essential for Th1 differentiation, may be dysregulated, impacting the balance of T helper cell subsets. Other factors like RUNX1 and RUNX3, involved in hematopoiesis and T cell development, might also play a role in the malignant phenotype. These alterations in transcription factor expression likely drive the observed changes in gene expression and cellular processes.
**3. Key Genes or Potential Targets:**
Genes like *RRM1* and *RRM2* (ribonucleotide reductase subunits), crucial for DNA synthesis, are highly enriched and could be therapeutic targets due to their role in malignant cell proliferation. Similarly, genes involved in the electron transport chain (*NDUFA8*, *SDHB*, *SDHA*) and ATP synthesis are significantly upregulated, suggesting that targeting mitochondrial function might be effective. Genes related to T cell activation (*CD28*, *CD81*) and apoptosis (*FAS*, *BAK1*) are also enriched, highlighting potential targets for modulating immune responses and eliminating malignant cells. Furthermore, genes involved in cell cycle regulation (*CCNA2*, *CDK1*) and DNA repair (*BRCA2*) are potential targets for therapeutic intervention. Finally, genes involved in actin cytoskeleton organization (*ACTN1*, *MYO1B*) are also enriched, suggesting a potential role for cytoskeletal dynamics in malignant CD4 T cell behavior.
Overall Summary:
This integrated KEGG and GO analysis of malignant versus healthy CD4 T cells reveals a profound dysregulation across multiple cellular processes, strongly indicative of an oncogenic transformation.
**Pathway and Process Enrichment:** KEGG analysis highlights significant enrichment in pathways associated with various cancers (bladder, lung, pancreatic, glioma, basal cell carcinoma), suggesting a pan-cancerous signature in the malignant CD4 T cells. Metabolic pathways, including oxidative phosphorylation, fatty acid metabolism, and the TCA cycle, are also significantly altered, indicating a shift in energy production. Immune response pathways (T cell receptor signaling, Th1/Th2 differentiation, NK cell cytotoxicity, apoptosis) are dysregulated, reflecting the impact of malignant transformation on immune function. GO analysis corroborates these findings, showing significant enrichment in processes related to protein folding, energy metabolism (ATP, ADP), DNA metabolism, cell cycle regulation (DNA replication, G2/M transition), and immune responses (T cell and B cell activation).
**Key Regulators:** Several transcription factors are implicated in driving these changes. GATA3, RUNX family members (RUNX1, RUNX2, RUNX3), and FOXO1 are identified in KEGG analysis, while FOXP3, GATA3, TBX21 (T-bet), and RUNX1/3 are highlighted in GO analysis. These factors are known to play crucial roles in T cell differentiation, cell cycle regulation, apoptosis, and immune responses, suggesting their involvement in the malignant phenotype.
**Key Genes and Potential Targets:** Several genes emerge as potential key players and therapeutic targets. *HRAS* and *AKT3* are consistently enriched across multiple pathways, reflecting their known roles in oncogenesis. Cell cycle regulators (*CDKN1A*, *CCND1*, *CDK6*, *CDKN2A*) show significant dysregulation, highlighting the loss of cell cycle control. Genes involved in oxidative phosphorylation (*NDUFA8*, *SDHA*, *SDHB*) and ribonucleotide reductase (*RRM1*, *RRM2*) are enriched, suggesting potential metabolic vulnerabilities. Immune response genes (*PRF1*, *GZMB*, *FASLG*, *CD28*, *CD81*, *FAS*, *BAK1*) are also significantly altered, potentially contributing to immune evasion. Finally, genes involved in DNA repair (*BRCA2*) and actin cytoskeleton organization (*ACTN1*, *MYO1B*) are also implicated.
View(markersingroup)
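The `pvalue = 0.05` and `log2FC = 1` arguments suggest that the marker table is thresholded before enrichment. The sketch below shows one plausible form of that filter on an invented table shaped like `FindMarkers()` output. Whether SCassist uses adjusted or raw p-values, and absolute or signed fold change, is an assumption here, and `filter_de` is a hypothetical helper.

```r
# Toy differential-expression table shaped like FindMarkers() output
# (genes as row names); values are invented.
toy_de <- data.frame(
  avg_log2FC = c(2.3, -1.8, 0.4, 1.5, -0.2),
  p_val_adj  = c(1e-6, 1e-4, 1e-9, 0.2, 0.01),
  row.names  = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# Hypothetical helper: keep genes passing both thresholds.
# Assumes adjusted p-values and absolute log2 fold change.
filter_de <- function(de, pvalue = 0.05, log2FC = 1) {
  de[de$p_val_adj < pvalue & abs(de$avg_log2FC) >= log2FC, , drop = FALSE]
}

sig  <- filter_de(toy_de)
up   <- rownames(sig)[sig$avg_log2FC > 0]  # higher in ident.1
down <- rownames(sig)[sig$avg_log2FC < 0]  # higher in ident.2
```

Checking how many genes survive the thresholds in each direction helps when reading an enrichment summary, since the enriched terms describe only the genes that pass this filter.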
Pathway Enrichment Analysis
# Network style summary of the enrichment analysis
scassist_pathway_summary
Proportion test
# Proportion test
# assign sample group
allsamplesgood$sample=ifelse(grepl("CD4Tcells_lab",allsamplesgood$cell_line), "CD4Tcells_lab", ifelse(grepl("L2",allsamplesgood$cell_line), "L2", ""))
prop.table(table(Idents(allsamplesgood), allsamplesgood$seurat_clusters))
                         0            1            2            3            4            5            6            7            8
L1            1.419734e-04 1.845655e-03 2.048474e-03 0.000000e+00 2.028192e-05 5.098874e-02 8.924044e-04 1.050603e-02 0.000000e+00
L2            4.056384e-05 1.037015e-01 2.636649e-04 0.000000e+00 2.028192e-05 2.007910e-03 1.419734e-04 3.833283e-03 0.000000e+00
L3            4.056384e-05 1.622553e-04 5.697191e-02 0.000000e+00 6.084576e-05 2.048474e-03 6.143393e-02 1.054660e-03 1.622553e-03
L4            2.028192e-05 1.014096e-04 3.139641e-02 2.028192e-05 6.084576e-05 5.476118e-04 7.869384e-03 5.881756e-04 6.573370e-02
L5            2.873948e-02 3.245107e-04 7.909948e-04 0.000000e+00 7.171686e-02 6.693033e-04 4.867660e-04 6.530778e-03 2.028192e-04
L6            5.559274e-02 1.419734e-04 1.379170e-03 0.000000e+00 5.476118e-04 2.880032e-03 4.867660e-04 2.555522e-02 4.056384e-05
L7            5.309806e-02 3.245107e-04 1.723963e-03 0.000000e+00 1.044519e-02 1.419734e-03 4.056384e-04 2.105263e-02 6.084576e-05
CD4Tcells_lab 0.000000e+00 3.853565e-04 0.000000e+00 4.790589e-02 0.000000e+00 9.998986e-03 0.000000e+00 0.000000e+00 4.056384e-05
CD4Tcells_10x 2.028192e-05 0.000000e+00 0.000000e+00 4.660785e-02 0.000000e+00 3.143697e-03 0.000000e+00 2.028192e-05 0.000000e+00
                         9           10           11           12           13
L1            4.161850e-02 1.622553e-04 8.234459e-03 1.663117e-03 2.028192e-05
L2            9.877294e-03 1.825373e-04 1.622553e-04 8.112767e-05 6.084576e-05
L3            3.731873e-03 0.000000e+00 3.853565e-04 3.853565e-04 2.474394e-03
L4            2.210729e-03 0.000000e+00 7.423182e-03 4.076666e-03 1.764527e-03
L5            2.920596e-03 0.000000e+00 2.981442e-03 8.112767e-04 5.962884e-03
L6            2.149883e-03 0.000000e+00 3.103134e-03 1.001927e-02 2.514958e-03
L7            3.163979e-03 2.028192e-05 1.133759e-02 3.954974e-03 1.115506e-03
CD4Tcells_lab 6.084576e-04 4.411317e-02 2.839469e-04 1.216915e-04 1.014096e-04
CD4Tcells_10x 1.014096e-04 2.066728e-02 6.084576e-05 4.462022e-04 0.000000e+00
Cell counts per cluster
# Count cells per cell line (rows) and cluster (columns)
table(Idents(allsamplesgood), allsamplesgood$seurat_clusters)
                 0    1    2    3    4    5    6    7    8    9   10   11   12   13
L1               7   91  101    0    1 2514   44  518    0 2052    8  406   82    1
L2               2 5113   13    0    1   99    7  189    0  487    9    8    4    3
L3               2    8 2809    0    3  101 3029   52   80  184    0   19   19  122
L4               1    5 1548    1    3   27  388   29 3241  109    0  366  201   87
L5            1417   16   39    0 3536   33   24  322   10  144    0  147   40  294
L6            2741    7   68    0   27  142   24 1260    2  106    0  153  494  124
L7            2618   16   85    0  515   70   20 1038    3  156    1  559  195   55
CD4Tcells_lab    0   19    0 2362    0  493    0    0    2   30 2175   14    6    5
CD4Tcells_10x    1    0    0 2298    0  155    0    1    0    5 1019    3   22    0
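The two groups labelled for the proportion test, `L2` and `CD4Tcells_lab`, can be compared directly from these counts. The sketch below uses base R's `prop.test()` on cluster 9 as a simple stand-in; it is not necessarily the test the notebook goes on to run. The choice of cluster 9 is arbitrary, the row totals (5935 and 5106) are the sums of the two rows above, and `in_cluster9`, `total_cells` and `res` are illustrative names.

```r
# Counts copied by hand from the table above:
# cells in cluster 9 and total cells for each of the two groups.
in_cluster9 <- c(L2 = 487, CD4Tcells_lab = 30)
total_cells <- c(L2 = 5935, CD4Tcells_lab = 5106)

# Two-sample test of equal proportions (base R, stats package)
res <- prop.test(x = in_cluster9, n = total_cells)

# Estimated fraction of each group in cluster 9, then the p-value
print(res$estimate)
print(res$p.value)
```

Cells from the same sample are not independent replicates, so a p-value from this kind of test is likely to be optimistic and is best read as descriptive.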