HNSCC RNA-seq Datasets with Primary and Normal Samples

Large-Scale Public Databases

| Dataset | Database | Primary Tumor Samples | Normal Samples | Total Samples | Technology | Notes |
|---|---|---|---|---|---|---|
| TCGA-HNSC | TCGA | 500-545 | 44 | ~545 | RNA-seq (Illumina HiSeq) | Most comprehensive HNSCC dataset; includes oral cavity (62%), oropharynx (12%), larynx (26%) |
| TCGA-HNSC (subset) | TCGA | 499 | 44 | 543 | RNA-seq | Used in multiple bioinformatics studies |
| TCGA-HNSC (subset) | TCGA | 111 | 12 | 123 | RNA-seq | Level 3 mRNA expression with clinical information |
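
The totals reported for the two subset rows should equal tumor plus normal counts. A minimal sketch of that sanity check, using only the counts listed in the table above; the `Cohort` class and the row labels are illustrative names and are not part of any TCGA tooling:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    """One table row; `label` is an illustrative name, not a TCGA identifier."""
    label: str
    tumor: int
    normal: int
    reported_total: int

    def is_consistent(self) -> bool:
        # The reported total should equal tumor + normal samples.
        return self.tumor + self.normal == self.reported_total


# Counts copied from the two TCGA-HNSC subset rows above.
cohorts = [
    Cohort("TCGA-HNSC subset (499 tumors)", 499, 44, 543),
    Cohort("TCGA-HNSC subset (111 tumors)", 111, 12, 123),
]

for cohort in cohorts:
    status = "ok" if cohort.is_consistent() else "mismatch"
    print(f"{cohort.label}: {cohort.tumor} + {cohort.normal} "
          f"vs reported {cohort.reported_total} -> {status}")
```

The first TCGA-HNSC row is left out on purpose: its tumor count is given as a range and its total as an approximation, so an exact check does not apply.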

Gene Expression Omnibus (GEO) Datasets

Bulk Expression and Methylation Datasets

| Dataset | Database | Primary Tumor Samples | Normal Samples | Total Samples | Technology/Platform | Notes |
|---|---|---|---|---|---|---|
| GSE83519 | GEO | 30 | 14 | 44 | RNA-seq | Used for differential expression analysis |
| GSE6631 | GEO | 22 | 22 | 44 | Affymetrix Human Genome U95 Version 2 Array (microarray) | Paired tumor and normal samples from oral cavity, larynx, oropharynx, hypopharynx, sinonasal cavity |
| GSE41613 | GEO | 97 | - | 97 | Expression profiling | Used in validation studies |
| GSE42743 | GEO | 74 | - | 74 | Expression profiling | Used in validation studies |
| GSE65858 | GEO | 270 | - | 270 | Expression profiling | Large validation cohort |
| GSE25083 | GEO | 91 | 18 | 109 | DNA methylation (Illumina Infinium 27k) | Oropharynx and larynx tumors + NDRI normal samples |
| GSE27020 | GEO | - | - | - | Expression profiling | Used for disease-free survival analysis |
| GSE16076 | GEO | - | - | - | Expression profiling | Used for external validation |
| GSE23036 | GEO | - | - | - | Expression profiling | Used for differential expression analysis |
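
Only some of these series list both tumor and normal counts, which matters when a tumor-versus-normal comparison is the goal. A minimal sketch that filters the rows above accordingly; the `GeoRow` type, the shortened technology labels, and the helper name are illustrative, and a "-" in the table is represented as `None`:

```python
from typing import NamedTuple, Optional


class GeoRow(NamedTuple):
    accession: str
    tumor: Optional[int]    # None where the table shows "-"
    normal: Optional[int]   # None where the table shows "-"
    technology: str         # shortened label, for illustration only


# Values copied from the table above.
rows = [
    GeoRow("GSE83519", 30, 14, "RNA-seq"),
    GeoRow("GSE6631", 22, 22, "microarray"),
    GeoRow("GSE41613", 97, None, "expression profiling"),
    GeoRow("GSE42743", 74, None, "expression profiling"),
    GeoRow("GSE65858", 270, None, "expression profiling"),
    GeoRow("GSE25083", 91, 18, "DNA methylation"),
    GeoRow("GSE27020", None, None, "expression profiling"),
    GeoRow("GSE16076", None, None, "expression profiling"),
    GeoRow("GSE23036", None, None, "expression profiling"),
]


def has_tumor_and_normal(row: GeoRow) -> bool:
    """True when both counts are listed and greater than zero."""
    return (row.tumor is not None and row.tumor > 0
            and row.normal is not None and row.normal > 0)


with_normals = [r.accession for r in rows if has_tumor_and_normal(r)]
print(with_normals)  # ['GSE83519', 'GSE6631', 'GSE25083']
```

Of the three series returned, GSE25083 is a methylation dataset, so a further filter on `technology` would be needed for expression-only work.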

Single-Cell RNA-seq Datasets

| Dataset | Database | Tumor Samples | Normal Samples | Cell Count | Technology | Notes |
|---|---|---|---|---|---|---|
| GSE103322 | GEO | 18 patients | - | ~6,000 cells | Smart-seq2 single-cell RNA-seq | Primary oral cavity tumors and matched lymph node metastases (5 pairs) |
| GSE181919 | GEO | 20 primary HNSCC | 9 normal tissues | - | 10x Genomics single-cell RNA-seq | Also includes 4 precancerous leukoplakia and 4 metastatic tumors; total 37 tissue specimens from 23 patients |
| GSE164690 | GEO | - | - | - | Single-cell RNA-seq | Used for validation of malignant cell clusters |
| Not specified | Published study | 15 NPC tumors | 1 normal nasopharyngeal tissue | 40,285 immune cells + 7,581 malignant cells | 10x Genomics Chromium | Nasopharyngeal carcinoma (NPC) specific |
| Not specified | Published study | 14 patients (primary + LN metastases) | - | - | Single-cell RNA-seq + TCR-seq | HPV-negative HNSCC from primary tumors and cervical lymph nodes; 7 pairs for scRNA-seq |
| Not specified | Published study | 10 patients (primary + 5 LN metastases) | - | 2,176 malignant cells | Single-cell RNA-seq | 17,113 genes profiled; Puram et al. 2017 study |
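
The cell and specimen counts above support two quick arithmetic checks. A minimal sketch, assuming the immune and malignant counts in the NPC row are the only cell populations being compared; the variable names are illustrative:

```python
# Cell counts copied from the NPC row above.
immune_cells = 40_285
malignant_cells = 7_581

total_cells = immune_cells + malignant_cells
malignant_fraction = malignant_cells / total_cells
print(f"{total_cells} cells in total, {malignant_fraction:.1%} malignant")

# Specimen tally for GSE181919, copied from its row above.
gse181919_specimens = {
    "primary HNSCC": 20,
    "normal tissue": 9,
    "leukoplakia": 4,
    "metastatic tumor": 4,
}
specimen_total = sum(gse181919_specimens.values())
print(f"GSE181919 specimens: {specimen_total}")
```

The specimen tally reproduces the 37 specimens reported for GSE181919, and malignant cells make up roughly one sixth of the NPC cells counted here.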

Integrated Datasets

| Dataset | Source | Primary Tumor Samples | Normal Samples | Technology | Notes |
|---|---|---|---|---|---|
| GTEx + TCGA | Combined | >8,000 TCGA tumors | >9,000 GTEx (53 tissues from 544 individuals) | RNA-seq | Unified pipeline with batch correction; includes multiple cancer types |
| GEPIA2 | Combined | 9,736 tumors | 8,587 normal samples | RNA-seq | TCGA + GTEx combined; standard processing pipeline |

Key Features by Dataset

TCGA-HNSC

  • Subsites: Oral cavity, oropharynx, larynx, hypopharynx
  • HPV Status: Both HPV-positive and HPV-negative samples
  • Clinical Data: Complete survival information, demographics, smoking history
  • Molecular Data: WES, RNA-seq, methylation, copy number variation

Single-Cell Datasets

  • Cellular Resolution: Detailed characterization of:
    • Malignant cells with heterogeneity profiles
    • Tumor microenvironment (TME) components
    • Immune cell populations
    • Cancer-associated fibroblasts (CAFs)
    • Epithelial-to-mesenchymal transition (EMT) programs

Data Access
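
GEO series are looked up by accession, and TCGA-HNSC is distributed through the GDC Data Portal. A minimal sketch that builds landing-page URLs for the datasets listed above. The URL patterns are assumptions based on the public GEO and GDC sites and may change; the helper names are illustrative, and no network request is made:

```python
from urllib.parse import urlencode

# Assumed public URL patterns; verify against the live sites before relying on them.
GEO_ACCESSION_ENDPOINT = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"
GDC_PROJECT_BASE = "https://portal.gdc.cancer.gov/projects/"


def geo_accession_url(accession: str) -> str:
    """Return the GEO landing-page URL for a series accession such as 'GSE83519'."""
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    return f"{GEO_ACCESSION_ENDPOINT}?{urlencode({'acc': accession})}"


def gdc_project_url(project_id: str) -> str:
    """Return the GDC Data Portal project page URL, e.g. for 'TCGA-HNSC'."""
    return GDC_PROJECT_BASE + project_id


for acc in ("GSE83519", "GSE6631", "GSE103322", "GSE181919"):
    print(geo_accession_url(acc))
print(gdc_project_url("TCGA-HNSC"))
```

Validating the accession format before building the URL catches placeholders such as "Not specified" early, which is useful because several single-cell entries above have no accession listed.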


Notes on Sample Quality

  1. TCGA samples are comprehensively characterized with multi-platform data
  2. Single-cell datasets provide cellular heterogeneity information but from fewer patients
  3. GEO datasets vary in quality and processing methods
  4. Paired samples (tumor + normal from same patient) are available in GSE6631 and some TCGA samples
  5. Most studies focus on HPV-negative HNSCC from oral cavity, though HPV-positive cases are included in TCGA