| Dataset | Database | Primary Tumor Samples | Normal Samples | Total Samples | Technology | Notes |
|---|---|---|---|---|---|---|
| TCGA-HNSC | TCGA | 500-545 | 44 | ~545 | RNA-seq (Illumina HiSeq) | Most comprehensive HNSCC dataset; includes oral cavity (62%), oropharynx (12%), larynx (26%) |
| TCGA-HNSC (subset) | TCGA | 499 | 44 | 543 | RNA-seq | Used in multiple bioinformatics studies |
| TCGA-HNSC (subset) | TCGA | 111 | 12 | 123 | RNA-seq Level 3 | mRNA expression with clinical information |

| Dataset | Database | Primary Tumor Samples | Normal Samples | Total Samples | Technology/Platform | Notes |
|---|---|---|---|---|---|---|
| GSE83519 | GEO | 30 | 14 | 44 | RNA-seq | Used for differential expression analysis |
| GSE6631 | GEO | 22 | 22 | 44 | Affymetrix Human Genome U95 Version 2 Array (microarray) | Paired tumor and normal samples from oral cavity, larynx, oropharynx, hypopharynx, sinonasal cavity |
| GSE41613 | GEO | 97 | - | 97 | Expression profiling | Used in validation studies |
| GSE42743 | GEO | 74 | - | 74 | Expression profiling | Used in validation studies |
| GSE65858 | GEO | 270 | - | 270 | Expression profiling | Large validation cohort |
| GSE25083 | GEO | 91 | 18 | 109 | DNA methylation (Illumina Infinium 27k) | Oropharynx and larynx tumors + NDRI normal samples |
| GSE27020 | GEO | - | - | - | Expression profiling | Used for disease-free survival analysis |
| GSE16076 | GEO | - | - | - | Expression profiling | Used for external validation |
| GSE23036 | GEO | - | - | - | Expression profiling | Used for differential expression analysis |

| Dataset | Database | Tumor Samples | Normal Samples | Cell Count | Technology | Notes |
|---|---|---|---|---|---|---|
| GSE103322 | GEO | 18 patients | - | ~6,000 cells | Single-cell RNA-seq (Smart-seq2) | Primary oral cavity tumors and matched lymph node metastases (5 pairs); Puram et al. 2017 |
| GSE181919 | GEO | 20 primary HNSCC | 9 normal tissues | - | 10x Genomics single-cell RNA-seq | Also includes 4 precancerous leukoplakia and 4 metastasized tumors; total 37 tissue specimens from 23 patients |
| GSE164690 | GEO | - | - | - | Single-cell RNA-seq | Used for validation of malignant cell clusters |
| Not specified | Published study | 15 NPC tumors | 1 normal nasopharyngeal tissue | 40,285 immune cells + 7,581 malignant cells | 10x Genomics Chromium | Nasopharyngeal carcinoma (NPC) specific |
| Not specified | Published study | 14 patients (primary + LN metastases) | - | - | Single-cell RNA-seq + TCR-seq | HPV-negative HNSCC from primary and cervical lymph nodes; 7 pairs for scRNA-seq |
| Not specified | Published study | 10 patients (primary + 5 LN metastases) | - | 2,176 malignant cells | Single-cell RNA-seq | 17,113 genes profiled; Puram et al. 2017 study |

| Dataset | Source | Primary Tumor Samples | Normal Samples | Technology | Notes |
|---|---|---|---|---|---|
| GTEx + TCGA | Combined | >8,000 TCGA tumors | >9,000 GTEx (53 tissues from 544 individuals) | RNA-seq | Unified pipeline with batch correction; includes multiple cancer types |
| GEPIA2 | Combined | 9,736 tumors | 8,587 normal samples | RNA-seq | TCGA + GTEx combined; standard processing pipeline |

- For Large-Scale Studies: TCGA-HNSC (500+ tumor, 44 normal)
- For Paired Analysis: GSE6631 (22 paired samples)
- For Single-Cell Analysis: GSE103322 or GSE181919
- For Validation: GSE41613, GSE42743, GSE65858
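
The recommendations above can be captured as a small lookup, which may be convenient when scripting an analysis plan. The following is a minimal sketch, not part of any published pipeline: the names `RECOMMENDED`, `SAMPLE_COUNTS`, `recommend`, and `inconsistent_rows` are hypothetical, and the values are copied from the tables in this document. It also checks that tumor plus normal counts match the stated totals for the rows where all three numbers are given exactly.

```python
# Hypothetical lookup built from the recommendations in this document.
RECOMMENDED = {
    "large_scale": ["TCGA-HNSC"],
    "paired": ["GSE6631"],
    "single_cell": ["GSE103322", "GSE181919"],
    "validation": ["GSE41613", "GSE42743", "GSE65858"],
}

# (tumor, normal, total) for rows whose three counts are stated exactly.
# Keys are illustrative labels, not official dataset identifiers.
SAMPLE_COUNTS = {
    "TCGA-HNSC subset (RNA-seq)": (499, 44, 543),
    "TCGA-HNSC subset (Level 3)": (111, 12, 123),
    "GSE83519": (30, 14, 44),
    "GSE6631": (22, 22, 44),
    "GSE25083": (91, 18, 109),
}


def recommend(use_case: str) -> list[str]:
    """Return the suggested datasets for a use case such as 'Single-Cell'."""
    key = use_case.strip().lower().replace("-", "_").replace(" ", "_")
    if key not in RECOMMENDED:
        raise KeyError(f"unknown use case: {use_case!r}")
    return list(RECOMMENDED[key])


def inconsistent_rows(counts: dict[str, tuple[int, int, int]]) -> list[str]:
    """List datasets whose tumor + normal count differs from the stated total."""
    return [
        name
        for name, (tumor, normal, total) in counts.items()
        if tumor + normal != total
    ]


if __name__ == "__main__":
    print(recommend("Single-Cell"))
    print(inconsistent_rows(SAMPLE_COUNTS))
```

Rows with ranges, approximate totals, or missing counts (for example the full TCGA-HNSC entry) are deliberately left out of `SAMPLE_COUNTS`, since the check only makes sense for exact figures.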