NCI60 FDR Analysis for Melanoma, CNS, and Leukemia

Key Question: Are genes statistically independent?

No - genes from the same cell are not statistically independent. Gene expression is correlated due to:

  • Co-regulation (genes in same pathway)

  • Chromosomal proximity

  • Shared transcription factors

This means we use ind = FALSE in our FDR function, which uses the more conservative BH procedure that accounts for dependence.

library(ISLR)
data(NCI60)
source("fdr.pck")

table(NCI60$labs) 
## 
##      BREAST         CNS       COLON K562A-repro K562B-repro    LEUKEMIA 
##           7           5           7           1           1           6 
## MCF7A-repro MCF7D-repro    MELANOMA       NSCLC     OVARIAN    PROSTATE 
##           1           1           8           9           6           2 
##       RENAL     UNKNOWN 
##           9           1
# Function to find interesting genes for one cancer type
get_interesting <- function(cancer, Q = 0.2, ind = FALSE) {
  
  # Which samples belong to this cancer type
  grp <- NCI60$labs == cancer
  
  # Run a two-sample t-test for each gene (cancer vs all others)
  pvals <- apply(NCI60$data, 2, function(g) {
    t.test(g[grp], g[!grp])$p.value
  })
  
  # Apply FDR correction using our fdr() function
  # ind = FALSE because genes are NOT independent (correlated expression)
  out <- fdr(pvals, Q = Q, ind = ind)
  
   # Handle both logical and index output from fdr()
  if (is.logical(out$interesting)) {
    interesting <- which(out$interesting)  # convert TRUE/FALSE to indices
  } else {
    interesting <- out$interesting         # already indices or names
  }
  
  # Compute mean expression inside vs outside the cancer group
  mean_in  <- colMeans(NCI60$data[grp,  , drop = FALSE])
  mean_out <- colMeans(NCI60$data[!grp, , drop = FALSE])
  
  # Build results table
  tab <- data.frame(
    gene      = interesting,
    p_value   = pvals[interesting],
    mean_diff = mean_in[interesting] - mean_out[interesting]
  )
  
  # Sort by smallest p-value first
  tab <- tab[order(tab$p_value), ]
  rownames(tab) <- NULL
  
  return(tab)
}

# ── Run analysis for each cancer type ──────────────────────────────────────────
mel <- get_interesting("MELANOMA", Q = 0.2, ind = FALSE)

cns <- get_interesting("CNS",      Q = 0.2, ind = FALSE)

leu <- get_interesting("LEUKEMIA", Q = 0.2, ind = FALSE)

# ── How many interesting genes were found? ─────────────────────────────────────
cat("Melanoma interesting genes: ", nrow(mel), "\n")
## Melanoma interesting genes:  379
cat("CNS interesting genes:      ", nrow(cns), "\n")
## CNS interesting genes:       81
cat("Leukemia interesting genes: ", nrow(leu), "\n")
## Leukemia interesting genes:  347
# ── Top genes for each cancer ──────────────────────────────────────────────────
cat("\n--- Top Melanoma Genes ---\n")
## 
## --- Top Melanoma Genes ---
print(head(mel, 15))
##    gene      p_value  mean_diff
## 1  4347 8.413460e-16  2.0999121
## 2   247 2.369564e-10 -2.0151775
## 3   196 1.335088e-09 -3.3335695
## 4  5467 2.604742e-09  1.7998241
## 5  5586 3.164027e-09 -2.5489275
## 6  4384 5.467477e-09  2.0351789
## 7  4255 5.831809e-09  1.4582157
## 8  6389 7.157991e-09 -1.2303498
## 9  4348 1.261400e-08  2.2564289
## 10 4256 1.506040e-08  1.3016078
## 11 6399 1.657162e-08 -0.7661597
## 12 4355 2.138624e-08  2.2433043
## 13 2024 3.116491e-08 -1.0046410
## 14 2233 3.212347e-08  0.5880368
## 15 5320 3.212880e-08 -0.8382132
cat("\n--- Top CNS Genes ---\n")
## 
## --- Top CNS Genes ---
print(head(cns, 15))
##    gene      p_value  mean_diff
## 1  5481 2.090845e-14  3.0291066
## 2  6686 3.317234e-11  1.7095010
## 3  5867 8.407955e-10  2.4878229
## 4  5482 9.839034e-10  1.2699925
## 5  5689 1.332537e-09  1.4608247
## 6  6689 2.568636e-09  2.2138227
## 7  4527 4.304442e-09 -0.6175505
## 8  1487 7.139663e-09 -0.9207189
## 9  5839 7.431886e-09  2.8423654
## 10 3174 1.001831e-08 -1.1023804
## 11 1968 5.215960e-08 -0.8247542
## 12  439 8.594930e-08 -0.6760414
## 13 5074 1.012065e-07 -0.4940926
## 14 6688 1.079841e-07  2.3098230
## 15 1715 1.804007e-07 -0.6045498
cat("\n--- Top Leukemia Genes ---\n")
## 
## --- Top Leukemia Genes ---
print(head(leu, 15))
##    gene      p_value  mean_diff
## 1  3933 7.470566e-13 -1.2753114
## 2  2320 7.557846e-13  1.6215851
## 3  2170 2.339842e-12  0.8099471
## 4  5872 5.713374e-12 -2.8830413
## 5  4244 7.284406e-11 -3.4433864
## 6  5868 2.186107e-10 -3.6918922
## 7  1343 5.734032e-10  0.6610965
## 8  1693 1.754387e-09  0.9770161
## 9  5949 3.754526e-09 -0.9428115
## 10 2215 7.124688e-09  0.3462690
## 11 1283 9.793830e-09  0.4944874
## 12  304 1.105311e-08 -1.0253684
## 13 2347 1.727125e-08  0.7853494
## 14 5878 1.829951e-08 -3.3778114
## 15 1692 1.949873e-08  0.8774184
# ── Find overlapping genes between cancer pairs ────────────────────────────────
common_mel_cns <- intersect(mel$gene, cns$gene)
common_mel_leu <- intersect(mel$gene, leu$gene)
common_cns_leu <- intersect(cns$gene, leu$gene)

cat("\n--- Common genes: Melanoma & CNS ---\n")
## 
## --- Common genes: Melanoma & CNS ---
print(common_mel_cns)
##  [1] 4348  257 5839 5868 5869 1013 5694 5804 4574 6388 5746
cat("Count:", length(common_mel_cns), "\n")
## Count: 11
cat("\n--- Common genes: Melanoma & Leukemia ---\n")
## 
## --- Common genes: Melanoma & Leukemia ---
print(common_mel_leu)
##  [1] 6399 6416 4288  251 5868 5869 4289  252  245 6391 5887  315 6369 4448 5872
## [16] 2025 4287 6414 6393 6392 1919 5804 5555 5356 4317 4380 5886 5556 4067 3874
## [31] 6081 1347  143 5557 2039
cat("Count:", length(common_mel_leu), "\n")
## Count: 35
cat("\n--- Common genes: CNS & Leukemia ---\n")
## 
## --- Common genes: CNS & Leukemia ---
print(common_cns_leu)
## [1] 5867 5869 5868 6065 5930 5804 5880 6018 5917
cat("Count:", length(common_cns_leu), "\n")
## Count: 9

Graph Interpretations:

The plots show the ordered p-values for each gene when comparing a specific cancer type (Melanoma, CNS, and Leukemia) to all other cancers. The black curve represents all genes, while the red points highlight the genes that are considered statistically significant after applying the FDR correction with Q=0.2. The horizontal line represents the significance threshold.

In all three graphs, most p-values increase smoothly, which indicates that the majority of genes are not significantly different across cancer types. However, a small cluster of red points appears at the left side of each graph, showing that only a subset of genes have very low p-values and are considered “interesting.”

Comparing the three cancers, Melanoma and Leukemia have many more significant genes (379 and 347) compared to CNS (81). This suggests that Melanoma and Leukemia have stronger or more widespread gene expression differences relative to other cancers, while CNS has fewer genes that clearly stand out.

Overall, the graphs show that gene expression differences are concentrated in a small number of genes, and most genes are not strongly associated with a specific cancer type.

Are there any common interesting genes between any 2 cancers?

Yes.

Cancer Pair Common Genes Count
Melanoma ∩ CNS 4348, 257, 5839, 5868, 5869, 1013, 5694, 5804, 4574, 6388, 5746 11 genes
Melanoma ∩ Leukemia 6399, 6416, 4288, 251, 5868, 5869, 4289, 252, 245, 6391… 35 genes
CNS ∩ Leukemia 5867, 5869, 5868, 6065, 5930, 5804, 5880, 6018, 5917 9 genes

Observations

  • Melanoma & Leukemia share the most common genes (35) suggesting possible shared biological mechanisms

  • Genes 5868 and 5869 appear in ALL THREE cancer types — these may represent broadly dysregulated genes across multiple cancers

  • Gene 5804 appears in all three pairwise comparisons, making it a particularly noteworthy gene

  • The direction of expression (mean_diff) matters:

    • Positive mean_diff → gene is over-expressed in that cancer

    • Negative mean_diff → gene is under-expressed in that cancer

Interesting Genes Found Per Cancer

Cancer Sample Size Interesting Genes Found
Melanoma n = 8 379 genes
CNS n = 5 81 genes
Leukemia n = 6 347 genes
  • CNS found the fewest interesting genes, which makes sense given it has the smallest sample size (n=5), making it harder to detect significant differences

  • Melanoma and Leukemia found substantially more interesting genes (~350+)

Top Genes Per Cancer

Melanoma:

  • Top gene: 4347 (p = 8.4e-16, mean_diff = +2.10) → over-expressed

  • Gene 247 (p = 2.4e-10, mean_diff = -2.02) → under-expressed

  • Gene 196 (p = 1.3e-09, mean_diff = -3.33) → strongly under-expressed

CNS:

  • Top gene: 5481 (p = 2.1e-14, mean_diff = +3.03) → strongly over-expressed

  • Gene 6686 (p = 3.3e-11, mean_diff = +1.71) → over-expressed

  • Gene 4527 (p = 4.3e-09, mean_diff = -0.62) → under-expressed

Leukemia:

  • Top gene: 3933 (p = 7.5e-13, mean_diff = -1.28) → under-expressed

  • Gene 2320 (p = 7.6e-13, mean_diff = +1.62) → over-expressed

  • Gene 5868 (p = 2.2e-10, mean_diff = -3.69) → strongly under-expressed