Molecular Subtyping Analysis with PAM50 Classifier

In this analysis, we use the molecular.subtyping function from the genefu package to apply the PAM50 breast cancer classifier to RNA-seq expression data and assign a molecular subtype to each sample.

# Load the required packages, then the PAM50 classifier and its robust parameters
library(genefu)
library(dplyr)
library(pheatmap)
data(pam50)
data(pam50.robust)

# Read the expression data
exp <- read.table("tpm_counts.txt", sep='\t', header = TRUE)
dim(exp)
## [1] 19790    41
# View the first few rows of the data
head(exp, 2)

Data Preprocessing

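# Keep the gene symbols in a separate gene_info data frame
# (dplyr:: is spelled out because other loaded packages may also define select)
gene_info <- dplyr::select(exp, Genes)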
head(gene_info)
# Use the first column for row names
exp <- data.frame(exp, row.names = 1)
# View the first few rows of the data
head(exp, 2)
# Transpose the expression matrix so that samples are rows and genes are columns
texp <- t(exp)
# View the first few rows of the transposed data
head(as.data.frame(texp), 2)
# Set the column names to the official gene symbols so they can be matched to the PAM50 genes
colnames(texp) <- gene_info$Genes
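
The preprocessing steps above can be sketched on a small toy table. This is only an illustration under stated assumptions: the toy values, the sample names S1 and S2, and the classifier_genes vector are hypothetical and do not come from tpm_counts.txt or from genefu. The sketch shows the target layout (samples in rows, gene symbols as column names) and a simple check of which classifier genes are present.

```r
# Toy stand-in for the TPM table: a "Genes" column plus one column per sample.
# All values and sample names are made up for illustration.
toy <- data.frame(
  Genes = c("ESR1", "ERBB2", "MKI67"),
  S1 = c(10.5, 2.0, 0.7),
  S2 = c(0.3, 55.1, 8.2),
  stringsAsFactors = FALSE
)

# Move the gene symbols into the row names and keep only the numeric columns.
toy_mat <- as.matrix(toy[, -1])
rownames(toy_mat) <- toy$Genes

# Transpose so that samples are rows and genes are columns.
toy_t <- t(toy_mat)
dim(toy_t)

# Hypothetical subset of classifier genes (assumption: not taken from genefu).
classifier_genes <- c("ESR1", "ERBB2", "FOXA1")

# Which classifier genes are available in the data, and which are missing?
present_genes <- intersect(classifier_genes, colnames(toy_t))
missing_genes <- setdiff(classifier_genes, colnames(toy_t))
present_genes
missing_genes
```

Running the same kind of check on the real matrix before classification may help explain unexpected results when many classifier genes are absent.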

Molecular Subtyping

# Apply the PAM50 classifier using the molecular.subtyping function
pam50_predictions <- molecular.subtyping(
  sbt.model = "pam50",
  data = texp,
  annot = gene_info,
  do.mapping = FALSE)
# Display the predicted PAM50 subtype of each sample
as.data.frame(pam50_predictions$subtype)
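
Once the subtypes are available, a quick tabulation is often more informative than the full listing. The sketch below uses a hypothetical vector of subtype calls (toy_subtype) rather than the real pam50_predictions object, and assumes the five PAM50 labels Basal, Her2, LumA, LumB and Normal.

```r
# Hypothetical subtype calls for six samples, named by sample ID.
toy_subtype <- factor(
  c(S1 = "LumA", S2 = "Basal", S3 = "LumA",
    S4 = "Her2", S5 = "LumB", S6 = "LumA"),
  levels = c("Basal", "Her2", "LumA", "LumB", "Normal")
)

# Count the samples per subtype; unused levels are reported with a count of 0.
subtype_counts <- table(toy_subtype)
subtype_counts

# Express the counts as proportions of all samples.
round(prop.table(subtype_counts), 2)
```

With the real results, the same calls could presumably be applied to pam50_predictions$subtype.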

Subtype Probabilities

# Display the subtype probabilities
as.data.frame(pam50_predictions$subtype.proba)
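
To make the link between probabilities and hard calls concrete, the sketch below takes a toy probability matrix, picks the most probable subtype per sample, and builds a 0/1 indicator matrix. It assumes that a hard call corresponds to the highest-probability column; it is an illustration of that idea and not a description of how genefu computes its outputs. The matrix toy_proba is hypothetical.

```r
# Hypothetical subtype probabilities: samples in rows, subtypes in columns.
toy_proba <- matrix(
  c(0.70, 0.10, 0.10, 0.05, 0.05,
    0.05, 0.15, 0.20, 0.55, 0.05),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2"),
                  c("Basal", "Her2", "LumA", "LumB", "Normal"))
)

# Most probable subtype for each sample (first column wins on ties).
toy_call <- colnames(toy_proba)[max.col(toy_proba, ties.method = "first")]
names(toy_call) <- rownames(toy_proba)
toy_call

# 0/1 indicator matrix marking the per-sample maximum.
# The vector of row maxima is recycled down the columns, so row i is
# compared against the maximum of row i.
toy_crisp <- 1 * (toy_proba == apply(toy_proba, 1, max))
toy_crisp
```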

Subtype Predictions

# Display the crisp (0/1) subtype assignments
as.data.frame(pam50_predictions$subtype.crisp)
# Plot a heatmap of the subtype probabilities, scaled within each row (sample)
m <- as.data.frame(pam50_predictions$subtype.proba)
pheatmap(m, scale = "row", color = colorRampPalette(c("navy", "white", "#FF1493"))(75))
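
The scale = "row" option in the heatmap call centers and scales each row, so the colors show how each subtype's probability compares with the others for the same sample rather than the raw values. A minimal base R sketch of that transformation, using a hypothetical toy matrix toy_m instead of the real probabilities:

```r
# Hypothetical probabilities for two samples and three subtypes.
toy_m <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.1, 0.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2"), c("Basal", "LumA", "LumB"))
)

# Row-wise z-scores: subtract each row's mean and divide by its standard
# deviation. scale() works on columns, hence the double transpose.
row_scaled <- t(scale(t(toy_m)))
round(row_scaled, 2)

# After scaling, every row has mean 0 and standard deviation 1.
rowMeans(row_scaled)
apply(row_scaled, 1, sd)

# Caveat: a row with identical values has standard deviation 0,
# so its scaled values become NaN (0 / 0).
```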

In this analysis, we loaded the PAM50 classifier and its robust parameters, read the RNA-seq expression data, and reshaped it into a matrix with samples in rows and gene symbols in columns. We then applied the PAM50 classifier with the molecular.subtyping function, displayed the predicted subtypes, the subtype probabilities, and the crisp subtype assignments, and plotted the probabilities as a heatmap.