LNCaP RNA-Seq Analysis

Marvin Lim, Barry Digby, Dr. Pilib O’Broin, Prof. Stephen Finn

Overview

This document provides results from a differential expression analysis performed on LNCaP cell lines (n=3) with varying degrees of enzalutamide resistance: Clone1 = highly resistant, Clone9 = moderately resistant and Control samples. Sequencing was performed by TrinSeq, at the Trinity Genome Sequencing Laboratory located in the Trinity Translational Medicine Institute, Trinity College Dublin using MiSeq technology.

Exploratory Analysis

Gene expression counts were normalised using DESeq2 median of ratios method and transformed using logarithm base 2 for visualisation.

library(knitr)
library(DESeq2)
library(RColorBrewer)
library(pheatmap)
library(gplots)
library(biomaRt)
library(PCAtools)
library(DT)
library(IHW)
library(apeglm)
library(EnhancedVolcano)
library(ComplexHeatmap)
library(circlize) #colorRamp2
library(dendextend) #clustering
library(tidyverse)
library(fgsea)
library(STRINGdb)
library(networkD3)
library(visNetwork)
library(igraph)

## Set dir
setwd("/home/barry/Desktop/GSE118959/scripts/")

mtx <- read.table("../data/RNA-Seq/mRNA_counts.txt", header=T, row.names = "gene_symbols")
samples <- c("Clone1_N1", "Clone1_N2", "Clone1_N3",
             "Clone9_N1", "Clone9_N2", "Clone9_N3",
             "Control_N1", "Control_N2", "Control_N3")
colnames(mtx) <- samples
## Create sample DF
sample_vec <- c()
replicates_vec <- c()
sample_df <- as.data.frame(matrix(ncol=2, nrow=0))
for(i in samples){
  split <- strsplit(i, "\\_")
  sample <- split[[1]][1]
  rep <- split[[1]][2]
  sample_df[i,1] <- sample
  sample_df[i,2] <- rep
}
colnames(sample_df) <- c("Condition", "Replicate")
sample_df$Condition <- as.factor(sample_df$Condition)
sample_df$Replicate <- as.factor(sample_df$Replicate)
## Create DDS object
dds <- DESeqDataSetFromMatrix(mtx, colData = sample_df, design = ~ Replicate + Condition)
dds$Condition <- relevel(dds$Condition, ref = "Control")
  keep <- rowSums(counts(dds)) >= 20
dds <- dds[keep,]
dds <- DESeq(dds) 
## Get log2 counts
cts <- counts(dds, normalized=T)
log_counts <- log2(cts + 1)

Sample to Sample Heatmaps

num_conditions <- nlevels(sample_df$Condition)
pal <- colorRampPalette(brewer.pal(num_conditions, "Set1"))(num_conditions)
cond_colors <- pal[as.integer(sample_df$Condition)]
heatmap.2(cor(log_counts), scale="column", dendrogram = "column", labRow="", 
          ColSideColors=cond_colors, trace='none', margins=c(7,8),
          main='Sample correlations (log2-transformed)')

Heatmap of sample to sample distances. The heatmap was built using the heatmap.2 package using log2 transformed gene expression data. Samples are clustered by replicate according to their degree of enzalutamide resistance. Heatmap shows there are no outliers in the dataset and Clone1 has the largest distance in latent space.

Scree Plot

p <- pca(log_counts, metadata = sample_df)
screeplot(p, title = "Scree plot of LNCaP Dataset")

Scree plot shows principal component 1 (PC1) explains ~70% of the variation present in the dataset.

PCA

biplot(p, x="PC1", y="PC2",
  colby = 'Condition', colkey = c('Control'='royalblue', 'Clone9'='forestgreen', 'Clone1'='red'),
  hline = 0, vline = 0,
  #legendPosition = 'right', legendLabSize = 12, legendIconSize = 8.0,
  drawConnectors = TRUE,
  title = 'PCA bi-plot',
  subtitle = 'PC1 versus PC2')

Principal component analysis was performed on the log2 transformed counts using the PCAtools package. PCA bi-plot demonstrates clustering of samples and confirms the quality of the experiment design: each replicate clusters according to their resistance to enzalutamide. Furthermore, it is revealed that the variation observed in PC1 in the scree plot is accounted for by Clone1, as it clusters further away from Clone9 and Control in PC1 (69.14%) Separation of Clone9 and Control samples is evident in PC2 (11.99% variation).

Due to the small variation present in Clone9 and Control replicates in the PCA bi-plot, the coefficient “replicates” will be accounted for when fitting the generalized linear model in DESeq2 to remove technical variation from the analysis.

Differential Expression Analysis

Differential gene tests were conducted using DESeq2 package. DESeq2 fits negative binomial generalized linear models for each gene and uses the Wald test for significance testing, returning a table of differentially genes. Multiple testing correction was performed using the independent hypothesis weighting IHW package, with the significance threshold for genes set at P-value < 0.05. Genes were annotated using the BiomaRt package.

Initial results returned a subset of genes with extremely high/low Log2 Fold Changes in the order of +/- 20. This is due to the high variability within these gene counts. When the read counts are low or highly variable, the maximum likelihood estimates for the Log Fold Change has high variance, leading to large estimates not representative of true differences. Lowly expressed genes are filtered out using the rowSums function, where the threshold is set to 20. Inevitably, highly variable genes will exceed this filtering threshold and introduce outliers to the results. The authors of DESeq2 propose the use of a heavy-tailed Cauchy prior distribution for effect sizes. The proposed method, Approximate Posterior Estimation for GLM, apeglm, has lower bias than previously proposed shrinkage estimators, while still reducing variance for those genes with little information for statistical inference. The rowSums function to filter lowly expressed genes was used in conjunction with the apeglm function to shrink the effect size of genes with high variance, returning a robust set of results for the analysis. Results in the tables have been filtered according to an adjusted P-value of <0.05 and a log2 Fold Change >= 1 / <= -1, returning both biologically and statistically significant genes.

mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

Clone1 vs Control

Up Regulated

res1 <- results(dds, filterFun=ihw, alpha=0.05, c("Condition", "Clone1", "Control"))
LFC1 <- lfcShrink(dds = dds, res= res1, coef = 4, type = "apeglm")
LFC1_df <- as.data.frame(LFC1)

output_col <- c("Gene", "Chromosome", "Start", "Stop", "Strand", "Description", "Log2FC", "P-value", "Adj P-value")

  up_key <-  intersect(rownames(LFC1)[which(LFC1$log2FoldChange>=1)],
                       rownames(LFC1)[which(LFC1$padj<=0.05)])
  up_df <- as.data.frame((LFC1)[which(rownames(LFC1) %in% up_key),])
  up_df <- tibble::rownames_to_column(up_df, "external_gene_name")
  info <- getBM(attributes=c("external_gene_name",
                             "chromosome_name",
                             "start_position",
                             "end_position",
                             "strand",
                             "entrezgene_description"),
                        filters = c("external_gene_name"),
                        values = up_df$external_gene_name,
                        mart = mart,
                        useCache = F)
  tmp <- merge(info, up_df, by="external_gene_name")
  tmp <- tmp[order(-tmp$log2FoldChange),]
  ## col out index
  index <- c(1, 2, 3, 4, 5, 6, 8, 10, 11)
  tmp <- tmp[,index]
  colnames(tmp) <- output_col
  tmp$Strand <- gsub("-1", "-", tmp$Strand)
  tmp$Strand <- gsub("1", "+", tmp$Strand)
  # remove dups
  tmp <- tmp[!duplicated(tmp$Gene),]
  up_regulated_table1 <- tmp


  
DT::datatable(up_regulated_table1, rownames = FALSE)

## Clone 1 vs Control upregulated
write.table(up_regulated_table1, "~/Desktop/GSE118959/results/RNA-Seq/DESeq2/Clone1_vs_Control_UP.txt", sep="\t")

Down Regulated

res1 <- results(dds, filterFun=ihw, alpha=0.05, c("Condition", "Clone1", "Control"))
LFC1 <- lfcShrink(dds = dds, res= res1, coef = 4, type = "apeglm")

output_col <- c("Gene", "Chromosome", "Start", "Stop", "Strand", "Description", "Log2FC", "P-value", "Adj P-value")

  down_key <- intersect(rownames(LFC1)[which(LFC1$log2FoldChange<=-1)],
                        rownames(LFC1)[which(LFC1$padj<=0.05)])
  
  down_df <- as.data.frame((LFC1)[which(rownames(LFC1) %in% down_key),])
  down_df <- tibble::rownames_to_column(down_df, "external_gene_name")
  
  info <- getBM(attributes=c("external_gene_name",
                             "chromosome_name",
                             "start_position",
                             "end_position",
                             "strand",
                             "entrezgene_description"),
                         filters = c("external_gene_name"),
                         values = down_df$external_gene_name,
                         mart = mart,
                         useCache = F)
  
  tmp <- merge(info, down_df, by="external_gene_name")
  tmp <- tmp[order(tmp$log2FoldChange),]
  ## col out index
  index <- c(1, 2, 3, 4, 5, 6, 8, 10, 11)
  tmp <- tmp[,index]
  colnames(tmp) <- output_col
  tmp$Strand <- gsub("-1", "-", tmp$Strand)
  tmp$Strand <- gsub("1", "+", tmp$Strand)
  # remove dups
  tmp <- tmp[!duplicated(tmp$Gene),]
  down_regulated_table1 <- tmp

  
DT::datatable(down_regulated_table1, rownames=FALSE)

## Clone 1 vs Control down regulated
write.table(down_regulated_table1, "~/Desktop/GSE118959/results/RNA-Seq/DESeq2/Clone1_vs_Control_DOWN.txt", sep="\t")

Volcano Plot

EnhancedVolcano(LFC1_df,
    lab = rownames(LFC1_df),
    x = 'log2FoldChange',
    y = 'pvalue',
    selectLab = c(""),
    FCcutoff = 1.0,
    pCutoff = 0.05,
    title = "Clone1 vs Control DESeq2 Results",
    subtitle = "Differentially Expressed Genes",
    caption = paste0('Total Genes = ', nrow(LFC1_df)),
    legendPosition = "right", 
    legend = c("Not Sig", "+/- 1 Log2 FC", "P-val <0.05", "P-val & Log2 FC"),
    legendVisible = F,
    xlim = c(-10, 10),
    ylim = c(0, 100),
    transcriptPointSize = 1.5)

Heatmap

c1heat <- c(up_regulated_table1$Gene, down_regulated_table1$Gene)
## subset log2 for viz
log_counts <- as.data.frame(log_counts)
# subset log counts
c1heat_df <- subset(log_counts, rownames(log_counts) %in% c1heat)
# remove clone9
c1heat_df <- c1heat_df[,-c(4,5,6)]
## flip it so it reads left to right
c1heat_df <- c1heat_df[,c(4,5,6,1,2,3)]
## remove no SD between cases
c1heat_df <- c1heat_df[apply(c1heat_df, MARGIN = 1, FUN = function(x) sd(x) != 0),]
## calculate zscore for viz
mat1 <- as.matrix(c1heat_df)
mat1 <- t(mat1)
mat1 <- scale(mat1, center=T)
mat1 <- t(mat1)

# Top annotation
df2 = data.frame(Cell_Line = c(rep("Control", 3), rep("Clone1", 3)))
ha = HeatmapAnnotation(df = df2, col = list(Cell_Line = c("Control" =  "black", "Clone1" = "gold")), annotation_legend_param = list(title_gp = gpar(fontsize = 12, fontface = "bold"), labels_gp = gpar(fontsize = 12)))             

hm1 <- Heatmap(mat1, 
              col= colorRamp2(c(-2,-1,0,1,2),c("blue","skyblue","white","lightcoral","red")),
              heatmap_legend_param=list(at=c(-2,-1,0,1,2),color_bar="continuous", 
                                        legend_direction="vertical", legend_width=unit(15,"cm"),
                                        title_position="topcenter", title_gp=gpar(fontsize=10, fontface="bold")), 
              name = "Z-score",
              
              #Row annotation configurations
              cluster_rows=TRUE,
              show_row_dend=TRUE,
              row_title_side="left",
              row_title_gp=gpar(fontsize=8),
              show_row_names=FALSE,
              row_names_side="right",
              
              #Column annotation configuratiions
              cluster_columns=TRUE,
              show_column_dend=TRUE,
              column_title="Differentially Expressed Genes\nClone1 vs Control",
              column_title_side="top",
              column_title_gp=gpar(fontsize=18, fontface="bold"),
              show_column_names = FALSE,
              column_names_gp = gpar(fontsize = 20, fontface="bold"),
              
              #Dendrogram configurations: columns
              clustering_distance_columns="euclidean",
              clustering_method_columns="complete",
              column_dend_height=unit(10,"mm"),

              #Dendrogram configurations: rows
              clustering_distance_rows="euclidean",
              clustering_method_rows="complete",
              row_dend_width=unit(4,"cm"),
              row_dend_side = "left",
              row_dend_reorder = TRUE,

              #Splits
              border=T,
              row_km = 1,
              column_km = 2,
              
              #plot params
              width = unit(5, "inch"), 
              height = unit(4, "inch"),
              #height = unit(0.4, "cm")*nrow(mat),
              
              #Annotations
              top_annotation = ha)


draw(hm1, annotation_legend_side = "right", heatmap_legend_side="right")

Clone9 vs Control

Up Regulated

res9 <- results(dds, filterFun=ihw, alpha=0.05, c("Condition", "Clone9", "Control"))
LFC9 <- lfcShrink(dds = dds, res= res9, coef = 5, type = "apeglm")
LFC9_df <- as.data.frame(LFC9)

output_col <- c("Gene", "Chromosome", "Start", "Stop", "Strand", "Description", "Log2FC", "P-value", "Adj P-value")

  up_key <-  intersect(rownames(LFC9)[which(LFC9$log2FoldChange>=1)],
                       rownames(LFC9)[which(LFC9$padj<=0.05)])
  up_df <- as.data.frame((LFC9)[which(rownames(LFC9) %in% up_key),])
  up_df <- tibble::rownames_to_column(up_df, "external_gene_name")
  info <- getBM(attributes=c("external_gene_name",
                             "chromosome_name",
                             "start_position",
                             "end_position",
                             "strand",
                             "entrezgene_description"),
                        filters = c("external_gene_name"),
                        values = up_df$external_gene_name,
                        mart = mart,
                        useCache = F)
  tmp <- merge(info, up_df, by="external_gene_name")
  tmp <- tmp[order(-tmp$log2FoldChange),]
  ## col out index
  index <- c(1, 2, 3, 4, 5, 6, 8, 10, 11)
  tmp <- tmp[,index]
  colnames(tmp) <- output_col
  tmp$Strand <- gsub("-1", "-", tmp$Strand)
  tmp$Strand <- gsub("1", "+", tmp$Strand)
  # remove dups
  tmp <- tmp[!duplicated(tmp$Gene),]
  up_regulated_table9 <- tmp


DT::datatable(up_regulated_table9, rownames = FALSE)

## Clone 9 vs Control upregulated 
write.table(up_regulated_table9, "~/Desktop/GSE118959/results/RNA-Seq/DESeq2/Clone9_vs_Control_UP.txt", sep="\t")

Down Regulated

res9 <- results(dds, filterFun=ihw, alpha=0.05, c("Condition", "Clone9", "Control"))
LFC9 <- lfcShrink(dds = dds, res= res9, coef = 5, type = "apeglm")
output_col <- c("Gene", "Chromosome", "Start", "Stop", "Strand", "Description", "Log2FC", "P-value", "Adj P-value")

  down_key <- intersect(rownames(LFC9)[which(LFC9$log2FoldChange<=-1)],
                        rownames(LFC9)[which(LFC9$padj<=0.05)])
  
  down_df <- as.data.frame((LFC9)[which(rownames(LFC9) %in% down_key),])
  down_df <- tibble::rownames_to_column(down_df, "external_gene_name")
  
  info <- getBM(attributes=c("external_gene_name",
                             "chromosome_name",
                             "start_position",
                             "end_position",
                             "strand",
                             "entrezgene_description"),
                         filters = c("external_gene_name"),
                         values = down_df$external_gene_name,
                         mart = mart,
                         useCache=F)
  
  tmp <- merge(info, down_df, by="external_gene_name")
  tmp <- tmp[order(tmp$log2FoldChange),]
  ## col out index
  index <- c(1, 2, 3, 4, 5, 6, 8, 10, 11)
  tmp <- tmp[,index]
  colnames(tmp) <- output_col
  tmp$Strand <- gsub("-1", "-", tmp$Strand)
  tmp$Strand <- gsub("1", "+", tmp$Strand)
  # remove dups
  tmp <- tmp[!duplicated(tmp$Gene),]
  down_regulated_table9 <- tmp


DT::datatable(down_regulated_table9, rownames=FALSE)

## Clone 9 vs Control downregulated 
write.table(down_regulated_table9, "~/Desktop/GSE118959/results/RNA-Seq/DESeq2/Clone9_vs_Control_DOWN.txt", sep="\t")

Volcano Plot

EnhancedVolcano(LFC9_df,
    lab = rownames(LFC9_df),
    x = 'log2FoldChange',
    y = 'pvalue',
    selectLab = c(""),
    FCcutoff = 1.0,
    pCutoff = 0.05,
    title = "Clone9 vs Control DESeq2 Results",
    subtitle = "Differentially Expressed Genes",
    caption = paste0('Total Genes = ', nrow(LFC9_df)),
    legendPosition = "right", 
    legend = c("Not Sig", "+/- 1 Log2 FC", "P-val <0.05", "P-val & Log2 FC"),
    legendVisible = F,
    xlim = c(-8, 8),
    ylim = c(0, 75),
    transcriptPointSize = 1.5)

Heatmap

c9heat <- c(up_regulated_table9$Gene, down_regulated_table9$Gene)
## subset log2 for viz
log_counts <- as.data.frame(log_counts)
# subset log counts
c9heat_df <- subset(log_counts, rownames(log_counts) %in% c9heat)
# remove clone1
c9heat_df <- c9heat_df[,-c(1,2,3)]
## flip it so it reads left to right
c9heat_df <- c9heat_df[,c(4,5,6,1,2,3)]
## remove no SD between cases
c9heat_df <- c9heat_df[apply(c9heat_df, MARGIN = 1, FUN = function(x) sd(x) != 0),]
## calculate zscore for viz
mat9 <- as.matrix(c9heat_df)
mat9 <- t(mat9)
mat9 <- scale(mat9, center=T)
mat9 <- t(mat9)

# Top annotation
df2 = data.frame(Cell_Line = c(rep("Control", 3), rep("Clone9", 3)))
ha = HeatmapAnnotation(df = df2, col = list(Cell_Line = c("Control" =  "black", "Clone9" = "gold")), annotation_legend_param = list(title_gp = gpar(fontsize = 12, fontface = "bold"), labels_gp = gpar(fontsize = 12)))             

hm9 <- Heatmap(mat9, 
              col= colorRamp2(c(-2,-1,0,1,2),c("blue","skyblue","white","lightcoral","red")),
              heatmap_legend_param=list(at=c(-2,-1,0,1,2),color_bar="continuous", 
                                        legend_direction="vertical", legend_width=unit(15,"cm"),
                                        title_position="topcenter", title_gp=gpar(fontsize=10, fontface="bold")), 
              name = "Z-score",
              
              #Row annotation configurations
              cluster_rows=TRUE,
              show_row_dend=TRUE,
              row_title_side="left",
              row_title_gp=gpar(fontsize=8),
              show_row_names=FALSE,
              row_names_side="right",
              
              #Column annotation configuratiions
              cluster_columns=TRUE,
              show_column_dend=TRUE,
              column_title="Differentially Expressed Genes\nClone9 vs Control",
              column_title_side="top",
              column_title_gp=gpar(fontsize=18, fontface="bold"),
              show_column_names = FALSE,
              column_names_gp = gpar(fontsize = 20, fontface="bold"),
              
              #Dendrogram configurations: columns
              clustering_distance_columns="euclidean",
              clustering_method_columns="complete",
              column_dend_height=unit(10,"mm"),

              #Dendrogram configurations: rows
              clustering_distance_rows="euclidean",
              clustering_method_rows="complete",
              row_dend_width=unit(4,"cm"),
              row_dend_side = "left",
              row_dend_reorder = TRUE,

              #Splits
              border=T,
              row_km = 1,
              column_km = 2,
              
              #plot params
              width = unit(5, "inch"), 
              height = unit(4, "inch"),
              #height = unit(0.4, "cm")*nrow(mat),
              
              #Annotations
              top_annotation = ha)


draw(hm9, annotation_legend_side = "right", heatmap_legend_side="right")

Clone1 vs Clone9

Up Regulated

dds19 <- DESeqDataSetFromMatrix(mtx, colData = sample_df, design = ~ Replicate + Condition)
dds19$Condition <- relevel(dds19$Condition, ref = "Clone9")
keep <- rowSums(counts(dds19)) >= 20
dds19 <- dds19[keep,]
dds19 <- DESeq(dds19) 

res19 <- results(dds19, filterFun=ihw, alpha=0.05, c("Condition", "Clone1", "Clone9"))
LFC19 <- lfcShrink(dds = dds19, res= res19, coef = 4, type = "apeglm")
LFC19_df <- as.data.frame(LFC19)

output_col <- c("Gene", "Chromosome", "Start", "Stop", "Strand", "Description", "Log2FC", "P-value", "Adj P-value")

  up_key <-  intersect(rownames(LFC19)[which(LFC19$log2FoldChange>=1)],
                       rownames(LFC19)[which(LFC19$padj<=0.05)])
  up_df <- as.data.frame((LFC19)[which(rownames(LFC19) %in% up_key),])
  up_df <- tibble::rownames_to_column(up_df, "external_gene_name")
  info <- getBM(attributes=c("external_gene_name",
                             "chromosome_name",
                             "start_position",
                             "end_position",
                             "strand",
                             "entrezgene_description"),
                        filters = c("external_gene_name"),
                        values = up_df$external_gene_name,
                        mart = mart,
                        useCache = F)
  tmp <- merge(info, up_df, by="external_gene_name")
  tmp <- tmp[order(-tmp$log2FoldChange),]
  ## col out index
  index <- c(1, 2, 3, 4, 5, 6, 8, 10, 11)
  tmp <- tmp[,index]
  colnames(tmp) <- output_col
  tmp$Strand <- gsub("-1", "-", tmp$Strand)
  tmp$Strand <- gsub("1", "+", tmp$Strand)
  # remove dups
  tmp <- tmp[!duplicated(tmp$Gene),]
  up_regulated_table19 <- tmp

DT::datatable(up_regulated_table19, rownames = FALSE)

## Clone 1 vs Clone 9 upregulated 
write.table(up_regulated_table19, "~/Desktop/GSE118959/results/RNA-Seq/DESeq2/Clone1_vs_Clone9_UP.txt", sep="\t")

Down Regulated

res19 <- results(dds19, filterFun=ihw, alpha=0.05, c("Condition", "Clone1", "Clone9"))
LFC19 <- lfcShrink(dds = dds19, res= res19, coef = 4, type = "apeglm")

output_col <- c("Gene", "Chromosome", "Start", "Stop", "Strand", "Description", "Log2FC", "P-value", "Adj P-value")

  down_key <- intersect(rownames(LFC19)[which(LFC19$log2FoldChange<=-1)],
                        rownames(LFC19)[which(LFC19$padj<=0.05)])
  
  down_df <- as.data.frame((LFC19)[which(rownames(LFC19) %in% down_key),])
  down_df <- tibble::rownames_to_column(down_df, "external_gene_name")
  
  info <- getBM(attributes=c("external_gene_name",
                             "chromosome_name",
                             "start_position",
                             "end_position",
                             "strand",
                             "entrezgene_description"),
                         filters = c("external_gene_name"),
                         values = down_df$external_gene_name,
                         mart = mart,
                         useCache = F)
  
  tmp <- merge(info, down_df, by="external_gene_name")
  tmp <- tmp[order(tmp$log2FoldChange),]
  ## col out index
  index <- c(1, 2, 3, 4, 5, 6, 8, 10, 11)
  tmp <- tmp[,index]
  colnames(tmp) <- output_col
  tmp$Strand <- gsub("-1", "-", tmp$Strand)
  tmp$Strand <- gsub("1", "+", tmp$Strand)
  # remove dups
  tmp <- tmp[!duplicated(tmp$Gene),]
  down_regulated_table19 <- tmp

DT::datatable(down_regulated_table19, rownames=FALSE)

## Clone 1 vs Clone 9 downregulated 
write.table(down_regulated_table19, "~/Desktop/GSE118959/results/RNA-Seq/DESeq2/Clone1_vs_Clone9_DOWN.txt", sep="\t")

Volcano Plot

EnhancedVolcano(LFC19_df,
    lab = rownames(LFC19_df),
    x = 'log2FoldChange',
    y = 'pvalue',
    selectLab = c(""),
    FCcutoff = 1.0,
    pCutoff = 0.05,
    title = "Clone1 vs Clone9 DESeq2 Results",
    subtitle = "Differentially Expressed Genes",
    caption = paste0('Total Genes = ', nrow(LFC19_df)),
    legendPosition = "right", 
    legend = c("Not Sig", "+/- 1 Log2 FC", "P-val <0.05", "P-val & Log2 FC"),
    legendVisible = F,
    xlim = c(-10, 10),
    ylim = c(0, 100),
    transcriptPointSize = 1.5)

Heatmap

c19heat <- c(up_regulated_table19$Gene, down_regulated_table19$Gene)
## subset log2 for viz
log_counts <- as.data.frame(log_counts)
# subset log counts
c19heat_df <- subset(log_counts, rownames(log_counts) %in% c19heat)
# remove control
c19heat_df <- c19heat_df[,-c(7,8,9)]
## flip it so it reads left to right
c19heat_df <- c19heat_df[,c(4,5,6,1,2,3)]
## remove no SD between cases
c19heat_df <- c19heat_df[apply(c19heat_df, MARGIN = 1, FUN = function(x) sd(x) != 0),]
## calculate zscore for viz
mat19 <- as.matrix(c19heat_df)
mat19 <- t(mat19)
mat19 <- scale(mat19, center=T)
mat19 <- t(mat19)

# Top annotation
df2 = data.frame(Cell_Line = c(rep("Clone9", 3), rep("Clone1", 3)))
ha = HeatmapAnnotation(df = df2, col = list(Cell_Line = c("Clone9" =  "black", "Clone1" = "gold")), annotation_legend_param = list(title_gp = gpar(fontsize = 12, fontface = "bold"), labels_gp = gpar(fontsize = 12)))             

hm19 <- Heatmap(mat19, 
              col= colorRamp2(c(-2,-1,0,1,2),c("blue","skyblue","white","lightcoral","red")),
              heatmap_legend_param=list(at=c(-2,-1,0,1,2),color_bar="continuous", 
                                        legend_direction="vertical", legend_width=unit(15,"cm"),
                                        title_position="topcenter", title_gp=gpar(fontsize=10, fontface="bold")), 
              name = "Z-score",
              
              #Row annotation configurations
              cluster_rows=TRUE,
              show_row_dend=TRUE,
              row_title_side="left",
              row_title_gp=gpar(fontsize=8),
              show_row_names=FALSE,
              row_names_side="right",
              
              #Column annotation configuratiions
              cluster_columns=TRUE,
              show_column_dend=TRUE,
              column_title="Differentially Expressed Genes\nClone1 vs Clone9",
              column_title_side="top",
              column_title_gp=gpar(fontsize=18, fontface="bold"),
              show_column_names = FALSE,
              column_names_gp = gpar(fontsize = 20, fontface="bold"),
              
              #Dendrogram configurations: columns
              clustering_distance_columns="euclidean",
              clustering_method_columns="complete",
              column_dend_height=unit(10,"mm"),

              #Dendrogram configurations: rows
              clustering_distance_rows="euclidean",
              clustering_method_rows="complete",
              row_dend_width=unit(4,"cm"),
              row_dend_side = "left",
              row_dend_reorder = TRUE,

              #Splits
              border=T,
              row_km = 1,
              column_km = 2,
              
              #plot params
              width = unit(5, "inch"), 
              height = unit(4, "inch"),
              #height = unit(0.4, "cm")*nrow(mat),
              
              #Annotations
              top_annotation = ha)


draw(hm19, annotation_legend_side = "right", heatmap_legend_side="right")

GSEA

Gene Set Enrichment Analysis (GSEA) was performed using GSEA software, a joint project of UC San Diego and Broad Institute. The dataset was normalised using DESeq2 median of ratios method and parsed to .gct format using custom bash scripts as input to GSEA. Parameters that deviate from the default setings are described below:

Permutation Type: Gene_Set was chosen for this parameter as the number of samples per phenotype was below 7, thus phenotype shuffling is not feasible for this option.
Metric For Ranking Genes: T-test uses the difference of means scaled by the standard deviation and number of samples, as opposed to the default Signal-to-noise metric which uses the difference of means scaled by the sandard deviation, without taking into account the number of samples.

Hallmarks

Hallmark gene sets summarize and represent specific well-defined biological states or processes and display coherent expression. These gene sets were generated by a computational methodology based on identifying overlaps between gene sets in other MSigDB collections and retaining genes that display coordinated expression.

Clone1 vs Control

Enriched Pathways Clone1

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/Hallmarks_R/H_Clone1_vs_Control_Clone1.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

H_clone1_vs_control_clone1 <- tmp[,c(1,6,7)]

DT::datatable(H_clone1_vs_control_clone1, rownames=FALSE)

Enriched Pathways Control

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/Hallmarks_R/H_Clone1_vs_Control_Control.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

H_clone1_vs_control_control <- tmp[,c(1,6,7)]

DT::datatable(H_clone1_vs_control_control, rownames=FALSE)

Enriched Pathway Plots

hallmarks1 <- rbind(H_clone1_vs_control_clone1, H_clone1_vs_control_control)
hallmarks1$p.value <- hallmarks1$NOM.p.val
hallmarks1 <- hallmarks1[,c(1,2,4)]

pathway_vec <- c()
for(i in hallmarks1$NAME){
  list <- strsplit(i, "_")
  new <- paste(list[[1]][-1], collapse="_")
  pathway_vec <- c(pathway_vec, new)
}

hallmarks1$NAME <- pathway_vec

hallmarks1$Enrichment <- ifelse(hallmarks1$NES < 0, "Control", "Clone1")
ggplot(hallmarks1, aes(reorder(NAME, NES), NES)) +
  geom_col(stat="identity",position="identity",aes(fill = Enrichment)) +
  scale_fill_manual(values=c(Clone1="skyblue",Control="indianred1")) + 
  coord_flip() +
  labs(x="Pathway", y="Normalized Enrichment Score",
       title="Hallmark pathways NES from GSEA",
       subtitle = "Clone1 vs Control") + 
  theme(plot.title = element_text(hjust = 0.5),  
        plot.subtitle = element_text(hjust = 0.5)) +
  theme_minimal()

## Warning: Ignoring unknown parameters: stat

Clone9 vs Control

Enriched Pathways Clone9

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/Hallmarks_R/H_Clone9_vs_Control_Clone9.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

H_clone9_vs_control_clone9 <- tmp[,c(1,6,7)]

DT::datatable(H_clone9_vs_control_clone9, rownames=FALSE)

Enriched Pathways Control

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/Hallmarks_R/H_Clone9_vs_Control_Control.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

H_clone9_vs_control_control <- tmp[,c(1,6,7)]

DT::datatable(H_clone9_vs_control_control, rownames=FALSE)

Enriched Pathway Plots

hallmarks9 <- rbind(H_clone9_vs_control_clone9, H_clone9_vs_control_control)
hallmarks9$p.value <- hallmarks9$NOM.p.val
hallmarks9 <- hallmarks9[,c(1,2,4)]

pathway_vec <- c()
for(i in hallmarks9$NAME){
  list <- strsplit(i, "_")
  new <- paste(list[[1]][-1], collapse="_")
  pathway_vec <- c(pathway_vec, new)
}

hallmarks9$NAME <- pathway_vec

hallmarks9$Enrichment <- ifelse(hallmarks9$NES < 0, "Control", "Clone9")
ggplot(hallmarks9, aes(reorder(NAME, NES), NES)) +
  geom_col(stat="identity",position="identity",aes(fill = Enrichment)) +
  scale_fill_manual(values=c(Clone9="skyblue",Control="indianred1")) + 
  coord_flip() +
  labs(x="Pathway", y="Normalized Enrichment Score",
       title="Hallmark pathways NES from GSEA",
       subtitle = "Clone9 vs Control") + 
  theme(plot.title = element_text(hjust = 0.5),  
        plot.subtitle = element_text(hjust = 0.5)) +
  theme_minimal()

## Warning: Ignoring unknown parameters: stat

Clone1 vs Clone9

Enriched Pathways Clone1

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/Hallmarks_R/H_Clone1_vs_Clone9_Clone1.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

H_clone1_vs_clone9_clone1 <- tmp[,c(1,6,7)]

DT::datatable(H_clone1_vs_clone9_clone1, rownames=FALSE)

Enriched Pathways Clone9

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/Hallmarks_R/H_Clone1_vs_Clone9_Clone9.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

H_clone1_vs_clone9_clone9 <- tmp[,c(1,6,7)]

DT::datatable(H_clone1_vs_clone9_clone9, rownames=FALSE)

Enriched Pathway Plot

hallmarks19 <- rbind(H_clone1_vs_clone9_clone1, H_clone1_vs_clone9_clone9)
hallmarks19$p.value <- hallmarks19$NOM.p.val
hallmarks19 <- hallmarks19[,c(1,2,4)]

pathway_vec <- c()
for(i in hallmarks19$NAME){
  list <- strsplit(i, "_")
  new <- paste(list[[1]][-1], collapse="_")
  pathway_vec <- c(pathway_vec, new)
}

hallmarks19$NAME <- pathway_vec

hallmarks19$Enrichment <- ifelse(hallmarks19$NES < 0, "Clone9", "Clone1")
ggplot(hallmarks19, aes(reorder(NAME, NES), NES)) +
  geom_col(stat="identity",position="identity",aes(fill = Enrichment)) +
  scale_fill_manual(values=c(Clone1="skyblue",Clone9="indianred1")) + 
  coord_flip() +
  labs(x="Pathway", y="Normalized Enrichment Score",
       title="Hallmark pathways NES from GSEA",
       subtitle = "Clone1 vs Clone9") + 
  theme(plot.title = element_text(hjust = 0.5),  
        plot.subtitle = element_text(hjust = 0.5)) +
  theme_minimal()

## Warning: Ignoring unknown parameters: stat

KEGG

The Kyoto Encyclopedia of Genes and Genomes (KEGG) database is a resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.

Clone1 vs Control

Enriched Pathways Clone1

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/KEGG_R/K_Clone1_vs_Control_Clone1.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

K_clone1_vs_control_clone1 <- tmp[,c(1,6,7)]

DT::datatable(K_clone1_vs_control_clone1, rownames=FALSE)

Enriched Pathways Control

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/KEGG_R/K_Clone1_vs_Control_Control.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

K_clone1_vs_control_control <- tmp[,c(1,6,7)]

DT::datatable(K_clone1_vs_control_control, rownames=FALSE)

Enriched Pathway Plots

kegg1 <- rbind(K_clone1_vs_control_clone1, K_clone1_vs_control_control)
kegg1$p.value <- kegg1$NOM.p.val
kegg1 <- kegg1[,c(1,2,4)]

pathway_vec <- c()
for(i in kegg1$NAME){
  list <- strsplit(i, "_")
  new <- paste(list[[1]][-1], collapse="_")
  pathway_vec <- c(pathway_vec, new)
}

kegg1$NAME <- pathway_vec

kegg1$Enrichment <- ifelse(kegg1$NES < 0, "Control", "Clone1")
ggplot(kegg1, aes(reorder(NAME, NES), NES)) +
  geom_col(stat="identity",position="identity",aes(fill = Enrichment)) +
  scale_fill_manual(values=c(Clone1="skyblue",Control="indianred1")) + 
  coord_flip() +
  labs(x="Pathway", y="Normalized Enrichment Score",
       title="KEGG pathways NES from GSEA",
       subtitle = "Clone1 vs Control") + 
  theme(plot.title = element_text(hjust = 0.5),  
        plot.subtitle = element_text(hjust = 0.5)) +
  theme_minimal()

## Warning: Ignoring unknown parameters: stat

Clone9 vs Control

Enriched Pathways Clone9

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/KEGG_R/K_Clone9_vs_Control_Clone9.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

K_clone9_vs_control_clone9 <- tmp[,c(1,6,7)]

DT::datatable(K_clone9_vs_control_clone9, rownames=FALSE)

Enriched Pathways Control

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/KEGG_R/K_Clone9_vs_Control_Control.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

K_clone9_vs_control_control <- tmp[,c(1,6,7)]

DT::datatable(K_clone9_vs_control_control, rownames=FALSE)

Enriched Pathway Plots

kegg9 <- rbind(K_clone9_vs_control_clone9, K_clone9_vs_control_control)
kegg9$p.value <- kegg9$NOM.p.val
kegg9 <- kegg9[,c(1,2,4)]

pathway_vec <- c()
for(i in kegg9$NAME){
  list <- strsplit(i, "_")
  new <- paste(list[[1]][-1], collapse="_")
  pathway_vec <- c(pathway_vec, new)
}

kegg9$NAME <- pathway_vec

kegg9$Enrichment <- ifelse(kegg9$NES < 0, "Control", "Clone9")
ggplot(kegg9, aes(reorder(NAME, NES), NES)) +
  geom_col(stat="identity",position="identity",aes(fill = Enrichment)) +
  scale_fill_manual(values=c(Clone9="skyblue",Control="indianred1")) + 
  coord_flip() +
  labs(x="Pathway", y="Normalized Enrichment Score",
       title="KEGG pathways NES from GSEA",
       subtitle = "Clone9 vs Control") + 
  theme(plot.title = element_text(hjust = 0.5),  
        plot.subtitle = element_text(hjust = 0.5)) +
  theme_minimal()

## Warning: Ignoring unknown parameters: stat

Clone1 vs Clone9

Enriched Pathways Clone1

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/KEGG_R/K_Clone1_vs_Clone9_Clone1.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

K_clone1_vs_clone9_clone1 <- tmp[,c(1,6,7)]

DT::datatable(K_clone1_vs_clone9_clone1, rownames=FALSE)

Enriched Pathways Clone9

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/KEGG_R/K_Clone1_vs_Clone9_Clone9.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

K_clone1_vs_clone9_clone9 <- tmp[,c(1,6,7)]

DT::datatable(K_clone1_vs_clone9_clone9, rownames=FALSE)

Enriched Pathway Plots

kegg19 <- rbind(K_clone1_vs_clone9_clone1, K_clone1_vs_clone9_clone9)
kegg19$p.value <- kegg19$NOM.p.val
kegg19 <- kegg19[,c(1,2,4)]

pathway_vec <- c()
for(i in kegg19$NAME){
  list <- strsplit(i, "_")
  new <- paste(list[[1]][-1], collapse="_")
  pathway_vec <- c(pathway_vec, new)
}

kegg19$NAME <- pathway_vec

kegg19$Enrichment <- ifelse(kegg19$NES < 0, "Clone9", "Clone1")
g <- ggplot(kegg19, aes(reorder(NAME, NES), NES)) +
  geom_col(stat="identity",position="identity",aes(fill = Enrichment)) +
  scale_fill_manual(values=c(Clone1="skyblue",Clone9="indianred1")) + 
  coord_flip() +
  labs(x="Pathway", y="Normalized Enrichment Score",
       title="KEGG pathways NES from GSEA",
       subtitle = "Clone1 vs Clone9") + 
  theme(plot.title = element_text(hjust = 0.5),  
        plot.subtitle = element_text(hjust = 0.5)) +
  theme_minimal()

## Warning: Ignoring unknown parameters: stat

plot(g)

GO Biological Process

The larger processes, or ‘biological programs’ accomplished by multiple molecular activities. Examples of broad biological process terms are DNA repair or signal transduction. Note that a biological process is not equivalent to a pathway. At present, the GO does not try to represent the dynamics or dependencies that would be required to fully describe a pathway.

Marvin, due to the large pathway names, enriched pathway plots have been omitted from this section. If you wish to create enriched pathway plots similar to the previous sections contact me with the pathways you would like displayed and I will create them for you.

Clone1 vs Control

Enriched Pathways Clone1

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/BP_R/BP_Clone1_vs_Control_Clone1.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

BP_clone1_vs_control_clone1 <- tmp[,c(1,6,7)]

DT::datatable(BP_clone1_vs_control_clone1, rownames=FALSE)

Enriched Pathways Control

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/BP_R/BP_Clone1_vs_Control_Control.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

BP_clone1_vs_control_control <- tmp[,c(1,6,7)]

DT::datatable(BP_clone1_vs_control_control, rownames=FALSE)

Clone9 vs Control

Enriched Pathways Clone9

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/BP_R/BP_Clone9_vs_Control_Clone9.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

BP_clone9_vs_control_clone9 <- tmp[,c(1,6,7)]

DT::datatable(BP_clone9_vs_control_clone9, rownames=FALSE)

Enriched Pathways Control

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/BP_R/BP_Clone9_vs_Control_Control.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

BP_clone9_vs_control_control <- tmp[,c(1,6,7)]

DT::datatable(BP_clone9_vs_control_control, rownames=FALSE)

Clone1 vs Clone9

Enriched Pathways Clone1

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/BP_R/BP_Clone1_vs_Clone9_Clone1.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

BP_clone1_vs_clone9_clone1 <- tmp[,c(1,6,7)]

DT::datatable(BP_clone1_vs_clone9_clone1, rownames=FALSE)

Enriched Pathways Clone9

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/BP_R/BP_Clone1_vs_Clone9_Clone9.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

BP_clone1_vs_clone9_clone9 <- tmp[,c(1,6,7)]

DT::datatable(BP_clone1_vs_clone9_clone9, rownames=FALSE)

GO Molecular Function

Molecular-level activities performed by gene products. Molecular function terms describe activities that occur at the molecular level, such as “catalysis” or “transport”. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where, when, or in what context the action takes place. Molecular functions generally correspond to activities that can be performed by individual gene products (i.e. a protein or RNA), but some activities are performed by molecular complexes composed of multiple gene products.

Clone1 vs Control

Enriched Pathways Clone1

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/MF_R/MF_Clone1_vs_Control_Clone1.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

MF_clone1_vs_control_clone1 <- tmp[,c(1,6,7)]

DT::datatable(MF_clone1_vs_control_clone1, rownames=FALSE)

Enriched Pathways Control

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/MF_R/MF_Clone1_vs_Control_Control.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

MF_clone1_vs_control_control <- tmp[,c(1,6,7)]

DT::datatable(MF_clone1_vs_control_control, rownames=FALSE)

Clone9 vs Control

Enriched Pathways Clone9

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/MF_R/MF_Clone9_vs_Control_Clone9.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

MF_clone9_vs_control_clone1 <- tmp[,c(1,6,7)]

DT::datatable(MF_clone9_vs_control_clone1, rownames=FALSE)

Enriched Pathways Control

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/MF_R/MF_Clone9_vs_Control_Control.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

MF_clone9_vs_control_control <- tmp[,c(1,6,7)]

DT::datatable(MF_clone9_vs_control_control, rownames=FALSE)

Clone1 vs Clone9

Enriched Pathways Clone1

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/MF_R/MF_Clone1_vs_Clone9_Clone1.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

MF_clone1_vs_clone9_clone1 <- tmp[,c(1,6,7)]

DT::datatable(MF_clone1_vs_clone9_clone1, rownames=FALSE)

Enriched Pathways Clone9

tmp <- read.csv("/home/barry/Desktop/GSE118959/results/GSEA/MF_R/MF_Clone1_vs_Clone9_Clone9.csv", sep="\t", header=T)

tmp <- subset(tmp, tmp$NOM.p.val < 0.05)

MF_clone1_vs_clone9_clone9 <- tmp[,c(1,6,7)]

DT::datatable(MF_clone1_vs_clone9_clone9, rownames=FALSE)

Epithelial to Mesenchymal Pathway

Per request, the epithelial to mesenchymal pathway has been investigated. All 1184 EMT genes provided by dbEMT (http://www.dbemt.bioinfo-minzhao.org/download.cgi) were downloaded. Next, differentially expressed genes in each cell line comparison that are present in the dbEMT list are selected. This returns a statistically significant set of genes involved in the EMT pathway (adjusted p-value < 0.05). Heatmaps and gene tables are provided for each cell line comparison.

Clone1 vs Control

Heatmap

emt <- read.table("~/Desktop/GSE118959/data/dbEMT/EMT_symbols.txt", header=T)
## subset log2 for viz
log_counts <- as.data.frame(log_counts)
## what DE genes are involved (capture both up and down)
emt_merged1 <- rbind(up_regulated_table1, down_regulated_table1)
de_key <- subset(emt, emt$Gene %in% emt_merged1$Gene)
# subset log counts
emt_df1 <- subset(log_counts, rownames(log_counts) %in% de_key$Gene)
# remove clone9
emt_df1 <- emt_df1[,-c(4,5,6)]
## flip it so it reads left to right
emt_df1 <- emt_df1[,c(4,5,6,1,2,3)]
## remove no SD between cases sanity check
emt_df1 <- emt_df1[apply(emt_df1, MARGIN = 1, FUN = function(x) sd(x) != 0),]

emt_mat1 <- as.matrix(emt_df1)
emt_mat1 <- t(emt_mat1)
emt_mat1 <- scale(emt_mat1, center=T)
emt_mat1 <- t(emt_mat1)
# color scheme
#f1 = colorRamp2(c(-2,-1,0,1,2),c("blue","skyblue","white","lightcoral","red"))
# Top annotation
df2 = data.frame(Cell_Line = c(rep("Control", 3), rep("Clone1", 3)))
ha = HeatmapAnnotation(df = df2, col = list(Cell_Line = c("Control" =  "black", "Clone1" = "gold")), annotation_legend_param = list(title_gp = gpar(fontsize = 12, fontface = "bold"), labels_gp = gpar(fontsize = 12)))             

emt_hm1 <- Heatmap(emt_mat1, 
              col= colorRamp2(c(-2,-1,0,1,2),c("blue","skyblue","white","lightcoral","red")),
              heatmap_legend_param=list(at=c(-2,-1,0,1,2),color_bar="continuous", 
                                        legend_direction="vertical", legend_width=unit(15,"cm"),
                                        title_position="topcenter", title_gp=gpar(fontsize=10, fontface="bold")), 
              name = "Z-score",
              
              #Row annotation configurations
              cluster_rows=TRUE,
              show_row_dend=TRUE,
              row_title_side="left",
              row_title_gp=gpar(fontsize=8),
              show_row_names=FALSE,
              row_names_side="right",
              
              #Column annotation configuratiions
              cluster_columns=TRUE,
              show_column_dend=TRUE,
              column_title="Epithelial to Mesenchymal Transition\nClone1 vs Control",
              column_title_side="top",
              column_title_gp=gpar(fontsize=18, fontface="bold"),
              show_column_names = FALSE,
              column_names_gp = gpar(fontsize = 20, fontface="bold"),
              
              #Dendrogram configurations: columns
              clustering_distance_columns="euclidean",
              clustering_method_columns="complete",
              column_dend_height=unit(10,"mm"),

              #Dendrogram configurations: rows
              clustering_distance_rows="euclidean",
              clustering_method_rows="complete",
              row_dend_width=unit(4,"cm"),
              row_dend_side = "left",
              row_dend_reorder = TRUE,

              #Splits
              border=T,
              row_km = 1,
              column_km = 2,
              
              #plot params
              width = unit(5, "inch"), 
              height = unit(4, "inch"),
              #height = unit(0.4, "cm")*nrow(mat),
              
              #Annotations
              top_annotation = ha)


draw(emt_hm1, annotation_legend_side = "right", heatmap_legend_side="right")

Up Regulated

emt_up1 <- subset(up_regulated_table1, up_regulated_table1$Gene %in% rownames(emt_df1))

DT::datatable(emt_up1, rownames = F)

Down Regulated

emt_down1 <- subset(down_regulated_table1, down_regulated_table1$Gene %in% rownames(emt_df1))

DT::datatable(emt_down1, rownames = F)

Clone9 vs Control

Heatmap

emt <- read.table("~/Desktop/GSE118959/data/dbEMT/EMT_symbols.txt", header=T)
## subset log2 for viz
log_counts <- as.data.frame(log_counts)
## what DE genes are involved
emt_merged9 <- rbind(up_regulated_table9, down_regulated_table9)
de_key <- subset(emt, emt$Gene %in% emt_merged9$Gene)
# subset log counts
emt_df9 <- subset(log_counts, rownames(log_counts) %in% de_key$Gene)
# remove clone1
emt_df9 <- emt_df9[,-c(1,2,3)]
## flip it so it reads left to right
emt_df9 <- emt_df9[,c(4,5,6,1,2,3)]
## remove no SD between cases
emt_df9 <- emt_df9[apply(emt_df9, MARGIN = 1, FUN = function(x) sd(x) != 0),]

## calculate zscore for viz
emt_mat9 <- as.matrix(emt_df9)
emt_mat9 <- t(emt_mat9)
emt_mat9 <- scale(emt_mat9, center=T)
emt_mat9 <- t(emt_mat9)
# color scheme
#f1 = colorRamp2(c(-2,-1,0,1,2),c("blue","skyblue","white","lightcoral","red"))
# Top annotation
df2 = data.frame(Cell_Line = c(rep("Control", 3), rep("Clone9", 3)))
ha = HeatmapAnnotation(df = df2, col = list(Cell_Line = c("Control" =  "black", "Clone9" = "gold")), annotation_legend_param = list(title_gp = gpar(fontsize = 12, fontface = "bold"), labels_gp = gpar(fontsize = 12)))             

emt_hm9 <- Heatmap(emt_mat9, 
              col= colorRamp2(c(-2,-1,0,1,2),c("blue","skyblue","white","lightcoral","red")),
              heatmap_legend_param=list(at=c(-2,-1,0,1,2),color_bar="continuous", 
                                        legend_direction="vertical", legend_width=unit(15,"cm"),
                                        title_position="topcenter", title_gp=gpar(fontsize=10, fontface="bold")), 
              name = "Z-score",
              
              #Row annotation configurations
              cluster_rows=TRUE,
              show_row_dend=TRUE,
              row_title_side="left",
              row_title_gp=gpar(fontsize=8),
              show_row_names=FALSE,
              row_names_side="right",
              
              #Column annotation configuratiions
              cluster_columns=TRUE,
              show_column_dend=TRUE,
              column_title="Epithelial to Mesenchymal Transition\nClone9 vs Control",
              column_title_side="top",
              column_title_gp=gpar(fontsize=18, fontface="bold"),
              show_column_names = FALSE,
              column_names_gp = gpar(fontsize = 20, fontface="bold"),
              
              #Dendrogram configurations: columns
              clustering_distance_columns="euclidean",
              clustering_method_columns="complete",
              column_dend_height=unit(10,"mm"),

              #Dendrogram configurations: rows
              clustering_distance_rows="euclidean",
              clustering_method_rows="complete",
              row_dend_width=unit(2,"cm"),
              row_dend_side = "left",
              row_dend_reorder = TRUE,

              #Splits
              border=T,
              row_km = 1,
              column_km = 2,
              
              #plot params
              width = unit(5, "inch"), 
              height = unit(4, "inch"),
              #height = unit(0.4, "cm")*nrow(emt_df9),
              
              #Annotations
              top_annotation = ha)


draw(emt_hm9, annotation_legend_side = "right", heatmap_legend_side="right")

Up Regulated

emt_up9 <- subset(up_regulated_table9, up_regulated_table9$Gene %in% rownames(emt_df9))

DT::datatable(emt_up9, rownames = F)

Down Regulated

emt_down9 <- subset(down_regulated_table9, down_regulated_table9$Gene %in% rownames(emt_df9))

DT::datatable(emt_down9, rownames=F)

Clone1 vs Clone9

Heatmap

emt <- read.table("~/Desktop/GSE118959/data/dbEMT/EMT_symbols.txt", header=T)
## subset log2 for viz
log_counts <- as.data.frame(log_counts)
## what DE genes are involved
emt_merged19 <- rbind(up_regulated_table19, down_regulated_table19)
de_key <- subset(emt, emt$Gene %in% emt_merged19$Gene)
# subset log counts
emt_df19 <- subset(log_counts, rownames(log_counts) %in% de_key$Gene)
# remove control
emt_df19 <- emt_df19[,-c(7,8,9)]
## flip it so it reads left to right
emt_df19 <- emt_df19[,c(4,5,6,1,2,3)]
## remove no SD between cases
emt_df19 <- emt_df19[apply(emt_df19, MARGIN = 1, FUN = function(x) sd(x) != 0),]

## calculate zscore for viz
emt_mat19 <- as.matrix(emt_df19)
emt_mat19 <- t(emt_mat19)
emt_mat19 <- scale(emt_mat19, center=T)
emt_mat19 <- t(emt_mat19)
# color scheme
#f1 = colorRamp2(c(-2,-1,0,1,2),c("blue","skyblue","white","lightcoral","red"))
# Top annotation
df2 = data.frame(Cell_Line = c(rep("Clone9", 3), rep("Clone1", 3)))
ha = HeatmapAnnotation(df = df2, col = list(Cell_Line = c("Clone9" =  "black", "Clone1" = "gold")), annotation_legend_param = list(title_gp = gpar(fontsize = 12, fontface = "bold"), labels_gp = gpar(fontsize = 12)))             

emt_hm19 <- Heatmap(emt_mat19, 
              col= colorRamp2(c(-2,-1,0,1,2),c("blue","skyblue","white","lightcoral","red")),
              heatmap_legend_param=list(at=c(-2,-1,0,1,2),color_bar="continuous", 
                                        legend_direction="vertical", legend_width=unit(15,"cm"),
                                        title_position="topcenter", title_gp=gpar(fontsize=10, fontface="bold")), 
              name = "Z-score",
              
              #Row annotation configurations
              cluster_rows=TRUE,
              show_row_dend=TRUE,
              row_title_side="left",
              row_title_gp=gpar(fontsize=8),
              show_row_names=FALSE,
              row_names_side="right",
              
              #Column annotation configuratiions
              cluster_columns=TRUE,
              show_column_dend=TRUE,
              column_title="Epithelial to Mesenchymal Transition\nClone1 vs Clone9",
              column_title_side="top",
              column_title_gp=gpar(fontsize=18, fontface="bold"),
              show_column_names = FALSE,
              column_names_gp = gpar(fontsize = 20, fontface="bold"),
              
              #Dendrogram configurations: columns
              clustering_distance_columns="euclidean",
              clustering_method_columns="complete",
              column_dend_height=unit(10,"mm"),

              #Dendrogram configurations: rows
              clustering_distance_rows="euclidean",
              clustering_method_rows="complete",
              row_dend_width=unit(4,"cm"),
              row_dend_side = "left",
              row_dend_reorder = TRUE,

              #Splits
              border=T,
              row_km = 1,
              column_km = 2,
              
              #plot params
              width = unit(5, "inch"), 
              height = unit(4, "inch"),
              #height = unit(0.4, "cm")*nrow(mat),
              
              #Annotations
              top_annotation = ha)


draw(emt_hm19, annotation_legend_side = "right", heatmap_legend_side="right")

Up Regulated

emt_up19 <- subset(up_regulated_table19, up_regulated_table19$Gene %in% rownames(emt_df19))

DT::datatable(emt_up19, rownames = F)

Down Regulated

emt_down19 <- subset(down_regulated_table19, down_regulated_table19$Gene %in% rownames(emt_df19))

DT::datatable(emt_down19, rownames = F)

The most recent update of this html document occurred: Thu Mar 25 11:48:00 2021.

For any queries about the analysis, contact Barry Digby at (b.digby237@gmail.com)