The heatmap() function is a great way to explore data in a more visual and engaging way.
In this activity we will use gene expression data from breast cancer patients in The Cancer Genome Atlas (TCGA) and explore whether clinical features of these patients (mainly age_at_diagnosis) correlate with gene expression patterns.
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.
When you knit an Rmd (R Markdown) file, it turns into an HTML page, a PDF, or even a Word document. Overall, this can turn an Rmd with a bunch of code into a neat-looking page.
We created an object that holds the name of the directory where the TCGA data resides. This is good R coding practice because we can apply all we do below to a data set in a different directory by changing a single variable (data_dir).
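The chunk that defines data_dir is not shown in this document, so the following is only a minimal sketch of what such a definition could look like. The path is a placeholder, not the real location of the TCGA files, and expr_file is a name introduced here for illustration.

```r
# Hypothetical placeholder path: replace with the directory that holds your TCGA files
data_dir <- file.path("path", "to", "tcga_data")

# Every later file reference is built from data_dir, so pointing the
# analysis at another data set only requires changing the line above
expr_file <- file.path(data_dir, "brca_expr_sub.RData")
print(expr_file)
```

Building paths with file.path() rather than pasting strings keeps the separators consistent across operating systems.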
In gene expression analysis, the convention is to use a red, white, and blue color scale where

+ red corresponds to genes that are relatively overexpressed,
+ blue corresponds to genes that are relatively underexpressed, and
+ white corresponds to genes with average expression levels.
To use a color scheme to reproduce this convention we load a library that contains the palette we want.
library("RColorBrewer")
We load two extra libraries so we can draw heatmaps in different styles.
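The two library() calls themselves are not shown here. Assuming the packages are plotly and heatmaply (the two named later in this document), loading them could look roughly like this sketch, which first checks that each package is installed. The names extra_pkgs and installed are illustrative.

```r
# Assumption: the two extra packages are plotly and heatmaply
extra_pkgs <- c("plotly", "heatmaply")

# Check which packages are installed before trying to attach them
installed <- vapply(extra_pkgs, requireNamespace, logical(1), quietly = TRUE)

# Attach only the packages that are available
for (pkg in extra_pkgs[installed]) {
  library(pkg, character.only = TRUE)
}

# Report anything that still needs install.packages()
if (!all(installed)) {
  message("Not installed: ", paste(extra_pkgs[!installed], collapse = ", "))
}
```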
We are examining a subset of the TCGA breast cancer expression data. Patient samples are the columns of our gene expression matrix, and the mRNA levels of genes are the rows.
Previously, the gene expression values were log-transformed. Then we chose:

+ the 500 most variable genes
+ every fourth patient sample

We saved this data as a matrix object called brca_expr_sub.
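The preprocessing code is not part of this document, so here is a minimal sketch of the same kind of selection on a small toy matrix. Everything in it is an assumption made for illustration: the toy data, the log2 transform with a +1 pseudocount, variance as the measure of variability, and starting the column selection at the first sample.

```r
set.seed(42)  # make the toy data reproducible

# Toy stand-in for a full expression matrix: 40 genes (rows) x 12 samples (columns)
toy_expr <- matrix(rexp(40 * 12, rate = 0.01), nrow = 40,
                   dimnames = list(paste0("gene", 1:40), paste0("sample", 1:12)))

toy_log <- log2(toy_expr + 1)  # log-transform; the +1 pseudocount avoids log2(0)

# Rank genes by their variance across samples and keep the 5 most variable
gene_var  <- apply(toy_log, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[1:5]

# Keep every fourth sample (columns 1, 5, 9)
keep_cols <- seq(1, ncol(toy_log), by = 4)

toy_sub <- toy_log[top_genes, keep_cols]
print(dim(toy_sub))

# Save the subset so that a later load() restores an object named toy_sub
toy_file <- file.path(tempdir(), "toy_sub.RData")
save(toy_sub, file = toy_file)
```

Because save() stores the object under its own name, load() on that file brings back a variable called toy_sub, in the same way that loading brca_expr_sub.RData restores brca_expr_sub.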
load(file.path(data_dir, "brca_expr_sub.RData"),verbose=TRUE)
## Loading objects:
## brca_expr_sub
Let’s have a look at this data:
dim_mat <- dim(brca_expr_sub)
print(paste("The log-transformed gene expression matrix has",dim_mat[1],
"rows and",dim_mat[2],"columns."))
## [1] "The log-transformed gene expression matrix has 500 rows and 271 columns."
brca_expr_sub[1:10,1:10]
##          TCGA-3C-AAAU TCGA-4H-AAAK TCGA-A1-A0SD TCGA-A1-A0SH TCGA-A1-A0SM
## CLEC3A     -0.4444731   0.52735276   0.16677510   -0.9178450   0.01391312
## CPB1        1.8630076  -0.74264370   2.34079027   -0.5521063  -0.50657065
## SCGB2A2     1.2860995   1.08997172   0.11776738    1.5142496   1.70208586
## SCGB1D2     1.0984689   0.45135225   0.06435826    1.8327443   1.58792690
## TFF1       -0.4898101   0.86572807   0.91149541    0.3064546   1.15536664
## GSTM1       1.2050592   1.32794322  -0.93515772    0.5912250  -0.98400281
## PIP         1.0955774  -0.18334300   0.81964515   -0.1133463   1.16477436
## S100A7     -0.8548111  -0.85481114   0.05561040   -0.7490921   1.23274950
## MUCL1       0.2655204   0.01909743   1.55128156    0.5953695   0.14529979
## ANKRD30A    0.9432439   0.61670044   0.33181421    0.2513654   0.49356835
##          TCGA-A1-A0SQ TCGA-A2-A04R TCGA-A2-A04W TCGA-A2-A0CL TCGA-A2-A0CQ
## CLEC3A     -0.9178450   -0.9178450   1.49248571   1.41220392   -0.9178450
## CPB1        2.2647191    2.5915996  -0.68515082  -1.05813006   -0.2917495
## SCGB2A2     1.7966174   -0.7901945   1.60427635  -1.03776225   -0.5091813
## SCGB1D2     1.6129304   -0.6078331   0.90008564  -1.34170609   -0.2266103
## TFF1        1.2535206    0.9320101   0.05716392   0.21282911    1.0908511
## GSTM1       1.1410190    0.1197914  -0.91600571   0.92233293    1.6026970
## PIP         1.1542239   -0.6350037   0.33822210  -0.03328558    2.0589881
## S100A7     -0.8548111   -0.8548111   0.43581699  -0.74673270   -0.3336430
## MUCL1       0.4100594   -0.2858546   0.74592674  -0.67289520   -0.4109986
## ANKRD30A    0.6982975   -1.0272372   0.10020537  -1.08966091    1.0892899
A heatmap is a graphical representation of data that uses color-coding to represent different values. Heatmaps make it easy to visualize complex data and spot patterns at a glance.
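# RdBu runs from red (first color, lowest values) to blue (last color, highest
# values), so we reverse it to follow the convention above:
# red = overexpressed, blue = underexpressed
heatmap(brca_expr_sub,
        scale = "none", labRow = "", labCol = "",
        col = rev(brewer.pal(10, "RdBu")),
        zlim = c(-2, 2), margins = c(1, 0))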
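To make the roles of col, scale, and zlim more concrete, here is a small self-contained sketch on toy data that uses only base R. The toy matrix, the names bwr, lim, and toy_clamped, and the clamping step are all assumptions added for illustration. The clamping is there because, as far as I understand image(), which heatmap() draws with, cells whose values fall outside zlim are left unpainted.

```r
# Base-R diverging palette: blue (low) -> white (middle) -> red (high)
bwr <- colorRampPalette(c("blue", "white", "red"))(11)

set.seed(1)
# Toy matrix standing in for expression data: 20 "genes" x 8 "samples"
toy <- matrix(rnorm(20 * 8), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("s", 1:8)))

# Clamp values to the plotted range so extreme cells still get the end colors
lim <- 2
toy_clamped <- toy
toy_clamped[toy_clamped >  lim] <-  lim
toy_clamped[toy_clamped < -lim] <- -lim

# scale = "none" plots the values as given; zlim fixes the color range, so
# the first palette entry maps to -lim and the last entry maps to +lim
heatmap(toy_clamped, scale = "none", col = bwr, zlim = c(-lim, lim))
```

Using an odd number of colors puts white exactly in the middle of the palette, which matches the "white means average" part of the convention.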
We create a new function called “my_colors” so that we can give our heatmap different colors.
my_colors <- colorRampPalette(c("cyan","deeppink3"))
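Since colorRampPalette() returns a function rather than a vector of colors, it can help to call it once and look at the result. The sketch below repeats the definition so that it runs on its own; pal5 is a name introduced for illustration.

```r
my_colors <- colorRampPalette(c("cyan", "deeppink3"))

# my_colors is a function: calling it with n returns n interpolated hex colors
pal5 <- my_colors(5)
print(pal5)

# Preview the ramp as a strip of colored bars
barplot(rep(1, 5), col = pal5, border = NA, axes = FALSE)
```

Asking for more colors, for example my_colors(100), gives a smoother gradient between the same two end colors.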
Here are more examples of heatmaps with different colors and different styles.
First up is a plain heatmap() call with a basic color change.
Compared to the original (the heatmap before it), this one is a little unclear because of the color choice. This call also uses heatmap()'s default row scaling (scale = "row"), whereas the first plot set scale = "none", so the two plots differ in more than color.
heatmap(brca_expr_sub, col= my_colors(100))
Finally, we have Heatmaply, which is very similar to Plotly. I personally would use Heatmaply instead of Plotly because I’m much more familiar with it, and the colors it provides for heatmaps are very good. Like Plotly, Heatmaply shows information about a cell when you hover over the heatmap, but Heatmaply gives the actual gene name and the patient barcode.
# inferno() comes from the viridis package, which is attached along with heatmaply
heatmaply(brca_expr_sub,
          colors = inferno(n = 20, alpha = 1, begin = 0, end = 1, direction = 1),
          fontsize_row = 1, fontsize_col = 1)