Heatmaps

Heatmaps are a way to colorize, visualize, and organize a data set with the goal of finding relationships among observations and features.

I will use heatmaps to find patterns in the gene expression data for roughly 1,000 breast cancer patients from The Cancer Genome Atlas. Here, I learn how to create heatmaps with a practice data set.

mtcars data

mtcars is the data set I will practice on today. It was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 design and performance features for 32 automobiles (1973–74 models).

# Functions in R take arguments within the parentheses

# The function head() returns the first few lines of the mtcars table
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1


In RStudio, the View() function opens the entire data set as a table in a new tab.

# View the full data set
View(mtcars)
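View() needs an interactive session such as RStudio. In a script or a plain R console, a few base functions give a similar overview. This sketch checks the size described above: 32 cars and 11 columns (mpg plus the 10 design and performance features).

```r
# Number of rows (cars) and columns (variables)
dim(mtcars)

# Column names: mpg plus the 10 design and performance features
colnames(mtcars)

# Compact overview: one line per column with its type and first values
str(mtcars)
```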


Each row of mtcars is an automobile, and each column is a performance feature. For example:

  • mpg is miles per gallon
  • wt is weight (in 1000 lbs)
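Columns can be selected by name, either with $ for a single column or with a vector of names in square brackets. A quick numeric summary of the two features above:

```r
# A single column, returned as a numeric vector (one value per car)
mpg_values <- mtcars$mpg

# Several columns by name, summarised together
summary(mtcars[, c("mpg", "wt")])

# Weights are small numbers because wt is recorded in units of 1000 lbs
range(mtcars$wt)
```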

The function help() provides information on R functions and data sets. I can use it to find out what each of the performance features means:

# what exactly is in mtcars?
help(mtcars)

Preparing the mtcars data

The function heatmap() is an easy way to convert the values in mtcars to colors, which helps me visualize the data and look for relationships.

The help page will provide me with information on heatmap().

# The help() function can take a function as its argument
help(heatmap)


In the help file, I learned that heatmap() plots a numeric matrix of values. So the first step is to convert the data from a data frame to a matrix of numbers. I will do most of my analysis on data in matrix form.

The symbol <- is the assignment operator. It assigns a value on the right side of the operator to a variable on the left side. It functions like an equals (=) sign.
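A small sketch of assignment in practice. One difference worth knowing: inside a function call, = matches an argument by name instead of creating a variable, which is one reason many R users prefer <- for assignment.

```r
# Assign the value 5 to the variable x
x <- 5

# The right side is evaluated first, then the result is stored in y
y <- x * 2
y

# Inside a function call, = sets the argument named digits;
# it does not create a variable called digits
round(3.14159, digits = 2)
```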

# Convert mtcars into a matrix of numbers
# Assign the output to the variable data
data <- as.matrix(mtcars)   
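A quick check that the conversion worked. is.numeric() returns FALSE for the data frame (internally a list of columns) but TRUE for the new matrix, which is the kind of input heatmap() expects. The car names carry over as row names.

```r
data <- as.matrix(mtcars)

class(mtcars)        # a data frame
class(data)          # a matrix
is.numeric(mtcars)   # FALSE: a data frame is a list of columns
is.numeric(data)     # TRUE: a single block of numbers

# Car names survive the conversion as row names
head(rownames(data))
```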

Heatmap for mtcars

The heatmap() function is powerful: it not only converts the data values to colors, it also reorders the rows (automobiles) and columns (performance features) using hierarchical clustering, so that similar rows and columns sit next to each other and patterns are easier to find.

# A heat map is a color image of our data with dendrograms
heatmap(data)  


The rows correspond to cars (observations) and the columns to the 11 variables (mpg plus the 10 design and performance features).

The dendrograms (or tree diagrams) show how close the cars and features are according to the values in our data set.
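By default, heatmap() builds each dendrogram by computing Euclidean distances with dist() and clustering them with hclust() (its distfun and hclustfun arguments), then reorders the branches by row or column means. The sketch below reproduces only the clustering step, so its leaf order may differ from the order in the plotted heatmap.

```r
data <- as.matrix(mtcars)

# Cluster cars (rows): Euclidean distances, then complete-linkage clustering
row_tree <- hclust(dist(data))

# Cluster features (columns) by transposing, so features become rows
col_tree <- hclust(dist(t(data)))

# Leaf order of each tree, before heatmap()'s extra reordering by means
rownames(data)[row_tree$order]
colnames(data)[col_tree$order]

# Draw the car dendrogram on its own
plot(row_tree, cex = 0.6)
```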

In the default coloring scheme, the highest values have the darkest colors. I can see that some features, such as disp and hp, have higher values than the others, but otherwise the visualization is not very helpful.

Look at the mtcars table. Different features have very different scales, so a value that is high (red) for one feature, e.g. cyl, would be low for another feature, e.g. disp.
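The difference in scales is easy to check by computing each feature's minimum and maximum with apply() over the columns (MARGIN = 2):

```r
data <- as.matrix(mtcars)

# apply(..., 2, range) returns a 2-row matrix: minimums, then maximums
feature_ranges <- apply(data, 2, range)
rownames(feature_ranges) <- c("min", "max")
feature_ranges

# cyl tops out at 8, while disp never drops below about 71
feature_ranges[, c("cyl", "disp")]
```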


Heatmap for scaled mtcars

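The scale() function standardizes each feature (column) so they are comparable: it subtracts the column mean and divides by the column standard deviation, giving every feature a mean of 0 and a standard deviation of 1.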
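As a sketch of what scale() computes, the same z-scores can be built by hand with colMeans(), apply(..., sd) and sweep(), then compared with the output of scale(). The check.attributes = FALSE option ignores the extra attributes that scale() attaches to its result.

```r
data <- as.matrix(mtcars)

# Column means and standard deviations, one value per feature
col_means <- colMeans(data)
col_sds   <- apply(data, 2, sd)

# Subtract each column's mean, then divide by its standard deviation
manual <- sweep(data, 2, col_means, "-")
manual <- sweep(manual, 2, col_sds, "/")

# Compare with scale(); TRUE means they agree up to rounding error
all.equal(scale(data), manual, check.attributes = FALSE)
```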

# Change the range of each feature so they are comparable
# Assign the output to a new variable data_scaled
data_scaled <- scale(data)
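heatmap() also has a scale argument of its own. According to its help page, the default is "row" (unless symm = TRUE), which centers and scales each car across its features before coloring. Since data_scaled is already standardized by column, an optional variation is to turn that off so the colors show the column z-scores directly. As I read the heatmap() source, the dendrograms are computed from the values passed in before this extra scaling, so scale = "column" on the raw data changes the colors but not the clustering.

```r
data <- as.matrix(mtcars)
data_scaled <- scale(data)

# Colors show the column z-scores as they are (no extra row scaling)
hm <- heatmap(data_scaled, scale = "none")

# heatmap() invisibly returns the plotted order of rows and columns
rownames(data_scaled)[hm$rowInd]
colnames(data_scaled)[hm$colInd]

# Alternative: cluster the raw data, but color it by column z-scores
heatmap(data, scale = "column")
```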


# Look at the first few rows of data_scaled
head(data_scaled)
##                          mpg        cyl        disp         hp       drat
## Mazda RX4          0.1508848 -0.1049878 -0.57061982 -0.5350928  0.5675137
## Mazda RX4 Wag      0.1508848 -0.1049878 -0.57061982 -0.5350928  0.5675137
## Datsun 710         0.4495434 -1.2248578 -0.99018209 -0.7830405  0.4739996
## Hornet 4 Drive     0.2172534 -0.1049878  0.22009369 -0.5350928 -0.9661175
## Hornet Sportabout -0.2307345  1.0148821  1.04308123  0.4129422 -0.8351978
## Valiant           -0.3302874 -0.1049878 -0.04616698 -0.6080186 -1.5646078
##                             wt       qsec         vs         am       gear
## Mazda RX4         -0.610399567 -0.7771651 -0.8680278  1.1899014  0.4235542
## Mazda RX4 Wag     -0.349785269 -0.4637808 -0.8680278  1.1899014  0.4235542
## Datsun 710        -0.917004624  0.4260068  1.1160357  1.1899014  0.4235542
## Hornet 4 Drive    -0.002299538  0.8904872  1.1160357 -0.8141431 -0.9318192
## Hornet Sportabout  0.227654255 -0.4637808 -0.8680278 -0.8141431 -0.9318192
## Valiant            0.248094592  1.3269868  1.1160357 -0.8141431 -0.9318192
##                         carb
## Mazda RX4          0.7352031
## Mazda RX4 Wag      0.7352031
## Datsun 710        -1.1221521
## Hornet 4 Drive    -1.1221521
## Hornet Sportabout -0.5030337
## Valiant           -1.1221521
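scale() stores the means and standard deviations it used as attributes on its result, and each standardized column should now have mean 0 and standard deviation 1, up to rounding:

```r
data_scaled <- scale(as.matrix(mtcars))

# The values scale() subtracted (center) and divided by (scale)
attr(data_scaled, "scaled:center")
attr(data_scaled, "scaled:scale")

# Every column now has a mean of about 0 and a standard deviation of 1
round(colMeans(data_scaled), 10)
apply(data_scaled, 2, sd)
```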

I expect a heatmap of the scaled data to be more informative.

# A heat map is a color image of our data with dendrograms
heatmap(data_scaled)

Now I can see various patterns emerge, such as the values and groupings for wt and mpg! The clustering of the vehicles makes sense in this format.
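The relationship between wt and mpg can also be checked numerically. cor() gives the Pearson correlation; a value close to -1 means heavier cars tend to get fewer miles per gallon. Correlation does not change when features are standardized, so the raw and scaled data give the same answer.

```r
# Pearson correlation between weight and fuel economy
r_raw <- cor(mtcars$wt, mtcars$mpg)
r_raw

# Standardizing each column does not change the correlation
data_scaled <- scale(as.matrix(mtcars))
r_scaled <- cor(data_scaled[, "wt"], data_scaled[, "mpg"])
all.equal(r_raw, r_scaled)
```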


Color schemes

I can use a color palette to change the color coding and style in my heatmap.

RColorBrewer is an R package that contains ready-to-use color palettes for creating nice graphics.

# Packages are loaded with the library() function
# (install it once first with install.packages("RColorBrewer") if needed)
library(RColorBrewer)

# Shrink text and symbols to half size so the palette labels fit
par(cex = 0.5)

# Get a graphic for all color schemes
display.brewer.all()

The default color-coding by heatmap is “YlOrRd” which is the top row.

I can use any of the palettes provided. Another scheme might reveal relationships in the data more clearly, or it might just be more fun.

# Replace "RdPu" with any palette name; brewer.pal(8, ...) takes 8 colors from it
heatmap(data_scaled, col=brewer.pal(8,"RdPu"))
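For standardized data, where 0 means "average for this feature", a diverging palette (two hues meeting at a light middle color) is a common choice, for example RColorBrewer's "RdBu". This is an optional variation on the step above, with a few assumptions: brewer.pal() returns at most 11 colors for diverging palettes, so colorRampPalette() interpolates a smoother ramp; rev() puts red on high values; scale = "none" skips heatmap()'s default row scaling; and a symmetric zlim, which heatmap() passes on to image(), keeps 0 at the middle color.

```r
library(RColorBrewer)

data_scaled <- scale(as.matrix(mtcars))

# 11 colors from the diverging RdBu palette, reversed so blue = low, red = high
rdbu <- rev(brewer.pal(11, "RdBu"))

# Interpolate to a smoother ramp of 101 colors
pal <- colorRampPalette(rdbu)(101)

# Symmetric color limits so that 0 (the column mean) maps to the middle color
lim <- max(abs(data_scaled))

heatmap(data_scaled, scale = "none", col = pal, zlim = c(-lim, lim))

# Maximum number of colors and palette type for RdBu
brewer.pal.info["RdBu", ]
```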


Summary

Great work! I learned a lot about R: