Heatmaps are a way to colorize, visualize, and organize a data set with the goal of finding relationships among observations and features.
I will use heatmaps to find patterns in the gene expression data for the 1K breast cancer patients from The Cancer Genome Atlas. Here, I learned how to create heatmaps with a practice data set.
mtcars is the data set that I will practice on today. It was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 design and performance features for 32 automobiles (1973–74 models).
# Functions in R take arguments within the parentheses
# The function head() returns the first few lines of the mtcars table
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
I can create a table of the entire data set in a new tab with the View() function.
# View the full data set
View(mtcars)
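To check the size of the table without opening the viewer, the dimensions and structure can be printed directly. This is a small sketch using base R's dim() and str(), which are additions for illustration and not part of the original walkthrough; the variable name dims is arbitrary.

```r
# Number of rows (automobiles) and columns (features)
dims <- dim(mtcars)
dims

# Compact summary: one line per column with its type and first values
str(mtcars)
```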
Each row of mtcars is an automobile, and each column is a performance feature. For example:
mpg is miles per gallon
wt is weight (in 1000 lbs)
The function help() provides information on R functions and data. I can find out what all the performance features are:
# what exactly is in mtcars?
help(mtcars)
The function heatmap() is an easy way to convert the values in mtcars to colors, which helps visualize the data and look for relationships.
The help page will provide me with information on heatmap().
# The help() function can take a function as its argument
help(heatmap)
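Besides the help page, the argument list of a function can be inspected from the console. The sketch below uses base R's args() and formals(); both are additions for illustration rather than steps from the original walkthrough, and arg_names is an arbitrary variable name.

```r
# Print the argument list of heatmap() with its default values
args(heatmap)

# Collect the argument names, e.g. to check that a 'scale' argument exists
arg_names <- names(formals(heatmap))
"scale" %in% arg_names
```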
In the help file, I learned that heatmap() plots a numeric matrix of values. So the first step will be to ensure that the data are converted from a table or data frame to a matrix of numeric values. I will do most of my analysis on data in matrix form.
The symbol <- is the assignment operator. It assigns the value on the right side of the operator to the variable on the left side. It functions like an equals (=) sign.
# Convert mtcars into a matrix of numbers
# Assign the output to the variable data
data <- as.matrix(mtcars)
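A quick sanity check that the conversion did what heatmap() needs might look like the following. It is only a sketch using base R helpers (class(), is.matrix(), is.numeric()) that the original steps do not call.

```r
data <- as.matrix(mtcars)

# mtcars itself is a data frame
class(mtcars)

# data should be a matrix, and its values should be numeric
is.matrix(data)
is.numeric(data)

# Row names (car models) carry over from the data frame
identical(rownames(data), rownames(mtcars))
```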
The heatmap() function is powerful: it not only converts the data values to colors, it also rearranges the rows (automobiles) and columns (performance features) so I can more easily find patterns in the data.
# A heat map is a color image of our data with dendrograms
heatmap(data)
The rows correspond to cars (observations) and the columns to the 11 features (fuel consumption plus the 10 design and performance features).
The dendrograms (or tree diagrams) show how close the cars and features are according to the values in our data set.
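As far as the help page describes it, heatmap() builds these trees by default with dist() (Euclidean distance) followed by hclust() (complete linkage). Below is a rough sketch of the row dendrogram on its own. The names car_dist and car_tree are illustrative, and heatmap() additionally reorders the branches by row means, so the leaf order in its plot may differ.

```r
data <- as.matrix(mtcars)

# Pairwise Euclidean distances between cars (rows)
car_dist <- dist(data)

# Hierarchical clustering (complete linkage is the default method)
car_tree <- hclust(car_dist)

# Draw the tree; cars that merge low in the tree have similar values
plot(car_tree, cex = 0.6)
```

Running the same steps on t(data) would give the corresponding tree for the features (columns).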
In the default coloring scheme, the highest values have the darkest colors. I can see that some features, such as disp and hp, have higher values than others, but otherwise the visualization is not helpful.
Look at the mtcars table. Different features have very different scales, so a value that is high (red) for one feature, e.g. cyl, is low for another feature, e.g. disp.
The scale() function normalizes the features so they are comparable. By default, it centers each column on its mean and divides it by its standard deviation.
# Change the range of each feature so they are comparable
# Assign the output to a new variable data_scaled
data_scaled <- scale(data)
# Looking at the first few rows of data_scaled
head(data_scaled)
## mpg cyl disp hp drat
## Mazda RX4 0.1508848 -0.1049878 -0.57061982 -0.5350928 0.5675137
## Mazda RX4 Wag 0.1508848 -0.1049878 -0.57061982 -0.5350928 0.5675137
## Datsun 710 0.4495434 -1.2248578 -0.99018209 -0.7830405 0.4739996
## Hornet 4 Drive 0.2172534 -0.1049878 0.22009369 -0.5350928 -0.9661175
## Hornet Sportabout -0.2307345 1.0148821 1.04308123 0.4129422 -0.8351978
## Valiant -0.3302874 -0.1049878 -0.04616698 -0.6080186 -1.5646078
## wt qsec vs am gear
## Mazda RX4 -0.610399567 -0.7771651 -0.8680278 1.1899014 0.4235542
## Mazda RX4 Wag -0.349785269 -0.4637808 -0.8680278 1.1899014 0.4235542
## Datsun 710 -0.917004624 0.4260068 1.1160357 1.1899014 0.4235542
## Hornet 4 Drive -0.002299538 0.8904872 1.1160357 -0.8141431 -0.9318192
## Hornet Sportabout 0.227654255 -0.4637808 -0.8680278 -0.8141431 -0.9318192
## Valiant 0.248094592 1.3269868 1.1160357 -0.8141431 -0.9318192
## carb
## Mazda RX4 0.7352031
## Mazda RX4 Wag 0.7352031
## Datsun 710 -1.1221521
## Hornet 4 Drive -1.1221521
## Hornet Sportabout -0.5030337
## Valiant -1.1221521
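One way to confirm what scale() did is to check that every column now has a mean of zero and a standard deviation of one. This is a minimal sketch with base R's colMeans(), apply(), and sd(); the variable names col_means and col_sds are illustrative.

```r
data_scaled <- scale(as.matrix(mtcars))

# Column means should be zero, up to floating-point error
col_means <- colMeans(data_scaled)
round(col_means, 10)

# Column standard deviations should all be one
col_sds <- apply(data_scaled, 2, sd)
round(col_sds, 10)
```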
I think that a heatmap for the scaled data is more informative.
# Heat map of the scaled data, again with dendrograms
heatmap(data_scaled)
Now I can see various patterns emerge, like the values and groupings for wt and mpg! The clustering of vehicles makes sense once put in this format.
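One detail worth knowing: heatmap() has its own scale argument, which defaults to "row", so each car's row is rescaled again for coloring even after scale() has been applied. A hedged alternative, not used in the original walkthrough, is to switch that off or to let heatmap() standardize the columns itself:

```r
data <- as.matrix(mtcars)
data_scaled <- scale(data)

# Color the column-scaled values as they are, with no extra row scaling
heatmap(data_scaled, scale = "none")

# Or let heatmap() standardize each column for the colors.
# Note: in this call the dendrograms are still computed from the unscaled data.
heatmap(data, scale = "column")
```

The two plots use the same color values but can cluster differently, because the first call clusters the scaled matrix and the second clusters the raw one.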
I can use a color palette to change the color coding and style in my heatmap.
RColorBrewer is an R package that contains ready-to-use color palettes for creating nice graphics.
# Packages are loaded with the library() function
library(RColorBrewer)
# Plotting parameter: shrink text and symbols to half size
par(cex = 0.5)
# Get a graphic for all color schemes
display.brewer.all()
The default color coding used by heatmap() is “YlOrRd”, which is the top row.
I can use any of the palettes provided. Maybe another scheme reveals relationships in the data more effectively or it’s just more fun.
# Change the argument in parentheses to any of the palettes
heatmap(data_scaled, col = brewer.pal(8, "RdPu"))
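A Brewer palette only has a handful of colors, so the heatmap above uses eight color steps. If smoother gradations are wanted, one option is to interpolate the palette with colorRampPalette() from base R. This is a sketch rather than part of the original steps; the choice of 50 steps and the names base_colors and smooth_palette are arbitrary.

```r
library(RColorBrewer)

data_scaled <- scale(as.matrix(mtcars))

# brewer.pal() returns a small fixed set of colors (RdPu offers at most 9)
base_colors <- brewer.pal(9, "RdPu")

# colorRampPalette() returns a function that interpolates between them;
# calling it with 50 yields 50 color codes
smooth_palette <- colorRampPalette(base_colors)(50)

heatmap(data_scaled, col = smooth_palette)
```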
Great work! I learned a lot about R: help(), head(), and heatmap().