R is a free, open-source programming language widely used in academia and industry for data analysis, statistical modeling, and data visualization.
We are working with an R Markdown file, Cereals_Heatmap.Rmd. With R Markdown, we can generate reproducible workflows and create reports we can share as HTML pages, PDF files, or Word documents.
We can add a new code chunk by clicking the Insert button.
Save this file under another name to preserve the original. Add your name as the author in the YAML header above.
Heatmaps are a way to colorize, visualize, and organize a data set with the goal of finding relationships among observations and features.
Let’s address a question of great importance to all breakfast eaters: Should I stick with Lucky Charms or switch to Wheaties?
We have a file with nutrition data on 80 cereal products. This data set is not part of R, and we will read the file in from our directory.
# Read the file in your directory and assign the output to `cereal`
# csv stands for "comma separated values"
# csv format is common in data analysis
cereal <- read.csv("cereal.csv")
Let’s look at what we have:
# The View() function lets us see the entire table
View(cereal)
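View() is meant for interactive use, for example inside RStudio. As a rough sketch, functions such as dim(), head(), and str() print a quick summary to the console or report instead. The small `toy` data frame below is made up for illustration (its column names and values are assumptions, not the real cereal file); in the activity you would call the same functions on `cereal`.

```r
# Hypothetical stand-in for the cereal table (made-up names and values)
toy <- data.frame(
  name   = c("Cereal A", "Cereal B", "Cereal C"),
  sugars = c(12, 3, 6),
  fiber  = c(0, 3, 10),
  stringsAsFactors = FALSE
)

dim(toy)    # number of rows and columns
head(toy)   # first rows of the table
str(toy)    # type of each column (character vs. numeric)
```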
The rows are cereals and the columns are nutrition features. When we worked with the mtcars data, the rows were cars and the columns were performance features.
The columns of the cereal dataset are the nutrition features; their names can be printed with colnames(cereal).
We have to do some manipulation of the data before we can make a heatmap. We learned that heatmap() takes a matrix of numeric values as input. You can remind yourself of this by using the help() function!
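For reference, here is a quick sketch of how to look things up: the first two lines open the documentation page for heatmap(), and args() prints the function's arguments in the console.

```r
help(heatmap)   # open the help page for heatmap()
?heatmap        # shorthand for the same thing
args(heatmap)   # print the function's arguments and their defaults
```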
The first column of cereal is names. We have to remove this column because it is text (character) data, not numeric. Before dropping it, let’s save the cereal names so we can use them as the row names of our table. The columns are already named for us. Then we can remove the names column, attach the row names, and make sure the data is a matrix of numeric values.
# Create an object of names from the first column
rnames <- cereal[,1]
# Then remove the first column
cereal_noCol1 <- cereal[,-1]
# Set `rnames` (the cereals) as row names of your data
row.names(cereal_noCol1) <- rnames
# Make sure the data is in matrix form
# cereal_data is the object we will do our analysis with
cereal_data <- as.matrix(cereal_noCol1)
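To see why the text column has to go, here is a minimal sketch on a made-up table (the `toy` names and values are assumptions, not the real cereal data). A matrix can hold only one type of value, so leaving the names in turns every number into a character string.

```r
# Hypothetical stand-in for the cereal table (made-up names and values)
toy <- data.frame(
  name   = c("Cereal A", "Cereal B", "Cereal C"),
  sugars = c(12, 3, 6),
  fiber  = c(0, 3, 10),
  stringsAsFactors = FALSE
)

# With the text column left in, every value becomes a character string
is.numeric(as.matrix(toy))

# Without it, the matrix is numeric and can carry the names as row names
toy_data <- as.matrix(toy[, -1])
rownames(toy_data) <- toy$name
is.numeric(toy_data)
```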
Check the matrix before continuing:
# The View() function lets us see the entire table
View(cereal_data)
Let’s see if we can find some relationships with the heatmap() function, as we previously did with mtcars.
# The heatmap() function works with a single argument: our data
heatmap(cereal_data)
As we saw with the mtcars analysis, the different features have different ranges. When their values are converted to colors, features with large ranges dominate and we don’t capture the true differences. We will use the same strategy and scale the data so the ranges are similar.
# Scale the data
cereal_scaled <- scale(cereal_data)
# What does the heatmap look like with scaling?
heatmap(cereal_scaled)
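To see what scale() does, here is a minimal sketch on a made-up matrix (the names and values are assumptions). By default, scale() centers each column to mean 0 and divides it by its standard deviation. Note that heatmap() has its own scale argument, which defaults to "row" for a non-symmetric matrix; the last line passes scale = "none" as one possible way to color the column-scaled values directly.

```r
# Hypothetical numeric matrix (made-up values) with very different column ranges
toy_data <- matrix(
  c(12, 3, 6, 9,          # sugars
    0, 3, 10, 5,          # fiber
    180, 200, 140, 290),  # sodium
  ncol = 3,
  dimnames = list(c("A", "B", "C", "D"), c("sugars", "fiber", "sodium"))
)

toy_scaled <- scale(toy_data)

round(colMeans(toy_scaled), 10)   # each column now has mean 0
apply(toy_scaled, 2, sd)          # and standard deviation 1

# Plot the column-scaled values as they are, without extra row scaling
heatmap(toy_scaled, scale = "none")
```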
Are there patterns in the data? What kinds of cereals are grouped together?
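One way to explore the grouping question is to look at the clustering behind the row dendrogram. By default, heatmap() computes distances with dist() and clusters with hclust(), so the sketch below applies the same two functions and then cuts the tree into groups with cutree(). The matrix and the choice of two groups are assumptions for illustration only.

```r
# Hypothetical data: A and B are sugary, C and D are high in fiber
toy_data <- matrix(
  c(12, 11, 2, 3,     # sugars
    0, 1, 9, 10),     # fiber
  ncol = 2,
  dimnames = list(c("A", "B", "C", "D"), c("sugars", "fiber"))
)
toy_scaled <- scale(toy_data)

# Same default functions heatmap() uses for its dendrograms
row_tree <- hclust(dist(toy_scaled))

# Cut the tree into two groups (k = 2 is an arbitrary choice here)
groups <- cutree(row_tree, k = 2)
groups
```

With the real data, the same calls on `cereal_scaled` would list which cereals fall into each group.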
Back to our original question. Should we switch from Lucky Charms to Wheaties? I guess it depends on whether we want to be “healthy” or enjoy our sugar fix!
Can you make a guess as to what features the cereal ratings may have stressed?
We can use a color palette to change the color coding and style of our heatmap.
RColorBrewer is an R package that contains ready-to-use color palettes for creating nice graphics.
# Packages are loaded with the library() function
library(RColorBrewer)
# Parameters for plotting
par(cex = 0.5)
# Get a graphic for all color schemes
display.brewer.all()
The default color coding used by heatmap() is “YlOrRd”, which is the top row.
We can use any of the palettes provided. Perhaps another scheme reveals relationships in the data more effectively, or it is just more fun.
# Re-create the heatmap with a different color palette
heatmap(cereal_scaled, col = brewer.pal(8, "Greens"))
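A palette is just a vector of color codes, so the col argument accepts colors from other sources too. As a sketch using only base R (an alternative to RColorBrewer, assumed here for illustration), colorRampPalette() builds a gradient of any length between chosen colors. The matrix below is made up.

```r
# Hypothetical matrix (made-up values), scaled by column
toy_scaled <- scale(matrix(
  c(12, 3, 6, 9, 0, 3, 10, 5, 180, 200, 140, 290),
  ncol = 3,
  dimnames = list(c("A", "B", "C", "D"), c("sugars", "fiber", "sodium"))
))

# Build a 20-step gradient from white to dark green
green_ramp <- colorRampPalette(c("white", "darkgreen"))(20)
head(green_ramp)   # hex color codes

heatmap(toy_scaled, col = green_ramp)
```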
Congratulations! You completed your first solo R activity!