R Markdown Notebook

R is a free, open-source programming language widely used in academia and industry for data analysis, statistical modeling, and data visualization.

We are working with an R Markdown file, Cereals_Heatmap.Rmd. With R Markdown, we can build reproducible workflows and create reports that we can share as HTML pages, PDF files, or Word documents.

We can add a new code chunk by clicking the Insert button.


Preliminaries

Save this file under another name to preserve the original. Add your name as the author in the YAML header above.


Heatmap practice

Heatmaps are a way to colorize, visualize, and organize a data set with the goal of finding relationships among observations and features.

Breakfast cereal

Let’s address a question of great importance to all breakfast eaters: Should I stick with Lucky Charms or switch to Wheaties?

We have a file with nutrition data on 80 cereal products. This data set is not part of R, and we will read the file in from our directory.

# Read the file from your directory and assign the output to `cereal`
# csv stands for "comma separated values"
# csv format is common in data analysis

cereal <- read.csv("cereal.csv")
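
If read.csv() reports that it cannot open the file, it helps to check where R is looking. The following is a minimal sketch using two base R functions; it assumes only that cereal.csv is meant to sit in the current working directory.

```r
# Folder where R looks for files that are given without a path
getwd()

# TRUE if cereal.csv is in that folder, FALSE otherwise
file.exists("cereal.csv")
```

If the second line returns FALSE, move the file into the folder shown by the first line, or pass the full path to read.csv().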


Let’s look at what we have:

# The View() function lets us see the entire table
View(cereal)
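
View() opens a spreadsheet-style viewer, which is handy while working interactively but does not appear in a shared report. Here is a minimal sketch of some console alternatives. The tiny data frame `toy` and its values are made up for illustration and stand in for the real cereal table.

```r
# Hypothetical stand-in for the real table; the values are invented
toy <- data.frame(
  name     = c("Cereal A", "Cereal B", "Cereal C"),
  calories = c(70, 120, 110),
  sugars   = c(6, 8, 3)
)

# Number of rows and columns
dim(toy)

# First two rows
head(toy, 2)

# Column names and the type of data in each column
str(toy)
```

The same three calls work on `cereal` once the file has been read in.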


The rows are cereals and the columns are nutrition features. When we worked with the mtcars data, the rows were cars and the columns were performance features.

The columns of the cereal dataset are:

  • name: Name of cereal
  • calories: calories per serving
  • protein: grams of protein
  • fat: grams of fat
  • sodium: milligrams of sodium
  • fiber: grams of dietary fiber
  • carbo: grams of complex carbohydrates
  • sugars: grams of sugars
  • potass: milligrams of potassium
  • vitamins: vitamins and minerals - 0, 25, or 100
  • shelf: display shelf (1, 2, or 3, counting from the floor)
  • weight: weight in ounces of one serving
  • cups: number of cups in one serving
  • rating: a rating of the cereals, unknown source

Data manipulation

We have to manipulate the data a little before we can make a heatmap. We learned that heatmap() takes a numeric matrix as input. You can remind yourself of this by using the help() function!
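
As a quick sketch, either of the lines below is one way to look up what heatmap() expects. Both are base R calls.

```r
# Open the documentation page for heatmap()
help("heatmap")

# Print the argument names and their defaults in the console
args(heatmap)
```

The first argument, `x`, is described in the help page as a numeric matrix.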

The first column of cereal holds the cereal names. We have to remove this column because it is text (character) data, not numeric. We still want to keep the names, so we will save them first, remove the column, and then set the saved names as the row names of our table. The columns are already named for us. Finally, we make sure the data is a numeric matrix.

# Create an object of names from the first column 
rnames <- cereal[,1]

# Then remove the first column
cereal_noCol1 <- cereal[,-1]

# Set `rnames` (the cereals) as row names of your data
row.names(cereal_noCol1) <- rnames

# Make sure the data is in matrix form
# cereal_data is the object we will do our analysis with
cereal_data <- as.matrix(cereal_noCol1)
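
To see why the name column has to go, here is a minimal sketch on a made-up table. The data frame `toy` and its values are invented for illustration. With the text column left in, as.matrix() turns every value into text; after the same steps as above, the result is numeric.

```r
# Hypothetical stand-in for the cereal table; the values are invented
toy <- data.frame(
  name     = c("Cereal A", "Cereal B", "Cereal C"),
  calories = c(70, 120, 110),
  sugars   = c(6, 8, 3)
)

# With the text column left in, every value becomes text
with_names <- as.matrix(toy)
is.numeric(with_names)   # FALSE

# Same steps as above: drop the column, set row names, convert
toy_noCol1 <- toy[, -1]
row.names(toy_noCol1) <- toy[, 1]
toy_data <- as.matrix(toy_noCol1)
is.numeric(toy_data)     # TRUE

# The names are kept as row labels
rownames(toy_data)
```

Running is.numeric(cereal_data) is a quick way to apply the same check to the real data.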


Check the matrix before continuing:

# The View() function lets us see the entire table
View(cereal_data)


Let’s see if we can find some relationships with the heatmap() function as we previously did with mtcars.

# The heatmap() function works with a single argument: our data
heatmap(cereal_data)


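As we saw with the mtcars analysis, the different features have different ranges. When the raw values are converted to colors, features with large numbers, such as sodium and potassium in milligrams, dominate the color scale and hide the differences in features with small numbers. We will use the same strategy as before and scale the data so the ranges are similar.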

# Scale the data
cereal_scaled <- scale(cereal_data)

# What does the heatmap look like with scaling?
heatmap(cereal_scaled)
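
To see what scale() does, here is a small sketch on an invented matrix. By default, scale() shifts each column to a mean of 0 and rescales it to a standard deviation of 1. The last line is an optional variation, not part of the workflow above: heatmap() has its own `scale` argument, which defaults to "row" for a non-symmetric matrix, so passing scale = "none" plots the column-scaled values exactly as given.

```r
# Invented values: two features with very different ranges
toy <- matrix(
  c(70, 120, 110, 100,    # calories
     6,   8,   3,  10),   # sugars
  ncol = 2,
  dimnames = list(c("A", "B", "C", "D"), c("calories", "sugars"))
)

toy_scaled <- scale(toy)

# Each column now has mean 0 (up to rounding) and standard deviation 1
round(colMeans(toy_scaled), 10)
apply(toy_scaled, 2, sd)

# Optional: plot the scaled values without heatmap() rescaling the rows
heatmap(toy_scaled, scale = "none")
```

Comparing heatmap(cereal_scaled) with heatmap(cereal_scaled, scale = "none") shows how much the extra row scaling changes the picture.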


Are there patterns in the data? What kinds of cereals are grouped together?

Back to our original question. Should we switch from Lucky Charms to Wheaties? I guess it depends on whether we want to be “healthy” or enjoy our sugar fix!

Can you make a guess as to what features the cereal ratings may have stressed?
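
Once you have made your guess from the heatmap, one possible way to check it is to look at how each feature correlates with the rating. This is only a sketch on invented numbers; the data frame `toy` is hypothetical and the real data may tell a different story.

```r
# Hypothetical values, made up for illustration only
toy <- data.frame(
  sugars = c(6, 8, 3, 10, 1),
  fiber  = c(2, 1, 4, 0, 9),
  rating = c(40, 35, 55, 30, 70)
)

# Correlation of every column with rating
# Values near 1 or -1 suggest a strong linear relationship
cor(toy)[, "rating"]
```

With the real data, cor(cereal_data)[, "rating"] gives the same kind of summary, provided no column contains missing values.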


Color schemes

We can use a color palette to change the color coding and style of our heatmap.

RColorBrewer is an R package that contains ready-to-use color palettes for creating nice graphics.

# Packages are loaded with the library() function
library(RColorBrewer)

# Parameters for plotting 
par(cex = 0.5)

# Get a graphic for all color schemes
display.brewer.all()

The default color scheme used by heatmap() is “YlOrRd”, which is the top row of the chart.

We can use any of the palettes provided. Perhaps another scheme reveals relationships in the data more effectively, or perhaps it is just more fun.

# Re-create the heatmap with a different color palette
heatmap(cereal_scaled, col=brewer.pal(8,"Greens"))
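
A palette with only 8 colors gives a fairly coarse gradient. As a sketch of one way to get more shades, the base R function colorRampPalette() can interpolate between the colors of a palette. The names `base_cols`, `green_ramp`, `greens_50`, and the matrix `toy` are invented for this example, and the fallback colors are used only if RColorBrewer is not installed.

```r
# Use RColorBrewer if it is installed; otherwise fall back to two base R colors
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  base_cols <- RColorBrewer::brewer.pal(8, "Greens")
} else {
  base_cols <- c("white", "darkgreen")
}

# colorRampPalette() returns a function that interpolates between colors
green_ramp <- colorRampPalette(base_cols)
greens_50 <- green_ramp(50)

# 50 colors, stored as hex codes
length(greens_50)
head(greens_50, 3)

# Invented matrix, just to show the palette in use
set.seed(1)
toy <- matrix(rnorm(40), nrow = 8,
              dimnames = list(paste("Cereal", 1:8), paste0("f", 1:5)))
heatmap(toy, col = greens_50)
```

The same `col = greens_50` argument can be passed to heatmap(cereal_scaled).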


Summary

Congratulations! You completed your first solo R activity!