First steps

Under the File tab, use Save As… to make a version of this file with a new name. In case things go sideways, we can go back to the original.

At the top of this document, put your name between the quotes after author. This is now your notebook.


mtcars data

R provides many data sets to work with, so we can learn new analysis skills before scaling up. mtcars is a classic go-to R data frame. It was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 design and performance features for 32 automobiles (1973–74 models).

We can create a table of the entire data set in a new tab with the View() function.

# Check out the full data set
View(mtcars)


Each row of mtcars is an automobile, and each column is a design or performance feature. For example, the mpg column holds each car's fuel economy in miles per gallon.

The function help() provides information on R functions and data. We can find out what all the performance features are:

# what exactly is in mtcars?
help(mtcars)
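
For a quick look without leaving the console, a few base R functions summarize the same data frame. This is only a minimal sketch; the variable name feature_names is chosen here for illustration.

```r
# Dimensions: number of rows (cars) and columns (features)
dim(mtcars)

# Column names, i.e. the feature abbreviations explained in help(mtcars)
feature_names <- names(mtcars)
feature_names

# First few rows, then a compact summary of each column's type
head(mtcars)
str(mtcars)
```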

Preparing the mtcars data

We learned that heatmap() plots a numeric matrix. So our first step is to convert the data from a data frame to a numeric matrix. We will do most of our analysis on data in matrix form.

The symbol <- is the assignment operator. It assigns a value on the right side of the operator to a variable on the left side. It functions, for us, like an equals (=) sign.

# Convert mtcars into a matrix of numbers
# Assign the output to the variable data
data <- as.matrix(mtcars)   
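
To confirm the conversion did what we wanted, we can ask R a few questions about the result. A quick sketch:

```r
data <- as.matrix(mtcars)

# TRUE if data is a matrix, and TRUE if its values are numbers
is.matrix(data)
is.numeric(data)

# The matrix keeps the 32 rows and 11 columns of the data frame
dim(data)
```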

Heatmap for scaled mtcars

Heatmaps are a way to colorize, visualize, and organize a data set with the goal of finding relationships among observations and features.

The scale() function standardizes each feature (column) so that features measured in different units are comparable.

# Let's change the range of each feature so they are comparable
# We'll assign the output to a new variable data_scaled
data_scaled <- scale(data)
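
With its default arguments, scale() subtracts each column's mean and divides by that column's standard deviation, so every scaled feature should end up with mean 0 and standard deviation 1. The sketch below checks this; the names col_means, col_sds, and mpg_by_hand are chosen here for illustration.

```r
data <- as.matrix(mtcars)
data_scaled <- scale(data)

# Column means should be 0 (up to tiny rounding error)
col_means <- colMeans(data_scaled)
round(col_means, 10)

# Column standard deviations should all be 1
col_sds <- apply(data_scaled, 2, sd)
col_sds

# The same calculation done by hand for one feature, mpg
mpg_by_hand <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)
all.equal(unname(data_scaled[, "mpg"]), mpg_by_hand)
```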

We found that the heatmap for the scaled data reveals patterns in the data.

# A heat map is a color image of our data with dendrograms
heatmap(data_scaled)
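
One detail worth knowing: by default, heatmap() also rescales the values within each row before coloring them (its scale argument defaults to "row"). Since we have already scaled the columns ourselves, it can be worth comparing against a version with that extra step switched off. A sketch, where hm is just a name chosen here to hold the returned value:

```r
data_scaled <- scale(as.matrix(mtcars))

# Draw the heatmap from data_scaled as-is, with no extra row scaling
hm <- heatmap(data_scaled, scale = "none")

# heatmap() invisibly returns the row and column order used in the plot
rownames(data_scaled)[hm$rowInd]
colnames(data_scaled)[hm$colInd]
```

Which version is easier to read depends on the question being asked, so it is reasonable to look at both.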

Scatterplots

Scatterplots plot one variable against another. They work best for continuous data. The continuous features in mtcars are:

  • mpg
  • disp
  • hp
  • drat
  • wt
  • qsec

# Make a data matrix for the continuous variables
sub_data_scaled <- data_scaled[,c(1,3:7)]

# Do an all-on-all scatter plot
pairs(sub_data_scaled)

We can see in greater detail which features have relationships with others. Scatterplots help us find correlations.
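
To put numbers on what the scatterplots show, we can compute the correlation between each pair of features with cor(). A minimal sketch; cor_matrix is a name chosen here for illustration.

```r
data_scaled <- scale(as.matrix(mtcars))
sub_data_scaled <- data_scaled[, c(1, 3:7)]

# Pairwise Pearson correlations between the six continuous features
cor_matrix <- cor(sub_data_scaled)
round(cor_matrix, 2)

# Example: heavier cars tend to get fewer miles per gallon,
# so the correlation between mpg and wt is negative
cor_matrix["mpg", "wt"]
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.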

Boxplots

Boxplots are a simple way to see the distribution of features.

# boxplots show the range of each feature
boxplot(x = as.list(mtcars))

# Features with large values squash the others in the plot
# Let's remove mpg, disp, hp, and qsec to see the rest
boxplot(x = as.list(mtcars[,-c(1,3,4,7)]))

We can create boxplots for the scaled features as well, to check that the scaled features have comparable ranges.

# boxplots of the scaled features
boxplot(x = as.list(as.data.frame(data_scaled)))
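
boxplot() also accepts a matrix directly and draws one box per column, so the conversion to a list is optional. A sketch of that shortcut; bp is a name chosen here, and las = 2 is a standard graphics setting that turns the axis labels perpendicular to the axis.

```r
data_scaled <- scale(as.matrix(mtcars))

# One box per column of the matrix; las = 2 rotates the axis labels
bp <- boxplot(data_scaled, las = 2)

# boxplot() invisibly returns the numbers it drew:
# each column of bp$stats holds the lower whisker, lower hinge,
# median, upper hinge, and upper whisker for one feature
colnames(bp$stats) <- bp$names
round(bp$stats, 2)
```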



We will learn about other visualizations when we do analyses with our data.