Exploratory Data Analysis for mtcars

mtcars data

R provides many data sets to work with, so we can learn new analysis skills before scaling up. mtcars is a classic go-to R data frame. It was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 design and performance features for 32 automobiles (1973–74 models).

We can create a table of the entire data set in a new tab with the View() function.

# Check out the full data set
View(mtcars)

Each row of mtcars is an automobile, and each column is a design or performance feature. For example:

mpg is miles per gallon
wt is weight (in units of 1,000 lbs)

The function help() provides information on R functions and data. We can find out what all the performance features are:

# what exactly is in mtcars?
help(mtcars)
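Alongside the help page, a few base R functions give a quick overview of the data frame itself. The following is a minimal sketch using only str(), dim(), and head() on the built-in mtcars data; it is one reasonable way to take a first look, not the only one.

```r
# Compact summary: variable names, types, and the first few values
str(mtcars)

# Number of rows (automobiles) and columns (features)
dim(mtcars)

# First six rows; the car names are stored as row names
head(mtcars)
```

The dim() call should report 32 rows and 11 columns, matching the description of the data set above.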

Preparing the mtcars data

We learned that heatmap() plots a numeric matrix of values. So our first step will be to convert the data from a data frame to a matrix of numeric values. We will do most of our analysis on data in matrix form.

The symbol <- is the assignment operator. It assigns a value on the right side of the operator to a variable on the left side. It functions, for us, like an equals (=) sign.

# Convert mtcars into a matrix of numbers
# Assign the output to the variable data
data <- as.matrix(mtcars)
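It can be reassuring to confirm that the conversion produced what heatmap() expects. A small sketch of such a check, assuming the same variable name data as above:

```r
data <- as.matrix(mtcars)

# Confirm we now have a numeric matrix rather than a data frame
is.matrix(data)
is.numeric(data)

# The shape is unchanged: one row per automobile, one column per feature
dim(data)

# Car names are carried over as row names
head(rownames(data), 3)
```

Because every column of mtcars is numeric, as.matrix() yields a numeric matrix. If a data frame contained a character column, the whole matrix would become character, so this check is worth repeating on other data sets.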

Heatmap for scaled mtcars

Heatmaps are a way to colorize, visualize, and organize a data set with the goal of finding relationships among observations and features.

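The scale() function standardizes each feature (column) by subtracting its mean and dividing by its standard deviation, so that features measured in different units are comparable.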

# Let's change the range of each feature so they are comparable
# We'll assign the output to a new variable data_scaled
data_scaled <- scale(data)
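To see what scale() did with its default settings, we can check the column means and standard deviations of the result. This is a quick sanity check rather than a required step:

```r
data <- as.matrix(mtcars)
data_scaled <- scale(data)

# Each column mean should be zero (up to floating-point rounding)
round(colMeans(data_scaled), 10)

# Each column standard deviation should be one
apply(data_scaled, 2, sd)
```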

We found that the heatmap for the scaled data reveals patterns in the data.

# A heat map is a color image of our data with dendrograms
heatmap(data_scaled)
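One detail worth knowing: heatmap() has its own scale argument, which defaults to "row" for non-symmetric input, so it centers and scales each row for display even when the matrix has already been scaled by column. The sketch below assumes we want the colors to reflect the column-scaled values directly, and so passes scale = "none". The variable name hm is only for illustration.

```r
data_scaled <- scale(as.matrix(mtcars))

# scale = "none" colors the values as given, without the default
# row-wise rescaling that heatmap() would otherwise apply
hm <- heatmap(data_scaled, scale = "none")

# heatmap() invisibly returns the row and column orderings
# chosen by the dendrograms
rownames(data_scaled)[hm$rowInd]
colnames(data_scaled)[hm$colInd]
```

Comparing this plot with the default one is a simple way to see how much the extra row scaling changes the picture.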

Scatterplots

Scatterplots plot one variable against another. They work best for continuous data. The continuous features in mtcars are:

mpg
disp
hp
drat
wt
qsec

# Make a data matrix of the continuous variables
sub_data_scaled <- data_scaled[,c(1,3:7)]

# Do an all-on-all scatter plot
pairs(sub_data_scaled)

We can see in greater detail which features have relationships with others. Scatterplots help us find correlations.
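The relationships seen in the scatterplots can also be expressed as numbers. A minimal sketch, using cor() to compute the Pearson correlation between every pair of continuous features (the name cor_matrix is ours):

```r
# Same continuous features as in the scatterplot matrix
sub_data_scaled <- scale(as.matrix(mtcars))[, c(1, 3:7)]

# Pairwise Pearson correlations, rounded for readability
cor_matrix <- cor(sub_data_scaled)
round(cor_matrix, 2)

# A single pair, for example fuel efficiency versus weight
cor_matrix["mpg", "wt"]
```

Values near 1 or -1 correspond to the tight diagonal bands in the pairs() plot, and values near 0 to the shapeless clouds. Scaling does not change these correlations, so the unscaled data would give the same matrix.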

Boxplots

Boxplots are a simple way to see the distribution of features.

# boxplots show the range of each feature
boxplot(x = as.list(mtcars))

# The features with the largest values dominate the plot
# Let's remove mpg, disp, hp, and qsec to see the rest
boxplot(x = as.list(mtcars[,-c(1,3,4,7)]))
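To see why those four features were set aside, it can help to look at the range of each feature numerically. A short sketch (feature_ranges is a name introduced here for illustration):

```r
# Minimum (row 1) and maximum (row 2) of each feature
feature_ranges <- sapply(mtcars, range)
feature_ranges

# Width of each range, sorted from widest to narrowest
range_width <- feature_ranges[2, ] - feature_ranges[1, ]
sort(range_width, decreasing = TRUE)
```

The widest ranges belong to disp, hp, mpg, and qsec, which is why they stretch the shared axis and flatten the other boxes in the first plot.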

We can create boxplots for the scaled features as well to ensure our scaled data have similar distributions.

# boxplots of the scaled features, now on a common scale
boxplot(x = as.list(as.data.frame(data_scaled)))

We will learn about other visualizations when we do analyses with our data.

Zion Thomas

7/14/2025