These plot types help us understand distributions and group comparisons, rather than relationships over time or between variables.
A histogram shows the distribution of a single numeric variable. There is no separate Y variable: the y-axis shows how many observations (or what density) fall into each bin.
It helps answer questions like:
We will look at miles per gallon (mpg) from mtcars.
hist(mtcars$mpg)
Common options:
hist(
mtcars$mpg,
col = "lightblue",
main = "Distribution of Miles per Gallon",
xlab = "Miles per Gallon",
breaks = 10, #histogram specific
probability = TRUE #histogram specific (density instead of count)
)
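To see what breaks and probability actually change, it can help to capture the object that hist() returns instead of only drawing it. The sketch below uses plot = FALSE for that; the names h and bin_widths are illustrative and not part of the original notes.

```r
# Compute the histogram without drawing it, so the bins can be inspected.
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)

h$breaks   # bin edges R actually chose (a single number for breaks is only a suggestion)
h$counts   # number of cars in each bin

# With probability = TRUE the bar heights are densities, not counts:
# each bar's area (height * width) is the share of cars in that bin,
# so the areas add up to 1.
bin_widths <- diff(h$breaks)
sum(h$density * bin_widths)
```

Because a single number passed to breaks is treated as a suggestion, checking h$breaks is a quick way to confirm how many bins were really drawn.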
Histogram Practice
Create a histogram of horsepower (hp).
Customize:
hist(mtcars$hp,
col = "lightpink",
main = "Distribution of Horsepower",
xlab = "Horsepower"
)
hist(mtcars$hp,
col = "lightpink",
main = "Distribution of Horsepower",
xlab = "Horsepower",
probability = TRUE
)
What is a Boxplot?
A boxplot summarizes data using:
1. Median
2. Quartiles
3. Range
4. Outliers
Boxplots are useful for:
boxplot(mtcars$mpg)
boxplot(
mtcars$mpg,
main = "Boxplot of Miles per Gallon",
ylab = "Miles per Gallon",
col = "lightgreen"
)
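The values a boxplot draws can also be inspected as numbers. A minimal sketch using fivenum() and boxplot.stats() from base R; the variable name bp is illustrative.

```r
# fivenum() returns minimum, lower hinge, median, upper hinge, maximum.
fivenum(mtcars$mpg)

# boxplot.stats() returns the values boxplot() uses:
# $stats = lower whisker, lower hinge, median, upper hinge, upper whisker
# $out   = points beyond the whiskers, drawn as outliers
bp <- boxplot.stats(mtcars$mpg)
bp$stats
bp$out
```

The hinges are close to, but not always identical to, the quartiles returned by quantile(), so small differences between the two are expected.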
boxplot(
mpg ~ cyl,
data = mtcars,
xlab = "Number of Cylinders",
ylab = "Miles per Gallon",
main = "MPG by Cylinder Count",
col = "orange"
)
boxplot(
mpg ~ cyl,
data = mtcars,
xlab = "Miles per Gallon", # horizontal = TRUE puts the numeric variable on the x-axis
ylab = "Number of Cylinders",
main = "MPG by Cylinder Count",
col = c("orange", "pink", "blue"), # universal
outcol = "red", # boxplot specific
horizontal = TRUE # boxplot specific ## CHECK YOUR AXES!!!!
)
#lower level functions
legend("topright", legend = c("4 cylinder", "6 cylinder", "8 cylinder"), fill = c("orange", "pink", "blue"))
grid()
Boxplot Practice
Create a boxplot comparing horsepower (hp) across cylinder groups (cyl).
Customize:
- Title
- Axis labels
- Color
boxplot(
hp ~ cyl,
data = mtcars,
xlab = "Number of Cylinders",
ylab = "Horsepower",
main = "HP by Cylinder Count",
col = "orange"
)
boxplot(
hp ~ cyl,
data = mtcars,
xlab = "Horsepower", # horizontal = TRUE puts the numeric variable on the x-axis
ylab = "Number of Cylinders",
main = "HP by Cylinder Count",
col = c("orange", "pink", "blue"), # universal
outcol = "red", # boxplot specific
horizontal = TRUE # boxplot specific ## CHECK YOUR AXES!!!!
)
What is a Barplot?
Barplots are used for categorical data or summarized counts.
They show:
# First, count how many cars have each cylinder number.
cyl_counts <- table(mtcars$cyl)
cyl_counts
##
##  4  6  8
## 11  7 14
barplot(cyl_counts)
barplot(
cyl_counts,
col = "purple",
main = "Number of Cars by Cylinder Count",
xlab = "Cylinders",
ylab = "Number of Cars"
)
barplot(
cyl_counts,
col = "orange",
main = "Number of Cars by Cylinder Count",
xlab = "Cylinders",
ylab = "Number of Cars",
border = "blue",
lwd = 2,
cex.main = 2, # universal
cex.lab = 1.5, # universal
las = 2, # universal (see ?par): orientation of the axis labels
space = 0.5 # barplot specific, space between the bars
)
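barplot() also returns the x-position of each bar's midpoint, which is handy for lower-level additions such as printing the counts above the bars. A minimal sketch; the name mids and the extra ylim headroom are choices made for this example, not part of the original notes.

```r
cyl_counts <- table(mtcars$cyl)

# barplot() returns the x-coordinate of each bar's midpoint.
# The extra headroom in ylim leaves space for the labels.
mids <- barplot(
  cyl_counts,
  col = "orange",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars",
  ylim = c(0, max(cyl_counts) + 2)
)

# Lower-level function: write each count just above its bar.
text(
  x = mids,
  y = as.vector(cyl_counts),
  labels = as.vector(cyl_counts),
  pos = 3
)
```

Like legend() and grid(), text() only adds to an existing plot, so it has to come after the barplot() call.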
Barplot Practice
Create a barplot showing how many cars fall into each gear category (gear).
Steps:
1. Use table()
2. Use barplot()
3. Add labels and color
gear_counts <- table(mtcars$gear)
barplot(
gear_counts,
col = "orange",
main = "Number of Cars by Gear Category",
xlab = "Gears",
ylab = "Number of Cars",
border = "blue",
lwd = 2,
cex.main = 2, # universal
cex.lab = 1.5, # universal
las = 2, # universal (see ?par): orientation of the axis labels
space = 0.5 # barplot specific, space between the bars
)
Summary
You now know how to create histograms, boxplots, and barplots.
Together with scatterplots and line plots, these give you a powerful toolkit for visualizing data in R.
In this assignment, you will create and customize three types of plots using base R: a histogram, a boxplot, and a barplot.
You will use the built-in mtcars dataset unless otherwise specified.
All plots must include:
A histogram displays the distribution of a single numeric variable.
Create a histogram of one numeric variable from mtcars (for example: mpg, hp, or wt).
Your histogram must include:
breaks
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
hist(mtcars$hp)
hist(
mtcars$mpg,
col = "green",
main = "Distribution of Miles per Gallon",
xlab = "Miles per Gallon",
breaks = 5
)
Questions:
1. Is the distribution symmetric, skewed, or approximately normal?
The distribution is skewed to the right: most cars cluster at lower mpg values, with a tail extending toward higher values.
2. Are there any noticeable gaps or clusters?
There are no empty bins, but the counts dip sharply between 25 and 30, where only two cars fall.
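Answers like these can be cross-checked numerically. A minimal sketch, assuming the histogram is of mtcars$mpg with breaks = 5 as in the call above; the name h is illustrative.

```r
# Bin counts for the same five-bin request, without drawing the plot.
h <- hist(mtcars$mpg, breaks = 5, plot = FALSE)
data.frame(
  from  = head(h$breaks, -1),
  to    = tail(h$breaks, -1),
  count = h$counts
)

# A mean above the median hints at a longer tail on the right.
mean(mtcars$mpg)
median(mtcars$mpg)
```

A bin with a count of 0 would be a true gap; a bin with a small count next to taller neighbours is better described as a dip.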
A boxplot summarizes a distribution using the median, quartiles, range, and outliers.
Create a boxplot comparing a numeric variable across groups.
Example: Compare mpg by number of cylinders (cyl).
Your boxplot must include:
Formula notation (y ~ x)
boxplot(mtcars$hp)
boxplot(
hp ~ am,
data = mtcars,
xlab = "Transmission Type",
ylab = "Amount of Horsepower",
main = "Horsepower Compared to Transmission Type",
col = "pink",
outcol= "red"
)
Questions
1. Which group has the highest median?
Transmission type 0, which is the automatic transmission (in mtcars, am = 0 is automatic and am = 1 is manual), has the highest median horsepower.
2. Do any groups appear to have greater variability?
Manual cars (type 1) span the wider overall range, from about 50 to about 335 horsepower, compared with about 60 to about 250 for automatic cars (type 0).
3. Are there visible outliers?
Yes, there are two, both in transmission type 1 (manual).
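The three questions can be checked against the numbers behind the plot. A minimal sketch using tapply() and boxplot.stats(); the variable names are illustrative, and the 0/1 coding of am (0 = automatic, 1 = manual) is taken from ?mtcars.

```r
# Median horsepower per transmission group (am: 0 = automatic, 1 = manual).
group_medians <- tapply(mtcars$hp, mtcars$am, median)
group_medians

# Spread per group: full range and interquartile range.
tapply(mtcars$hp, mtcars$am, range)
tapply(mtcars$hp, mtcars$am, IQR)

# Points boxplot() would flag as outliers in each group.
group_outliers <- tapply(
  mtcars$hp, mtcars$am,
  function(x) boxplot.stats(x)$out
)
group_outliers
```

Range and interquartile range can point in different directions, since the range is driven by the most extreme cars while the interquartile range describes the middle half of each group.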
A barplot displays counts or summarized categorical data.
Use table() to count frequencies of a categorical variable (gear or cyl).
# Step 1: Create a table
am_counts <- table(mtcars$am)
# Step 2: Create the barplot
barplot(am_counts,
col = "lightblue",
main = "Number of Cars by Transmission Type",
xlab = "Transmission Type",
ylab = "Number of Cars")
# Example:
# barplot(cyl_counts,
# col = "purple",
# main = "Number of Cars by Cylinder Count",
# xlab = "Cylinders",
# ylab = "Number of Cars")
Questions
1. Which category has the highest count?
Transmission type 0 (automatic) has the highest count.
2. What does this tell you about the dataset?
This tells me that there are more cars with transmission type 0 (automatic) than with transmission type 1 (manual).
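Because the bars are labelled only 0 and 1, it is easy to mix up the two transmission types. One way to avoid that, sketched below, is to attach readable labels before counting; the coding (0 = automatic, 1 = manual) comes from ?mtcars, and the names transmission and am_counts_labelled are illustrative.

```r
# Attach readable labels to the 0/1 codes (coding taken from ?mtcars).
transmission <- factor(
  mtcars$am,
  levels = c(0, 1),
  labels = c("Automatic", "Manual")
)

# Count cars per transmission type; the table keeps the labels.
am_counts_labelled <- table(transmission)
am_counts_labelled

# The bar names now come from the factor labels instead of 0 and 1.
barplot(
  am_counts_labelled,
  col = "lightblue",
  main = "Number of Cars by Transmission Type",
  xlab = "Transmission Type",
  ylab = "Number of Cars"
)
```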