These plot types help us understand distributions and group comparisons, rather than relationships over time or between variables.
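What is a Histogram?
A histogram shows the distribution of a single numeric variable.
It helps answer questions like: Is the distribution symmetric or skewed? Are there noticeable gaps or clusters?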
We will look at miles per gallon (mpg) from mtcars.
hist(mtcars$mpg)
Common options:
hist(
mtcars$mpg,
col = "lightblue",
main = "Distribution of Miles per Gallon",
xlab = "Miles per Gallon",
breaks = 10, # histogram-specific: suggested number of bins
probability = TRUE # histogram-specific: plot density instead of counts
)
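One reason to switch to the density scale is that a smoothed density curve can then be drawn on the same axes. The following is a minimal sketch, assuming the base R functions density() and lines(); the curve color and line width are arbitrary choices, not part of the example above.

```r
# Draw the histogram on the density scale (bar areas sum to 1)
h <- hist(
  mtcars$mpg,
  col = "lightblue",
  main = "Distribution of Miles per Gallon",
  xlab = "Miles per Gallon",
  probability = TRUE
)

# Overlay a kernel density estimate; the two line up only because
# the histogram shows density rather than counts
lines(density(mtcars$mpg), col = "darkblue", lwd = 2)

# Check: bar heights times bar widths add up to 1
total_area <- sum(h$density * diff(h$breaks))
total_area
```

With probability = FALSE the bars would be on the count scale and the curve would sit almost flat along the bottom of the plot.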
Histogram Practice
Create a histogram of horsepower (hp).
Customize: - Title - Axis label - Color
hist(mtcars$hp,
col = "purple",
main = "Purple Plot",
xlab = "Horsepower",
breaks = 10,
probability = FALSE
)
What is a Boxplot?
A boxplot summarizes data using: 1. Median 2. Quartiles 3. Range 4. Outliers
Boxplots are useful for comparing groups and spotting outliers.
boxplot(mtcars$mpg)
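The values behind the picture can be inspected directly. The following is a minimal sketch using boxplot.stats(), which returns the five numbers that define the box and whiskers plus any points flagged as outliers; the fence variables are illustrative names added here.

```r
# boxplot.stats() returns the values a boxplot draws
s <- boxplot.stats(mtcars$mpg)

s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # points drawn individually as outliers (may be empty)

# Points are flagged when they lie more than 1.5 box lengths beyond the box
box_length <- s$stats[4] - s$stats[2]
upper_fence <- s$stats[4] + 1.5 * box_length
lower_fence <- s$stats[2] - 1.5 * box_length
```

The hinges are close to the quartiles reported by quantile(), but they are computed slightly differently and need not match exactly.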
boxplot(
mtcars$mpg,
main = "Boxplot of Miles per Gallon",
ylab = "Miles per Gallon",
col = "lightgreen"
)
boxplot(
mpg ~ cyl,
data = mtcars,
xlab = "Number of Cylinders",
ylab = "Miles per Gallon",
main = "MPG by Cylinder Count",
col = "orange"
)
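The formula mpg ~ cyl can be read as "mpg grouped by cyl". The following is a small sketch of the same grouping done by hand with split(); the object names are illustrative.

```r
# split() makes the same groups that the formula mpg ~ cyl describes
mpg_by_cyl <- split(mtcars$mpg, mtcars$cyl)

# One median per group: the thick line inside each box
group_medians <- sapply(mpg_by_cyl, median)
group_medians

# Passing the list to boxplot() draws one box per group
boxplot(mpg_by_cyl, col = "orange")
```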
mtcars$base10_hp <- mtcars$hp / 10
boxplot(
mpg ~ cyl,
data = mtcars,
xlab = "Miles per Gallon", # horizontal = TRUE puts mpg on the x-axis
ylab = "Number of Cylinders",
main = "MPG by Cylinder Count",
col = c("orange", "pink", "blue"), # universal
outcol = "red", # boxplot-specific
horizontal = TRUE, # boxplot-specific; check your axes: the labels above are swapped to match
names = c("Four", "Six", "Eight")
)
# Low-level functions add to the existing plot
legend("topright", legend = c("4 cylinders", "6 cylinders", "8 cylinders"), fill = c("orange", "pink", "blue"))
grid()
Boxplot Practice
Create a boxplot comparing horsepower (hp) across cylinder groups (cyl).
Customize: - Title - Axis labels - Color
boxplot(hp ~ cyl,
data = mtcars,
main = "Horsepower by Cylinder Group",
xlab = "Number of Cylinders",
ylab = "Horsepower",
col = "lightgreen")
What is a Barplot?
Barplots are used for categorical data or summarized counts.
They show one bar per category, with the height of each bar giving the count or summarized value.
# First, count how many cars have each cylinder number.
cyl_counts <- table(mtcars$cyl)
cyl_counts
##
##  4  6  8
## 11  7 14
barplot(cyl_counts)
barplot(
cyl_counts,
col = "purple",
main = "Number of Cars by Cylinder Count",
xlab = "Cylinders",
ylab = "Number of Cars"
)
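The same bar layout also works for summarized values rather than raw counts. The following is a minimal sketch, assuming proportions are the summary of interest; prop.table() is base R and divides each count by the total.

```r
cyl_counts <- table(mtcars$cyl)

# Convert counts to proportions of all cars
cyl_props <- prop.table(cyl_counts)
cyl_props

barplot(
  cyl_props,
  col = "purple",
  main = "Share of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Proportion of Cars"
)
```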
barplot(
cyl_counts,
col = "orange",
main = "Number of Cars by Cylinder Count",
xlab = "Cylinders",
ylab = "Number of Cars",
border = "blue",
lwd = 4,
cex.main = 2, # universal
cex.lab = 1.5, # universal
las = 2, # universal (sets axis label orientation)
space = 3 # barplot-specific
)
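barplot() also returns the x positions of the bar midpoints, which is useful with low-level functions such as text(). The following is a small sketch that prints each count above its bar; the extra headroom in ylim is an arbitrary choice so the labels are not clipped.

```r
cyl_counts <- table(mtcars$cyl)

# Capture the bar midpoints returned by barplot()
bar_mids <- barplot(
  cyl_counts,
  col = "orange",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars",
  ylim = c(0, max(cyl_counts) + 2)
)

# Low-level function: write each count just above its bar
text(
  x = bar_mids,
  y = as.numeric(cyl_counts),
  labels = as.vector(cyl_counts),
  pos = 3
)
```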
Barplot Practice
Create a barplot showing how many cars fall into each gear category (gear).
Steps: 1. Use table() 2. Use barplot() 3. Add labels and color
barplot(table(mtcars$gear),
main = "Number of Cars by Gear Category",
xlab = "Gear",
ylab = "Number of Cars",
col = "pink")
Summary
You now know how to create histograms, boxplots, and barplots.
Together with scatterplots and line plots, these give you a powerful toolkit for visualizing data in R.
In this assignment, you will create and customize three types of plots using base R: histograms, boxplots, and barplots.
You will use the built-in mtcars dataset unless otherwise specified.
All plots must include a title, axis labels, and color.
A histogram displays the distribution of a single numeric variable.
Create a histogram of one numeric variable from mtcars (for example: mpg, hp, or wt).
Your histogram must include a title, an x-axis label, a color, and an explicit breaks setting:
hist(mtcars$mpg,
main = "Distribution of Miles Per Gallon",
xlab = "Miles Per Gallon (mpg)",
col = "coral",
breaks = 10)
Questions:
Is the distribution symmetric, skewed, or approximately normal?
Are there any noticeable gaps or clusters?
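A quick numeric check can support what the histogram shows. The sketch below compares the mean and median of mpg; a mean noticeably above the median is a common sign of right skew. This is a rule of thumb, not a formal test, and the variable names are illustrative.

```r
mpg_mean <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)

# A positive gap means the mean is pulled toward a longer right tail
skew_gap <- mpg_mean - mpg_median
skew_gap

# summary() prints the quartiles alongside the mean and median
summary(mtcars$mpg)
```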
A boxplot summarizes a distribution using its median, quartiles, range, and outliers.
Create a boxplot comparing a numeric variable across groups.
Example: Compare mpg by number of cylinders (cyl).
Your boxplot must include formula notation (y ~ x), a title, axis labels, and color:
boxplot(mpg ~ cyl,
data = mtcars,
main = "Miles Per Gallon by Cylinder Group",
xlab = "Number of Cylinders",
ylab = "Miles Per Gallon (mpg)",
col = "red")
Questions
Which group has the highest median?
Do any groups appear to have greater variability?
Are there visible outliers?
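The same questions can be checked against the numbers behind the plot. The following is a minimal sketch, assuming boxplot() is called with plot = FALSE so that it returns its statistics without drawing; the object name bp is illustrative.

```r
bp <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# Row 3 of bp$stats holds the median of each group
group_medians <- setNames(bp$stats[3, ], bp$names)
group_medians

# Box length (upper hinge minus lower hinge) as a measure of spread
box_lengths <- setNames(bp$stats[4, ] - bp$stats[2, ], bp$names)
box_lengths

# Outlier values and the group each one belongs to (may be empty)
data.frame(value = bp$out, group = bp$names[bp$group])
```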
A barplot displays counts or summarized categorical data.
Use table() to count frequencies of a categorical variable (gear or cyl).
# Step 1: Create a table
# Example:
# cyl_counts <- table(mtcars$cyl)
# Step 2: Create the barplot
# Example:
# barplot(cyl_counts,
# col = "purple",
# main = "Number of Cars by Cylinder Count",
# xlab = "Cylinders",
# ylab = "Number of Cars")
Questions
Which category has the highest count?
What does this tell you about the dataset?
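The largest category can also be read from the table itself rather than from the plot. The following is a short sketch using gear as the categorical variable; the object names are illustrative.

```r
gear_counts <- table(mtcars$gear)

# which.max() returns the position of the largest count;
# its name is the category label
top_gear <- names(which.max(gear_counts))
top_gear

# sort() orders the categories from most to least common
sort(gear_counts, decreasing = TRUE)
```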