These plot types help us understand distributions and group comparisons, rather than relationships over time or between variables.
A histogram shows the distribution of a single numeric variable.
It helps answer questions like:
We will look at miles per gallon (mpg) from mtcars.
hist(mtcars$mpg)
Common options:
hist(
mtcars$mpg,
col = "lightblue",
main = "Distribution of Miles per Gallon",
xlab = "Miles per Gallon",
breaks = 10, # histogram specific: a suggested number of bins (R may adjust it)
probability = TRUE # histogram specific: plot density instead of counts
)
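Two details of the options above are easy to miss. A single number passed to `breaks` is only a suggestion, because R picks "pretty" cut points near it. Also, `probability = TRUE` rescales the bars so that their total area is 1. The minimal sketch below checks both without drawing anything, using `hist(..., plot = FALSE)`. The object name `h` is only illustrative.

```r
# Compute the histogram without drawing it
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)

# The cut points R actually chose (the number of bins need not be exactly 10)
h$breaks

# counts: number of cars in each bin
h$counts

# density: counts rescaled so that bar areas (density * bin width) sum to 1
sum(h$density * diff(h$breaks))
```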
Histogram Practice
hist(
mtcars$hp,
col = "purple",
main = "Distribution of Horsepower",
xlab = "Horsepower",
breaks = 10,
probability = TRUE
)
# Your code here
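To get exactly the bins you want, pass `breaks` a vector of cut points instead of a single number. The sketch below bins horsepower in 50-hp steps. The cut points from 50 to 350 are an illustrative choice that covers the 52 to 335 hp range in mtcars, and `hp_breaks` and `h_hp` are hypothetical names.

```r
# Explicit cut points: R uses these exactly instead of treating them as a suggestion
hp_breaks <- seq(50, 350, by = 50)

h_hp <- hist(
  mtcars$hp,
  breaks = hp_breaks,
  col = "purple",
  main = "Distribution of Horsepower",
  xlab = "Horsepower"
)

# Bins are right-closed by default: [50, 100], (100, 150], (150, 200], ...
h_hp$counts
```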
What is a Boxplot?
A boxplot summarizes data using:
1. Median
2. Quartiles
3. Range
4. Outliers
Boxplots are useful for:
boxplot(mtcars$mpg)
boxplot(
mtcars$mpg,
main = "Boxplot of Miles per Gallon",
ylab = "Miles per Gallon",
col = "lightgreen"
)
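You can print the numbers behind a single boxplot with `boxplot.stats()`. It returns the five values that the box and whiskers are drawn from. It also returns the points flagged as outliers: by default, values more than 1.5 box lengths beyond the edges of the box (`coef = 1.5`).

```r
s <- boxplot.stats(mtcars$mpg)

# Lower whisker end, lower hinge, median, upper hinge, upper whisker end
s$stats

# Values drawn as separate outlier points (none for mpg)
s$out
```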
boxplot(
mpg ~ cyl,
data = mtcars,
xlab = "Number of Cylinders",
ylab = "Miles per Gallon",
main = "MPG by Cylinder Count",
col = "orange"
)
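The formula `mpg ~ cyl` tells `boxplot()` to split mpg into one group per value of cyl. If you call `boxplot()` with `plot = FALSE`, it returns the same summary without drawing anything. This is a quick way to read off the group medians and outliers. The name `b` is only illustrative.

```r
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# Group labels come from the values of cyl
b$names

# Row 3 of the stats matrix holds each group's median
b$stats[3, ]

# Outlying values and the group (column) each one belongs to
b$out
b$group
```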
boxplot(
mpg ~ cyl,
data = mtcars,
xlab = "Miles per Gallon", # with horizontal = TRUE, mpg runs along the x-axis
ylab = "Number of Cylinders",
main = "MPG by Cylinder Count",
col = c("orange", "pink", "blue"), # universal
outcol = "red", # boxplot specific
horizontal = TRUE # boxplot specific; CHECK YOUR AXES: xlab and ylab are not swapped for you
)
# Low-level functions add to the existing plot
legend("topright", legend = c("4 cylinders", "6 cylinders", "8 cylinders"), fill = c("orange", "pink", "blue"))
grid()
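Legend labels typed by hand can contain typos or list the groups in the wrong order. One option is to build the labels from the data so they always match the order `boxplot()` uses, which is the sorted values of cyl. This is a sketch: `cyl_levels`, `box_cols`, and `legend_labels` are hypothetical names.

```r
# Groups in the order boxplot() draws them (sorted unique values)
cyl_levels <- sort(unique(mtcars$cyl))
box_cols <- c("orange", "pink", "blue")
legend_labels <- paste(cyl_levels, "cylinders")

boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Miles per Gallon",
  ylab = "Number of Cylinders",
  main = "MPG by Cylinder Count",
  col = box_cols,
  horizontal = TRUE
)
# Colors and labels line up because both follow cyl_levels order
legend("topright", legend = legend_labels, fill = box_cols)
```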
Create a boxplot comparing horsepower (hp) across cylinder groups (cyl).
Customize:
- Title
- Axis labels
- Color
boxplot(
hp ~ cyl,
data = mtcars,
xlab = "Horsepower",
ylab = "Number of Cylinders",
main = "Horsepower by Cylinder Count",
col = c("purple", "green", "hotpink"),
outcol = "blue",
horizontal = TRUE
)
legend("bottomright", legend = c("4 cylinders", "6 cylinders", "8 cylinders"), fill = c("purple", "green", "hotpink"))
grid()
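You can back up the horsepower-by-cylinder boxplot with numbers. `aggregate()` accepts the same formula style and returns one summary row per group. The minimal sketch below computes the median horsepower for each cylinder count. `hp_medians` is an illustrative name.

```r
# One row per cylinder group, with that group's median horsepower
hp_medians <- aggregate(hp ~ cyl, data = mtcars, FUN = median)
hp_medians
```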
What is a Barplot?
Barplots are used for categorical data or summarized counts.
They show:
# First, count how many cars have each cylinder number.
cyl_counts <- table(mtcars$cyl)
cyl_counts
##
## 4 6 8
## 11 7 14
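`prop.table()` turns a table of counts into proportions. This is handy when groups are easier to compare as shares of the whole. You pass the result to `barplot()` the same way you pass counts. `cyl_props` is an illustrative name.

```r
cyl_counts <- table(mtcars$cyl)

# Each count divided by the total number of cars (32)
cyl_props <- prop.table(cyl_counts)
round(cyl_props, 3)

barplot(cyl_props, xlab = "Cylinders", ylab = "Proportion of Cars")
```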
barplot(cyl_counts)
barplot(
cyl_counts,
col = "purple",
main = "Number of Cars by Cylinder Count",
xlab = "Cylinders",
ylab = "Number of Cars"
)
barplot(
cyl_counts,
col = "orange",
main = "Number of Cars by Cylinder Count",
xlab = "Cylinders",
ylab = "Number of Cars",
border = "blue",
lwd = 2,
cex.main = 2, # universal
cex.lab = 1.5, # universal
las = 2, # label orientation; a general par() setting, not barplot specific
space = 0.5 # barplot specific
)
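Bars are drawn in the order of the table, which for numeric categories is ascending. Sorting the table first puts the most common category on the left, which can make comparisons easier to read. This is a sketch; `cyl_sorted` is an illustrative name.

```r
cyl_counts <- table(mtcars$cyl)

# Reorder the bars from most to least common; sort() keeps the category names
cyl_sorted <- sort(cyl_counts, decreasing = TRUE)

barplot(
  cyl_sorted,
  col = "orange",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars"
)
```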
Barplot Practice
Create a barplot showing how many cars fall into each gear category (gear).
Steps:
1. Use table()
2. Use barplot()
3. Add labels and color
gear_counts <- table(mtcars$gear)
barplot(
gear_counts,
col = "blue",
main = "Number of Cars by Gear",
xlab = "Number of Gears",
ylab = "Count of Cars",
border = "pink",
lwd = 2,
cex.main = 2,
cex.lab = 1.5,
las = 2,
space = 0.5
)
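`barplot()` invisibly returns the x-positions of the bar midpoints. You can pass these positions to the low-level `text()` function to write each count above its bar. In this sketch, `ylim` is widened (an illustrative choice) so the label above the tallest bar is not clipped. `mids` is a hypothetical name.

```r
gear_counts <- table(mtcars$gear)

# Save the bar midpoints that barplot() returns
mids <- barplot(
  gear_counts,
  col = "blue",
  main = "Number of Cars by Gear",
  xlab = "Number of Gears",
  ylab = "Count of Cars",
  ylim = c(0, 17)
)

# pos = 3 places each label just above its (x, y) point
text(mids, as.vector(gear_counts), labels = as.vector(gear_counts), pos = 3)
```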
Summary
You now know how to create:
Together with scatterplots and line plots, these give you a powerful toolkit for visualizing data in R.
In this assignment, you will create and customize three types of plots using base R:
You will use the built-in mtcars dataset unless otherwise specified.
All plots must include:
A histogram displays the distribution of a single numeric variable.
Create a histogram of one numeric variable from mtcars (for example: mpg, hp, or wt).
Your histogram must include:
breaks
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
hist(mtcars$hp)
hist(
mtcars$hp,
col = "hotpink",
main = "Distribution of Horsepower",
xlab = "Horsepower",
breaks = 8
)
Questions:
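Is the distribution symmetric, skewed, or approximately normal? It is right-skewed. Most cars have between about 50 and 250 hp, and a few high-horsepower cars form a long tail on the right.
Are there any noticeable gaps or clusters? There are no noticeable gaps. Most cars cluster between about 50 and 150 hp.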
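A rough numeric check on the shape, not a formal test: in a right-skewed distribution, the long right tail pulls the mean above the median. `hp_mean` and `hp_median` are illustrative names.

```r
hp_mean <- mean(mtcars$hp)
hp_median <- median(mtcars$hp)
c(mean = hp_mean, median = hp_median)

# TRUE is consistent with a longer right tail
hp_mean > hp_median
```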
A boxplot summarizes a distribution using:
Create a boxplot comparing a numeric variable across groups.
Example: Compare mpg by number of cylinders (cyl).
Your boxplot must include:
(y ~ x)
boxplot(mtcars$hp)
boxplot(
hp ~ am,
data = mtcars,
names = c("Automatic (0)", "Manual (1)"),
xlab = "Transmission Type",
ylab = "Horsepower",
main = "Horsepower by Transmission Type",
col = "green",
outcol = "blue"
)
Questions
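Which group has the highest median? Group 0 (automatic) has the higher median: about 175 hp, compared with 109 hp for group 1 (manual).
Do any groups appear to have greater variability? Group 0 has the taller box, so the middle half of its horsepower values is more spread out. Group 1 has a narrower box, but its two high outliers give it the wider overall range.
Are there visible outliers? Yes. Two manual cars in group 1 sit beyond the upper whisker: the Ford Pantera L (264 hp) and the Maserati Bora (335 hp).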
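You can read the answers above directly from the boxplot's summary by calling `boxplot()` with `plot = FALSE`. Matching the outlier values back to the row names identifies the cars. `b_am` and `outlier_cars` are hypothetical names.

```r
b_am <- boxplot(hp ~ am, data = mtcars, plot = FALSE)

# Medians for am = 0 (automatic) and am = 1 (manual)
b_am$stats[3, ]

# Outlying horsepower values and their group (2 = the am = 1 column)
b_am$out
b_am$group

# Which manual cars those outliers are
outlier_cars <- rownames(mtcars)[mtcars$am == 1 & mtcars$hp %in% b_am$out]
outlier_cars
```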
A barplot displays counts or summarized categorical data.
Use table() to count frequencies of a categorical variable (gear or cyl).
# Step 1: Create a table
# Example:
# cyl_counts <- table(mtcars$cyl)
am_counts <- table(mtcars$am)
# Step 2: Create the barplot
# Example:
barplot(am_counts,
col = "pink",
main = "Number of Cars by Transmission Type",
xlab = "Transmission Type (0 = automatic, 1 = manual)",
ylab = "Number of Cars")
# barplot(cyl_counts,
#   col = "purple",
#   main = "Number of Cars by Cylinder Type",
#   xlab = "Cylinders",
#   ylab = "Number of Cars")
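Because am is stored as 0/1, `table(mtcars$am)` names its categories "0" and "1". Converting am to a factor with readable labels (0 = automatic, 1 = manual, as documented for mtcars) fixes the labels at the source. This is a sketch; `am_factor` and `am_labeled_counts` are illustrative names.

```r
# Map the 0/1 codes to readable category names
am_factor <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
am_labeled_counts <- table(am_factor)
am_labeled_counts

barplot(
  am_labeled_counts,
  col = "pink",
  main = "Number of Cars by Transmission Type",
  xlab = "Transmission Type",
  ylab = "Number of Cars"
)
```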
Questions
Which category has the highest count? Group 0 (automatic transmission) has the highest count, with 19 of the 32 cars.
What does this tell you about the dataset? Automatic transmissions are more common than manual ones in mtcars (19 automatic vs. 13 manual), so automatic was the more popular transmission type among these cars.