These plot types help us understand distributions and group comparisons, rather than relationships over time or between variables.
A histogram shows the distribution of a single numeric variable.
It helps answer questions like:
We will look at miles per gallon (mpg) from mtcars.
hist(mtcars$mpg)
Common options:
hist(
mtcars$mpg,
col = "lightblue",
main = "Distribution of Miles per Gallon",
xlab = "Miles per Gallon",
breaks = 10, # histogram specific: suggested number of bins
probability = TRUE # histogram specific: plot density instead of counts
)
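To see what hist() computed, you can call it with plot = FALSE. It then returns the bin edges, counts, and densities instead of drawing anything. This is a minimal sketch; h is just an illustrative name. Note that breaks = 10 is treated as a suggestion, so R may pick a slightly different number of "pretty" bins:

```r
# plot = FALSE returns the histogram object without drawing it
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)

h$breaks   # bin edges R actually chose
h$counts   # number of cars in each bin
h$density  # the heights that probability = TRUE would plot

# Every car falls into exactly one bin, so the counts add up to 32
sum(h$counts)

# Density times bin width adds up to 1, which is why the y-axis scale changes
sum(h$density * diff(h$breaks))
```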
Histogram Practice
Create a histogram of horsepower (hp).
Customize:
mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
# I made it look wonky on purpose, don't hurt me :D
# Adding this many bins is a bad idea; it's hard to estimate what the range of each bin is.
hist(
mtcars$hp,
col="red",
main="Car Horsepower",
xlab="horsepower",
breaks=69,
probability=TRUE
)
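For comparison, R's default bin suggestion for hist() is Sturges' rule, ceiling(log2(n) + 1), which nclass.Sturges() computes. A small sketch, assuming the same hp column (h69 is an illustrative name), contrasts that default with breaks = 69 for only 32 cars:

```r
# Sturges' rule: ceiling(log2(32) + 1) = 6 bins for 32 cars
nclass.Sturges(mtcars$hp)

# With breaks = 69, there are far more bins than cars, so many bins are empty
h69 <- hist(mtcars$hp, breaks = 69, plot = FALSE)
length(h69$counts)      # number of bins R actually used
sum(h69$counts == 0)    # how many of them contain no cars
```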
What is a Boxplot?
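A boxplot summarizes data using:
1. Median (the thick line inside the box)
2. Quartiles (the edges of the box)
3. Whiskers (the range of the data, excluding outliers)
4. Outliers (points more than 1.5 times the box length beyond the box)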
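The numbers behind each part of the box can be inspected with boxplot.stats(), which is the helper boxplot() uses for its default calculations. A minimal sketch for mpg (s is an illustrative name):

```r
s <- boxplot.stats(mtcars$mpg)

# Five values drawn by boxplot(): lower whisker, lower hinge (about Q1),
# median, upper hinge (about Q3), upper whisker
s$stats

# Values more than 1.5 * (upper hinge - lower hinge) beyond the box;
# boxplot() draws these as individual points (none for mpg)
s$out
```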
Boxplots are useful for:
boxplot(mtcars$mpg)
boxplot(
mtcars$mpg,
main = "Boxplot of Miles per Gallon",
ylab = "Miles per Gallon",
col = "lightgreen"
)
boxplot(
mpg ~ cyl,
data = mtcars,
xlab = "Number of Cylinders",
ylab = "Miles per Gallon",
main = "MPG by Cylinder Count",
col = "orange"
)
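The thick line in each box is that group's median, so the comparison in the plot can be checked numerically with tapply(), which applies a function to each group. A short sketch (med_mpg is an illustrative name):

```r
# Median mpg for each cylinder group
med_mpg <- tapply(mtcars$mpg, mtcars$cyl, median)
med_mpg  # 4: 26.0, 6: 19.7, 8: 15.2
```

The medians drop as the cylinder count rises, which matches the boxes stepping down from left to right.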
boxplot(
mpg ~ cyl,
data = mtcars,
xlab = "Miles per Gallon",
ylab = "Number of Cylinders",
main = "MPG by Cylinder Count",
col = c("orange", "pink", "blue"), # universal
outcol = "red", # boxplot specific
horizontal = TRUE # boxplot specific; CHECK YOUR AXES: xlab still labels the bottom axis, so it must describe the values
)
# Low-level functions add elements to the plot that is already drawn
legend("topright", legend = c("4 cylinder", "6 cylinder", "8 cylinder"), fill = c("orange", "pink", "blue"))
grid()
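boxplot() draws the groups in the order of the factor levels of cyl (4, 6, 8), so one way to keep legend labels and colors matched to the boxes is to build them from those levels instead of typing them by hand. A sketch under that assumption; cyl_levels and box_cols are illustrative names:

```r
# Levels of cyl in the order boxplot() draws the groups
cyl_levels <- levels(factor(mtcars$cyl))   # "4" "6" "8"
box_cols   <- c("orange", "pink", "blue")  # one color per level

boxplot(mpg ~ cyl, data = mtcars, col = box_cols)

# Labels generated from the data, so they cannot drift out of sync
legend("topright",
       legend = paste(cyl_levels, "cylinder"),
       fill = box_cols)
```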
Create a boxplot comparing horsepower (hp) across cylinder groups (cyl).
Customize:
- Title
- Axis labels
- Color
boxplot(
hp ~ cyl,
data = mtcars,
xlab="Horsepower",
ylab="Number of cylinders",
main="Horsepower across cylinders",
col=c("red","darkblue","gray60"),
outcol="magenta",
horizontal=TRUE
)
legend("topright", legend=c("4 cylinder", "6 cylinder", "8 cylinder"), fill=c("red","darkblue","gray60"))
grid()
What is a Barplot?
Barplots are used for categorical data or summarized counts.
They show:
# First, count how many cars have each cylinder number.
cyl_counts <- table(mtcars$cyl)
cyl_counts
##
##  4  6  8
## 11  7 14
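table() simply counts how often each value occurs. A quick sketch that reproduces the same counts by hand (by_hand is an illustrative name):

```r
cyl_counts <- table(mtcars$cyl)

# The same counts by hand: how many cars have 4, 6, and 8 cylinders
by_hand <- c(sum(mtcars$cyl == 4), sum(mtcars$cyl == 6), sum(mtcars$cyl == 8))
by_hand  # 11 7 14

# Every car lands in exactly one group, so the counts add up to nrow(mtcars)
sum(cyl_counts) == nrow(mtcars)
```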
barplot(cyl_counts)
barplot(
cyl_counts,
col = "purple",
main = "Number of Cars by Cylinder Count",
xlab = "Cylinders",
ylab = "Number of Cars"
)
barplot(
cyl_counts,
col = "orange",
main = "Number of Cars by Cylinder Count",
xlab = "Cylinders",
ylab = "Number of Cars",
border = "blue",
lwd = 2,
cex.main = 2, # universal
cex.lab = 1.5, # universal
las = 2, # universal (axis labels perpendicular to the axis)
space = 0.5 # barplot specific
)
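barplot() also returns the x-coordinate of each bar's midpoint, and a low-level function such as text() can use those positions to write the count above each bar. A minimal sketch; bp is an illustrative name, and the extra headroom in ylim is an assumption made so the labels fit:

```r
cyl_counts <- table(mtcars$cyl)

# barplot() invisibly returns the midpoint of each bar
bp <- barplot(cyl_counts,
              col = "orange",
              ylim = c(0, max(cyl_counts) + 2),  # headroom for the labels
              main = "Number of Cars by Cylinder Count",
              xlab = "Cylinders",
              ylab = "Number of Cars")

# Write each count just above its bar (pos = 3 places text above the point)
text(x = bp, y = as.vector(cyl_counts), labels = as.vector(cyl_counts), pos = 3)
```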
Barplot Practice
Create a barplot showing how many cars fall into each gear category (gear).
Steps:
1. Use table()
2. Use barplot()
3. Add labels and color
gearCounts <- table(mtcars$gear)
barplot(
gearCounts,
col="cyan",
main="Car Gear Count Frequency",
xlab="Gears",
ylab="Number of Cars",
border="magenta",
lwd=6,
cex.main=0.6,
cex.lab=1.5,
las=1,
space=3.3
)
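The counts behind this plot can be printed directly, and the names.arg argument of barplot() replaces the bare category names under each bar with more descriptive labels. A sketch; gear_labels is an illustrative name:

```r
gearCounts <- table(mtcars$gear)
gearCounts  # 3 gears: 15 cars, 4 gears: 12 cars, 5 gears: 5 cars

# names.arg sets the label drawn under each bar
gear_labels <- paste(names(gearCounts), "gears")
barplot(gearCounts,
        names.arg = gear_labels,
        col = "cyan",
        main = "Car Gear Count Frequency",
        ylab = "Number of Cars")
```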
Summary
You now know how to create:
Together with scatterplots and line plots, these give you a powerful toolkit for visualizing data in R.
In this assignment, you will create and customize three types of plots using base R:
You will use the built-in mtcars dataset unless otherwise specified.
All plots must include:
A histogram displays the distribution of a single numeric variable.
Create a histogram of one numeric variable from mtcars (for example: mpg, hp, or wt).
Your histogram must include:
breaks
# I'm going to make a histogram of weight.
hist(
mtcars$wt,
col="darkred",
main="Weight of Cars",
xlab="Car weight (1000 lbs)",
breaks=5,
probability=FALSE
)
grid()
# Is there a way to manually extend the y-axis? One of the bars goes slightly above the y-axis and it bothers me.
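One way to do this is the ylim argument of hist(). The y-axis line likely stops at the last labeled tick, so a bar taller than that tick pokes above it; choosing an upper limit that is itself a "pretty" tick value at or above the tallest bar makes the axis reach past it. A sketch, assuming the same breaks = 5 histogram (h and y_top are illustrative names):

```r
# Compute the histogram first, without drawing, to find the tallest bar
h <- hist(mtcars$wt, breaks = 5, plot = FALSE)

# pretty() returns round tick values covering 0 to the tallest count;
# its largest value is at least as high as the tallest bar
y_top <- max(pretty(c(0, h$counts)))

hist(mtcars$wt,
     breaks = 5,
     col = "darkred",
     main = "Weight of Cars",
     xlab = "Car weight (1000 lbs)",
     ylim = c(0, y_top))
grid()
```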
Questions:
Is the distribution symmetric, skewed, or approximately normal? The distribution is definitely skewed.
Are there any noticeable gaps or clusters? There are a LOT of cars weighing 3,000-4,000 lbs compared to the other groups. In comparison, there's only one car in the 4,000-5,000 lb range.
A boxplot summarizes a distribution using:
Create a boxplot comparing a numeric variable across groups.
Example: Compare mpg by number of cylinders (cyl).
Your boxplot must include:
y ~ x)
# I'm going to compare weight by number of gears.
boxplot(
wt ~ gear ,
data=mtcars,
horizontal=TRUE,
main="Car Weights By Gear",
xlab="Weight (1000 lbs)",
ylab="Number of gears",
col= c("gold","gray50","#CD7F32"),
outcol="red"
)
grid()
Questions
Which group has the highest median? Three-gear cars.
Do any groups appear to have greater variability? Five-gear cars seem to have the highest variability.
Are there visible outliers? Yes: three-gear cars have four outliers.
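These answers can be checked numerically. A sketch using tapply() and boxplot.stats(); med_wt and out3 are illustrative names. Note that IQR() uses quantile(), so its values can differ slightly from the box edges, which boxplot() takes from fivenum():

```r
# Median weight (1000 lbs) for each gear group: the thick line in each box
med_wt <- tapply(mtcars$wt, mtcars$gear, median)
med_wt
names(which.max(med_wt))   # gear group with the highest median

# Interquartile range of each group as a rough measure of spread
tapply(mtcars$wt, mtcars$gear, IQR)

# Points that boxplot() draws individually for three-gear cars
out3 <- boxplot.stats(mtcars$wt[mtcars$gear == 3])$out
out3
```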
A barplot displays counts or summarized categorical data.
table() to count frequencies of a categorical variable (gear or cyl).
# Step 1: Create a table
# Example:
# cyl_counts <- table(mtcars$cyl)
# Step 2: Create the barplot
# Example:
# barplot(cyl_counts,
# col = "purple",
# main = "Number of Cars by Cylinder Count",
# xlab = "Cylinders",
# ylab = "Number of Cars")
carbCounts <- table(mtcars$carb)
barplot(carbCounts,
col="darkred",
main="Number of Cars by Carburetor Count",
xlab="Carburetors",
ylab="Number of Cars",
cex.main=1.5,
cex.lab=0.75,
lwd=1.5,
space=0.3
)
grid()
Questions
Which category has the highest count? Two-carburetor and four-carburetor cars are tied for the most common, with 10 cars each.
What does this tell you about the dataset? It's not evenly distributed: most cars have 1, 2, or 4 carburetors, while only a few have 3, 6, or 8.
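The tie and the uneven spread can be confirmed from the table itself. A sketch; prop.table() converts the counts into shares of all 32 cars:

```r
carbCounts <- table(mtcars$carb)
carbCounts

# Categories tied for the highest count
names(carbCounts)[carbCounts == max(carbCounts)]

# Share of cars in each carburetor category, rounded to two decimals
round(prop.table(carbCounts), 2)
```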