Boxplots, Histograms, and Barplots in Base R

These plot types help us understand distributions and group comparisons, rather than relationships over time or between variables.


Histograms

What is a Histogram?

A histogram shows the distribution of a single numeric variable.

It helps answer questions like:

  • How are values spread out?
  • Is the data skewed?
  • Are there multiple peaks?

Example Histogram

We will look at miles per gallon (mpg) from mtcars.

hist(mtcars$mpg)
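
The skew question above can also be checked numerically, as a rough complement to the plot. The sketch below compares the mean and median of mpg. A mean noticeably above the median is a common informal hint of right skew, not a formal test. The variable names are illustrative.

```r
# Informal skew check for mpg (a heuristic, not a formal test)
mpg_mean   <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)

# A positive gap (mean above median) hints at a longer right tail
skew_gap <- mpg_mean - mpg_median
skew_gap
```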

Common options:

  1. col – bar color
  2. main – title
  3. xlab – x-axis label
  4. breaks – suggested number of bins (R may adjust it to get round break points)
hist(
  mtcars$mpg,
  col = "lightblue",
  main = "Distribution of Miles per Gallon",
  xlab = "Miles per Gallon",
  breaks = 10,        # histogram-specific: suggested number of bins
  probability = TRUE  # histogram-specific: plot density instead of counts
)
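
Because a single number passed to breaks is only a suggestion, it can help to see which bins R actually used. A minimal sketch, assuming you want the bin edges and counts without drawing anything (plot = FALSE is a documented hist() argument; the name h is illustrative):

```r
# Compute the histogram without drawing it
h <- hist(mtcars$mpg, breaks = 10, plot = FALSE)

h$breaks          # bin edges R actually chose
length(h$counts)  # number of bins actually used (may differ from 10)
sum(h$counts)     # every car falls in exactly one bin, so this equals nrow(mtcars)
```

If you need exact bin edges, breaks also accepts a vector of break points instead of a single number.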

Histogram Practice

  1. Create a histogram of horsepower (hp).

  2. Customize:

  • Color
  • Title
  • X-axis label
hist(
  mtcars$hp,
  col = "purple",
  main = "Purple Plot",
  xlab = "Horsepower",
  breaks = 10,
  probability = FALSE  # show counts; spelled out because F can be reassigned
)

Boxplots

What is a Boxplot?

A boxplot summarizes data using:

  1. Median
  2. Quartiles
  3. Range
  4. Outliers

Boxplots are useful for:

  1. Comparing groups
  2. Identifying outliers
  3. Viewing spread quickly
boxplot(mtcars$mpg)
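
The numbers behind the picture can be inspected directly. This sketch uses boxplot.stats() from base R, which returns the five values used to draw the box and whiskers, plus any points flagged as outliers. The name bs is illustrative.

```r
# Numbers behind boxplot(mtcars$mpg)
bs <- boxplot.stats(mtcars$mpg)

bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points beyond the whiskers, drawn individually as outliers
```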

Boxplots with Labels

boxplot(
  mtcars$mpg,
  main = "Boxplot of Miles per Gallon",
  ylab = "Miles per Gallon",
  col = "lightgreen"
)

Comparing Groups with Boxplots

boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Number of Cylinders",
  ylab = "Miles per Gallon",
  main = "MPG by Cylinder Count",
  col = "orange"
)
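
To read the group comparison more precisely, the medians can be computed alongside the plot. A small sketch using tapply(), with the illustrative name mpg_medians:

```r
# Median mpg within each cylinder group (the thick line in each box)
mpg_medians <- tapply(mtcars$mpg, mtcars$cyl, median)
mpg_medians

# Number of cars behind each box, useful for judging how much to trust it
table(mtcars$cyl)
```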

# Add a rescaled column (horsepower divided by 10); the plots below do not use it
mtcars$base10_hp <- mtcars$hp / 10


boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Miles per Gallon",     # horizontal = TRUE puts mpg on the x-axis
  ylab = "Number of Cylinders",  # and the cylinder groups on the y-axis
  main = "MPG by Cylinder Count",
  col = c("orange", "pink", "blue"),  # universal
  outcol = "red",                     # boxplot-specific: outlier color
  horizontal = TRUE,                  # boxplot-specific: check your axis labels!
  names = c("Four", "Six", "Eight")   # one label per cyl group (4, 6, 8)
)
# Lower-level functions add to the existing plot
legend("topright", legend = c("4 cylinder", "6 cylinder", "8 cylinder"), fill = c("orange", "pink", "blue"))

grid()

Boxplot Practice

Create a boxplot comparing horsepower (hp) across cylinder groups (cyl).

Customize:

  • Title
  • Axis labels
  • Color

boxplot(
  hp ~ cyl,
  data = mtcars,
  xlab = "Number of Cylinders",
  ylab = "Horsepower",
  main = "Horsepower by Cylinder Count",
  col = "lightblue"
)

Barplots

What is a Barplot?

Barplots are used for categorical data or summarized counts.

They show:

  1. Frequencies
  2. Totals
  3. Group comparisons
# First, count how many cars have each cylinder number.

cyl_counts = table(mtcars$cyl)
cyl_counts
## 
##  4  6  8 
## 11  7 14
barplot(cyl_counts)
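
Exact counts are hard to read off the axis, so it can help to print them on the bars. The sketch below relies on the fact that barplot() returns the x positions of the bar centers, which text() can then use. The name bar_centers and the headroom of 2 are illustrative choices.

```r
cyl_counts <- table(mtcars$cyl)

# barplot() returns the x positions of the bar centers
bar_centers <- barplot(
  cyl_counts,
  ylim = c(0, max(cyl_counts) + 2)  # headroom so the labels are not clipped
)

# Write each count just above its bar
text(
  x = bar_centers,
  y = as.vector(cyl_counts),
  labels = as.vector(cyl_counts),
  pos = 3
)
```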

Customizing a Barplot

barplot(
  cyl_counts,
  col = "purple",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars"
)

barplot(
  cyl_counts,
  col = "orange",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars",
  border = "blue",
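  lwd = 4,
  cex.main = 2,   # universal: title size
  cex.lab = 1.5,  # universal: axis label size
  las = 2,        # universal: axis tick labels perpendicular to the axis
  space = 3       # barplot-specific: gap before each bar, in bar widths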
  )
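
Barplots are not limited to counts. Any one-number-per-group summary works. As a sketch of the group comparison use case, this plots mean horsepower per cylinder group, computed with tapply(). The name hp_means and the styling choices are illustrative.

```r
# Summarize first: mean horsepower within each cylinder group
hp_means <- tapply(mtcars$hp, mtcars$cyl, mean)

# Then plot one bar per group
barplot(
  hp_means,
  col = "lightblue",
  main = "Mean Horsepower by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Mean Horsepower"
)
```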

Barplot Practice

Create a barplot showing how many cars fall into each gear category (gear).

Steps:

  1. Use table()
  2. Use barplot()
  3. Add labels and color

# Your code here
gear_counts = table(mtcars$gear)
gear_counts
## 
##  3  4  5 
## 15 12  5
barplot(
  gear_counts,
  col = "navy",
  main = "Number of Cars by Gear Category",
  xlab = "Gear Category",
  ylab = "Number of Cars"
)

Summary

You now know how to create:

  1. Histograms → distributions of numeric data
  2. Boxplots → summaries and group comparisons
  3. Barplots → categorical counts

Together with scatterplots and line plots, these give you a powerful toolkit for visualizing data in R.

Homework

In this assignment, you will create and customize three types of plots using base R:

  1. Histogram
  2. Boxplot
  3. Barplot

You will use the built-in mtcars dataset unless otherwise specified.

All plots must include a title, axis labels, and a custom color.

Part 1: Histogram (Distribution of a Numeric Variable)

A histogram displays the distribution of a single numeric variable.

Create a histogram of one numeric variable from mtcars (for example: mpg, hp, or wt).

Your histogram must include:

  • Custom color
  • Title
  • X-axis label
  • Custom number of bins using breaks
# Your histogram code here
hist(
  mtcars$hp,
  col = "maroon",
  main = "Distribution of Horsepower",
  xlab = "Horsepower",
  breaks = 10, 
  probability = TRUE 
)

Questions:

  1. Is the distribution symmetric, skewed, or approximately normal?

Skewed right

  2. Are there any noticeable gaps or clusters?

Most cars cluster in the lower horsepower range.


Part 2: Boxplot (Comparing Groups)

A boxplot summarizes a distribution using:

  • Median
  • Quartiles
  • Potential outliers

Create a boxplot comparing a numeric variable across groups.

Example: Compare mpg by number of cylinders (cyl).

Your boxplot must include:

  • Group comparison formula format (y ~ x)
  • Title
  • Axis labels
  • Custom color
# Your boxplot code here
boxplot(
  hp ~ gear,
  data = mtcars,
  xlab = "Gear Category",
  ylab = "Horsepower",
  main = "Horsepower based on Gear Type",
  col = "darkgreen"
)

Questions

  1. Which group has the highest median?

The 3-gear group.

  2. Do any groups appear to have greater variability?

The 5-gear group.

  3. Are there visible outliers?

No.
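
These answers can be checked against the numbers behind the boxplot. A rough sketch, assuming the interquartile range as the measure of variability (the range or standard deviation would be equally reasonable choices). The variable names are illustrative.

```r
# Median horsepower per gear group
hp_medians <- tapply(mtcars$hp, mtcars$gear, median)
names(which.max(hp_medians))

# Spread per gear group, measured by the interquartile range
hp_iqrs <- tapply(mtcars$hp, mtcars$gear, IQR)
names(which.max(hp_iqrs))

# Points boxplot() would draw as outliers (empty if there are none)
bp <- boxplot(hp ~ gear, data = mtcars, plot = FALSE)
bp$out
```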


Part 3: Barplot (Categorical Counts)

A barplot displays counts or summarized categorical data.

  1. Use table() to count frequencies of a categorical variable (gear or cyl).
  2. Create a barplot of those counts.
  3. Customize the plot with:
  • Color
  • Title
  • Axis labels
# Step 1: Create a table

# Example:
# cyl_counts <- table(mtcars$cyl)
am_counts <- table(mtcars$am)  # am: 0 = automatic, 1 = manual
# Step 2: Create the barplot
barplot(
  am_counts,
  col = "maroon",
  main = "Number of Cars by Transmission",
  xlab = "Transmission",
  ylab = "Number of Cars"
)

# Example:
# barplot(cyl_counts,
#         col = "purple",
#         main = "Number of Cars by Cylinder Count",
#         xlab = "Cylinders",
#         ylab = "Number of Cars")

Questions

  1. Which category has the highest count?

Automatic Transmission

  2. What does this tell you about the dataset?

There are more automatic cars than manual cars in the dataset.
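
Because am is coded as 0 and 1, the bars in the plot above are labeled with numbers rather than names. One possible refinement, sketched here, is to recode the variable as a factor before counting. In mtcars, 0 means automatic and 1 means manual. The name transmission is illustrative.

```r
# Recode am so the bars are labeled by name (in mtcars, 0 = automatic, 1 = manual)
transmission <- factor(
  mtcars$am,
  levels = c(0, 1),
  labels = c("Automatic", "Manual")
)
am_counts <- table(transmission)

barplot(
  am_counts,
  col = "maroon",
  main = "Number of Cars by Transmission",
  xlab = "Transmission",
  ylab = "Number of Cars"
)
```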