Boxplots, Histograms, and Barplots in Base R

These plot types help us understand distributions and group comparisons, rather than relationships over time or between variables.


Histograms

What is a Histogram?

A histogram shows the distribution of a single numeric variable.

It helps answer questions like:

  • How are values spread out?
  • Is the data skewed?
  • Are there multiple peaks?
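Under the hood, a histogram is a count of values per bin. Calling `hist()` with `plot = FALSE` returns those bins and counts without drawing anything, which makes the idea concrete. The code below is a small illustrative sketch:

```r
# Compute the histogram for mpg without drawing it
h <- hist(mtcars$mpg, plot = FALSE)

h$breaks   # bin edges chosen by R (Sturges' rule by default)
h$counts   # number of cars falling in each bin

# Every car lands in exactly one bin
sum(h$counts) == nrow(mtcars)
```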

Example Histogram

We will look at miles per gallon (mpg) from mtcars.

hist(mtcars$mpg)

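Common options:

  1. col – bar color
  2. main – title
  3. xlab – x-axis label
  4. breaks – approximate number of bins (R may adjust it to get "pretty" bin edges)
  5. probability – if TRUE, plot densities instead of counts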
hist(
  mtcars$mpg,
  col = "lightblue",
  main = "Distribution of Miles per Gallon",
  xlab = "Miles per Gallon",
  breaks = 10, #histogram specific 
  probability = TRUE #histogram specific (density instead of count)
)
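Because R treats a single number in `breaks` as approximate, you can pass a vector of bin edges when you need exact bins. With `probability = TRUE`, bar heights are densities, so the bar areas (height × width) add up to 1. The sketch below checks both points. The edges of width 2.5 are an arbitrary choice for illustration:

```r
# Explicit bin edges from 10 to 35 in steps of 2.5 (chosen for illustration)
edges <- seq(10, 35, by = 2.5)

h <- hist(mtcars$mpg, breaks = edges, probability = TRUE,
          col = "lightblue",
          main = "Distribution of Miles per Gallon",
          xlab = "Miles per Gallon")

length(h$counts)                   # exactly length(edges) - 1 bins
sum(h$density * diff(h$breaks))    # total bar area: 1 for a density histogram
```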

Histogram Practice

  1. Create a histogram of horsepower (hp).

  2. Customize:

  • Color
  • Title
  • X-axis label
hist(mtcars$hp,
     col = "purple",
     main = "Distribution of Horsepower",
     xlab = "Horsepower",
     breaks = 10,
     probability = FALSE  # counts on the y-axis (the default)
     )

Boxplots

What is a Boxplot?

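A boxplot summarizes data using:

  1. Median
  2. Quartiles (the box)
  3. Whiskers (the spread of values within 1.5 × the box length of the box)
  4. Outliers (points beyond the whiskers)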

Boxplots are useful for:

  1. Comparing groups
  2. Identifying outliers
  3. Viewing spread quickly
boxplot(mtcars$mpg)
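To see the numbers behind the picture, use `boxplot.stats()`. It returns the five values used to draw the box and whiskers, plus any points flagged as outliers. The box edges are Tukey's hinges, so they can differ slightly from the quartiles reported by `quantile()`. Here is a short sketch:

```r
s <- boxplot.stats(mtcars$mpg)

s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out     # values drawn as individual outlier points (may be empty)

# The middle value is the ordinary median
all.equal(s$stats[3], median(mtcars$mpg))
```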

Boxplots with Labels

boxplot(
  mtcars$mpg,
  main = "Boxplot of Miles per Gallon",
  ylab = "Miles per Gallon",
  col = "lightgreen"
)

Comparing Groups with Boxplots

boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Number of Cylinders",
  ylab = "Miles per Gallon",
  main = "MPG by Cylinder Count",
  col = "orange"
)
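The thick line in each box is that group's median. You can compute the same medians directly with `tapply()`. `boxplot(..., plot = FALSE)` returns the statistics it would draw, with one column per group. A sketch:

```r
# Median mpg for each cylinder group
tapply(mtcars$mpg, mtcars$cyl, median)

# Statistics boxplot() would draw, without drawing them
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
b$names       # group labels, in plotting order
b$stats[3, ]  # row 3 holds the medians
```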

# Aside: new columns can be built from existing ones (hp in units of 10; not used below)
mtcars$base10_hp <- mtcars$hp / 10


boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Miles per Gallon",     # horizontal = TRUE moves mpg to the x-axis,
  ylab = "Number of Cylinders",  # so the axis labels swap. CHECK YOUR AXES!
  main = "MPG by Cylinder Count",
  col = c("orange", "pink", "blue"),  # universal
  outcol = "red",                     # boxplot specific: color of outlier points
  horizontal = TRUE,                  # boxplot specific: draw the boxes sideways
  names = c("Four", "Six", "Eight")   # group labels, in level order 4, 6, 8
)
# Lower-level functions add to the existing plot
legend("topright",
       legend = c("4 cylinder", "6 cylinder", "8 cylinder"),
       fill = c("orange", "pink", "blue"))

grid()
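The `col` and `names` vectors are matched to the boxes in the order of the grouping variable's factor levels. Check that order before you write labels. The sketch below does this. The label strings are illustrative:

```r
# Groups are drawn in the order of the factor levels of cyl
levels(factor(mtcars$cyl))

# Labels written in that same order line up with the right boxes
group_labels <- c("Four", "Six", "Eight")  # illustrative labels
setNames(group_labels, levels(factor(mtcars$cyl)))
```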

Boxplot Practice

Create a boxplot comparing horsepower (hp) across cylinder groups (cyl).

Customize:

  • Title
  • Axis labels
  • Color

boxplot(hp ~ cyl, 
        data = mtcars,
        main = "Horsepower by Cylinder Group",
        xlab = "Number of Cylinders",
        ylab = "Horsepower",
        col = "lightgreen")

Barplots

What is a Barplot?

Barplots are used for categorical data or summarized counts.

They show:

  1. Frequencies
  2. Totals
  3. Group comparisons
# First, count how many cars have each number of cylinders.

cyl_counts = table(mtcars$cyl)
cyl_counts
## 
##  4  6  8 
## 11  7 14
barplot(cyl_counts)
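Barplots are not limited to counts. To show totals or compare groups, pass any named vector of summary values to `barplot()`. The sketch below plots mean mpg per cylinder group. That summary and the color are illustrative choices:

```r
# Mean mpg for each cylinder group: a named numeric vector
mean_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)
mean_mpg

# Bar heights are the means rather than counts
barplot(mean_mpg,
        col = "steelblue",
        main = "Average MPG by Cylinder Count",
        xlab = "Cylinders",
        ylab = "Mean Miles per Gallon")
```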

Customizing a Barplot

barplot(
  cyl_counts,
  col = "purple",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars"
)

barplot(
  cyl_counts,
  col = "orange",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars",
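  border = "blue",  # bar outline color
  lwd = 4,          # line width (thick bar outlines)
  cex.main = 2,     # universal: title size
  cex.lab = 1.5,    # universal: axis-label size
  las = 2,          # universal: tick labels perpendicular to the axis
  space = 3         # barplot specific: gap before each bar, as a fraction of bar width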
  )
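`barplot()` also returns the x-positions of the bar midpoints invisibly. You can pass these to lower-level functions such as `text()`. The sketch below writes each count above its bar. The 20% headroom in `ylim` is an arbitrary choice that keeps the labels from being clipped:

```r
cyl_counts <- table(mtcars$cyl)

# Save the bar midpoints returned by barplot()
mids <- barplot(cyl_counts,
                col = "purple",
                main = "Number of Cars by Cylinder Count",
                xlab = "Cylinders",
                ylab = "Number of Cars",
                ylim = c(0, max(cyl_counts) * 1.2))  # room for the labels

# Write each count just above its bar (pos = 3 means "above")
text(mids, as.vector(cyl_counts),
     labels = as.vector(cyl_counts), pos = 3)
```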

Barplot Practice

Create a barplot showing how many cars fall into each gear category (gear).

Steps:

  1. Use table()
  2. Use barplot()
  3. Add labels and color

barplot(table(mtcars$gear),
        main = "Number of Cars by Gear Category",
        xlab = "Gear",
        ylab = "Number of Cars",
        col = "pink")

Summary

You now know how to create:

  1. Histograms → distributions of numeric data
  2. Boxplots → summaries and group comparisons
  3. Barplots → categorical counts

Together with scatterplots and line plots, these give you a powerful toolkit for visualizing data in R.

Homework

In this assignment, you will create and customize three types of plots using base R:

  1. Histogram
  2. Boxplot
  3. Barplot

You will use the built-in mtcars dataset unless otherwise specified.

All plots must include the elements listed under each part below.

Part 1: Histogram (Distribution of a Numeric Variable)

A histogram displays the distribution of a single numeric variable.

Create a histogram of one numeric variable from mtcars (for example: mpg, hp, or wt).

Your histogram must include:

  • Custom color
  • Title
  • X-axis label
  • Custom number of bins using breaks
hist(mtcars$mpg,
     main = "Distribution of Miles Per Gallon",
     xlab = "Miles Per Gallon (mpg)",
     col = "coral",
     breaks = 10)

Questions:

  1. Is the distribution symmetric, skewed, or approximately normal?

  2. Are there any noticeable gaps or clusters?
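Comparing the mean and the median gives a rough numeric check on the shape you see. This is a heuristic, not a formal test. In a right-skewed distribution the mean is usually pulled above the median, and in a left-skewed one it is usually pulled below. A sketch, assuming you plotted `mpg`:

```r
x <- mtcars$mpg   # swap in whichever variable you plotted

mean(x)
median(x)

# A positive gap hints at a longer right tail
mean(x) - median(x)
```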


Part 2: Boxplot (Comparing Groups)

A boxplot summarizes a distribution using:

  • Median
  • Quartiles
  • Potential outliers

Create a boxplot comparing a numeric variable across groups.

Example: Compare mpg by number of cylinders (cyl).

Your boxplot must include:

  • Group comparison formula format (y ~ x)
  • Title
  • Axis labels
  • Custom color
boxplot(mpg ~ cyl,
        data = mtcars,
        main = "Miles Per Gallon by Cylinder Group",
        xlab = "Number of Cylinders",
        ylab = "Miles Per Gallon (mpg)",
        col = "red")

Questions

  1. Which group has the highest median?

  2. Do any groups appear to have greater variability?

  3. Are there visible outliers?
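The variability and outlier questions can also be answered with numbers. The sketch below uses the example comparison of `mpg` by `cyl`. `IQR()` measures the spread of the middle half of each group. It uses `quantile()`, so it can differ slightly from the box length. `boxplot(..., plot = FALSE)` lists the points it would draw as outliers and the group each one belongs to:

```r
# Spread of the middle 50% of mpg within each cylinder group
tapply(mtcars$mpg, mtcars$cyl, IQR)

# Outlier values and the group each belongs to (b$group indexes b$names)
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
data.frame(value = b$out, group = b$names[b$group])
```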


Part 3: Barplot (Categorical Counts)

A barplot displays counts or summarized categorical data.

  1. Use table() to count frequencies of a categorical variable (gear or cyl).
  2. Create a barplot of those counts.
  3. Customize the plot with:
  • Color
  • Title
  • Axis labels
# Step 1: Create a table

# Example:
# cyl_counts <- table(mtcars$cyl)

# Step 2: Create the barplot

# Example:
# barplot(cyl_counts,
#         col = "purple",
#         main = "Number of Cars by Cylinder Count",
#         xlab = "Cylinders",
#         ylab = "Number of Cars")

Questions

  1. Which category has the highest count?

  2. What does this tell you about the dataset?
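For the first question, `which.max()` picks out the largest cell of a frequency table, and `sort()` ranks all of them. A small sketch using `cyl`:

```r
counts <- table(mtcars$cyl)

# Categories ranked from most to least common
sort(counts, decreasing = TRUE)

# Name of the most common category
names(which.max(counts))
```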