Boxplots, Histograms, and Barplots in Base R

These plot types help us understand distributions and group comparisons, rather than relationships over time or between variables.


Histograms

What is a Histogram?

A histogram shows the distribution of a single numeric variable.

It helps answer questions like:

  • How are values spread out?
  • Is the data skewed?
  • Are there multiple peaks?
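
A quick numeric check can back up what the picture suggests. The sketch below assumes mpg from the built-in mtcars dataset as the example variable. It compares the mean with the median (a mean well above the median hints at right skew) and uses hist() with plot = FALSE to look at the bin counts without drawing anything. The variable names are only illustrative.

```r
# Compare mean and median: a mean noticeably above the median hints at right skew
mpg_mean <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)
c(mean = mpg_mean, median = mpg_median)

# Compute the histogram without drawing it, then list each bin and its count
h <- hist(mtcars$mpg, plot = FALSE)
bin_table <- data.frame(
  from  = head(h$breaks, -1),  # left edge of each bin
  to    = tail(h$breaks, -1),  # right edge of each bin
  count = h$counts
)
bin_table
```

Several separated runs of high counts in the table would point to multiple peaks, and empty bins in the middle would point to gaps.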

Example Histogram

We will look at miles per gallon (mpg) from mtcars.

hist(mtcars$mpg)

Common options:

  1. col – bar color
  2. main – title
  3. xlab – x-axis label
  4. breaks – suggested number of bins (R may adjust it to get round break points)

hist(
  mtcars$mpg,
  col = "lightblue",
  main = "Distribution of Miles per Gallon",
  xlab = "Miles per Gallon",
  breaks = 10, # histogram specific: suggested number of bins
  probability = TRUE # histogram specific: plot density instead of count
)
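
One reason to switch to the density scale is that a smooth density curve can then share the y-axis with the bars. The following is a minimal sketch of that idea, not the only way to do it. It also stores the object that hist() returns so the number of bins R actually used can be checked, since a single number given to breaks is treated as a suggestion.

```r
# Draw the histogram on the density scale and keep the returned object
h <- hist(
  mtcars$mpg,
  breaks = 10,
  probability = TRUE,
  col = "lightblue",
  main = "Distribution of Miles per Gallon",
  xlab = "Miles per Gallon"
)

# Overlay a kernel density estimate (low-level call: adds to the existing plot)
lines(density(mtcars$mpg), lwd = 2)

# Number of bins R actually used (may differ from the suggested 10)
n_bins <- length(h$counts)
n_bins

# On the density scale, the bar areas (height * width) add up to 1
total_area <- sum(h$density * diff(h$breaks))
total_area
```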

Histogram Practice

  1. Create a histogram of horsepower (hp).
hist(
  mtcars$hp,
  col = "purple",
  main = "Distribution of Horsepower",
  xlab = "Horsepower",
  breaks = 10,
  probability = TRUE 
)

  2. Customize:
  • Color
  • Title
  • X-axis label
# Your code here

Boxplots

What is a Boxplot?

A boxplot summarizes data using:

  1. Median
  2. Quartiles
  3. Range
  4. Outliers

Boxplots are useful for:

  1. Comparing groups
  2. Identifying outliers
  3. Viewing spread quickly
boxplot(mtcars$mpg)
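
To see the numbers behind the picture, base R offers fivenum() and boxplot.stats(). The sketch below applies both to mpg. The variable names are only illustrative.

```r
# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
mpg_five <- fivenum(mtcars$mpg)
mpg_five

# boxplot.stats() returns the values that boxplot() draws:
#   $stats = lower whisker end, lower hinge, median, upper hinge, upper whisker end
#   $out   = points that would be drawn individually as outliers
mpg_stats <- boxplot.stats(mtcars$mpg)
mpg_stats$stats
mpg_stats$out
```

The whisker ends in $stats differ from the minimum and maximum only when some points are flagged as outliers.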

Boxplots with Labels

boxplot(
  mtcars$mpg,
  main = "Boxplot of Miles per Gallon",
  ylab = "Miles per Gallon",
  col = "lightgreen"
)

Comparing Groups with Boxplots

boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Number of Cylinders",
  ylab = "Miles per Gallon",
  main = "MPG by Cylinder Count",
  col = "orange"
)

boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Miles per Gallon",     # horizontal = TRUE puts mpg on the x-axis
  ylab = "Number of Cylinders",  # and the groups on the y-axis
  main = "MPG by Cylinder Count",
  col = c("orange", "pink", "blue"), # universal
  outcol = "red", # boxplot specific
  horizontal = TRUE # boxplot specific; check that your axis labels still match
)

# Low-level functions add to the existing plot
legend("topright", legend = c("4 cylinders", "6 cylinders", "8 cylinders"), fill = c("orange", "pink", "blue"))

grid()

Boxplot Practice

Create a boxplot comparing horsepower (hp) across cylinder groups (cyl).

Customize:

  • Title
  • Axis labels
  • Color

boxplot(
  hp ~ cyl,
  data = mtcars,
  xlab = "Horsepower",           # horizontal = TRUE puts hp on the x-axis
  ylab = "Number of Cylinders",
  main = "Horsepower by Cylinder Count",
  col = c("purple", "green", "hotpink"),
  outcol = "blue",
  horizontal = TRUE
)
# The 4-cylinder box sits at the bottom with low hp, so the bottom-right corner is free
legend("bottomright", legend = c("4 cylinders", "6 cylinders", "8 cylinders"), fill = c("purple", "green", "hotpink"))

grid()

Barplots

What is a Barplot?

Barplots are used for categorical data or summarized counts.

They show:

  1. Frequencies
  2. Totals
  3. Group comparisons
# First, count how many cars have each cylinder number.

cyl_counts <- table(mtcars$cyl)
cyl_counts
## 
##  4  6  8 
## 11  7 14
barplot(cyl_counts)
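
Barplots are not limited to counts. As a hedged illustration of the "group comparisons" use, the sketch below summarizes mpg by cylinder group with tapply() and plots the group means. The choice of the mean as the summary, the ylim value, and the variable names are assumptions made for this example.

```r
# Mean mpg within each cylinder group (named by the cyl values)
mean_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)
mean_mpg

# barplot() returns the x-positions of the bar midpoints
bar_mids <- barplot(
  mean_mpg,
  col = "lightblue",
  main = "Mean MPG by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Mean Miles per Gallon",
  ylim = c(0, 30)  # leave room above the tallest bar for its label
)

# Use those positions to write each mean just above its bar
text(bar_mids, mean_mpg, labels = round(mean_mpg, 1), pos = 3)
```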

Customizing a Barplot

barplot(
  cyl_counts,
  col = "purple",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars"
)

barplot(
  cyl_counts,
  col = "orange",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars",
  border = "blue",
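  lwd = 2,
  cex.main = 2, # universal: title size
  cex.lab = 1.5, # universal: axis-label size
  las = 2, # universal: axis tick labels perpendicular to the axis
  space = 0.5 # barplot specific: gap before each bar, as a fraction of bar width
)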

Barplot Practice

Create a barplot showing how many cars fall into each gear category (gear).

Steps:

  1. Use table()
  2. Use barplot()
  3. Add labels and color

gear_counts <- table(mtcars$gear) # count the cars in each gear category

barplot(
  gear_counts,
  col = "blue",
  main = "Number of Cars by Gear",
  xlab = "Number of Gears",
  ylab = "Count of Cars",
  border = "pink",
  lwd = 2,
  cex.main = 2,
  cex.lab = 1.5,
  las = 2,
  space = 0.5
)

Summary

You now know how to create:

  1. Histograms → distributions of numeric data
  2. Boxplots → summaries and group comparisons
  3. Barplots → categorical counts

Together with scatterplots and line plots, these give you a powerful toolkit for visualizing data in R.

Homework

In this assignment, you will create and customize three types of plots using base R:

  1. Histogram
  2. Boxplot
  3. Barplot

You will use the built-in mtcars dataset unless otherwise specified.

All plots must include the elements listed under each part below.

Part 1: Histogram (Distribution of a Numeric Variable)

A histogram displays the distribution of a single numeric variable.

Create a histogram of one numeric variable from mtcars (for example: mpg, hp, or wt).

Your histogram must include:

  • Custom color
  • Title
  • X-axis label
  • Custom number of bins using breaks
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
hist(mtcars$hp)

hist(
  mtcars$hp,
  col = "hotpink",
  main = "Distribution of Horsepower",
  xlab = "Horsepower",
  breaks = 8
)

Questions:

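  1. Is the distribution symmetric, skewed, or approximately normal? It is right-skewed: most cars fall between 50 and 200 hp, and a long tail extends toward the maximum of 335 hp.

  2. Are there any noticeable gaps or clusters? There are no noticeable gaps. Most values cluster between 50 and 150 hp.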

Part 2: Boxplot (Comparing Groups)

A boxplot summarizes a distribution using:

  • Median
  • Quartiles
  • Potential outliers

Create a boxplot comparing a numeric variable across groups.

Example: Compare mpg by number of cylinders (cyl).

Your boxplot must include:

  • Group comparison formula format (y ~ x)
  • Title
  • Axis labels
  • Custom color
boxplot(mtcars$hp)

boxplot(
  hp ~ am,
  data = mtcars,
  xlab = "Transmission Type (0 = automatic, 1 = manual)",
  ylab = "Horsepower",
  main = "Horsepower by Transmission Type",
  col = "green",
  outcol = "blue"
)

Questions

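  1. Which group has the highest median? Group 0 (automatic) has the highest median, about 175 hp versus about 109 hp for group 1 (manual).

  2. Do any groups appear to have greater variability? Group 0 has the taller box, so its middle 50% of values is more spread out. Group 1 has the wider overall range once its outliers are counted.

  3. Are there visible outliers? Yes, group 1 shows two high outliers.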
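
Answers read off a boxplot can be checked numerically. The following is one possible sketch, assuming the same hp ~ am grouping as the plot above. The variable names are only illustrative. Note that IQR() uses quantiles, while boxplot() draws hinges, so the two can differ slightly in other datasets.

```r
# Median horsepower in each transmission group (0 and 1 are the am codes)
hp_medians <- tapply(mtcars$hp, mtcars$am, median)
hp_medians

# Interquartile range per group: roughly the height of each box
hp_iqr <- tapply(mtcars$hp, mtcars$am, IQR)
hp_iqr

# Points that boxplot() would draw as outliers, listed per group
hp_outliers <- lapply(
  split(mtcars$hp, mtcars$am),
  function(x) boxplot.stats(x)$out
)
hp_outliers
```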


Part 3: Barplot (Categorical Counts)

A barplot displays counts or summarized categorical data.

  1. Use table() to count frequencies of a categorical variable (gear or cyl).
  2. Create a barplot of those counts.
  3. Customize the plot with:
  • Color
  • Title
  • Axis labels
# Step 1: Create a table
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Example:
# cyl_counts <- table(mtcars$cyl)
am_counts <- table(mtcars$am)
# Step 2: Create the barplot

# Example:
barplot(am_counts,
        col = "pink",
        main = "Number of Cars by Transmission Type",
        xlab = "Transmission Type (0 = automatic, 1 = manual)",
        ylab = "Number of Cars")

# Alternative using the cylinder counts:
# cyl_counts <- table(mtcars$cyl)
# barplot(cyl_counts,
#         col = "purple",
#         main = "Number of Cars by Cylinder Type",
#         xlab = "Cylinders",
#         ylab = "Number of Cars")

Questions

  1. Which category has the highest count? Group 0 (automatic transmission) has the highest count, with 19 cars versus 13 for group 1 (manual).

  2. What does this tell you about the dataset? Automatic transmissions are more common than manual ones in this dataset: 19 of the 32 cars are automatic.
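
Counts are often easier to interpret as shares of the whole. As a small sketch, prop.table() converts a table of counts into proportions, and which.max() picks out the largest category. The variable names are only illustrative.

```r
# Count cars per transmission code, then convert the counts to proportions
am_counts <- table(mtcars$am)
am_share <- prop.table(am_counts)
round(am_share, 2)

# Name of the category with the highest count
top_group <- names(am_counts)[which.max(am_counts)]
top_group
```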