Boxplots, Histograms, and Barplots in Base R

These plot types help us understand distributions and group comparisons, rather than relationships over time or between variables.


Histograms

What is a Histogram?

A histogram shows the distribution of a single numeric variable. You supply only x values; the y-axis (counts or densities) is computed from the data.

It helps answer questions like:

  • How are values spread out?
  • Is the data skewed?
  • Are there multiple peaks?
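
To see what a histogram computes, it can help to look at the bin counts directly. The sketch below assumes the built-in mtcars dataset and calls hist() with plot = FALSE, so it returns the bins instead of drawing them.

```r
# Compute the histogram without drawing it
h <- hist(mtcars$mpg, plot = FALSE)

h$breaks  # bin edges chosen by R
h$counts  # number of cars that fall in each bin

# Every car lands in exactly one bin, so the counts add up to the row count
sum(h$counts) == nrow(mtcars)
```

The heights of the bars in the plot are these counts, which is why no y variable is needed.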

Example Histogram

We will look at miles per gallon (mpg) from mtcars.

hist(mtcars$mpg)

Common options:

  1. col – bar color
  2. main – title
  3. xlab – x-axis label
  4. breaks – suggested number of bins (R treats a single number as a hint and may adjust it to get tidy bin edges)

hist(
  mtcars$mpg,
  col = "lightblue",
  main = "Distribution of Miles per Gallon",
  xlab = "Miles per Gallon",
  breaks = 10,        # histogram-specific: suggested number of bins
  probability = TRUE  # histogram-specific: plot density instead of counts
)
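
Because a single number passed to breaks is only a suggestion, one way to control the bins exactly is to pass a vector of bin edges. This is a minimal sketch; the edges 10 to 35 in steps of 5 are an illustrative choice that happens to cover the range of mpg.

```r
# Illustrative bin edges: they must span every value in the data
edges <- seq(10, 35, by = 5)

h <- hist(
  mtcars$mpg,
  breaks = edges,  # a vector is used as-is, not treated as a hint
  col = "lightblue",
  main = "Distribution of Miles per Gallon",
  xlab = "Miles per Gallon"
)

h$breaks  # the edges actually used
```

If any value falls outside the edges, hist() stops with an error, so check range(mtcars$mpg) first when choosing them.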

Histogram Practice

  1. Create a histogram of horsepower (hp).

  2. Customize:

  • Color
  • Title
  • X-axis label
hist(mtcars$hp,
     col = "lightpink",
     main = "Distribution of Horsepower",
     xlab = "Horsepower"
     )

hist(mtcars$hp,
     col = "lightpink",
     main = "Distribution of Horsepower",
     xlab = "Horsepower",
     probability = TRUE
     )
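
With probability = TRUE the y-axis shows density, which puts the histogram on the same scale as a smoothed density curve. The sketch below overlays one with density() and lines(); the line width is an arbitrary choice.

```r
h <- hist(
  mtcars$hp,
  col = "lightpink",
  main = "Distribution of Horsepower",
  xlab = "Horsepower",
  probability = TRUE
)

# Add a smoothed density estimate on top of the existing histogram
lines(density(mtcars$hp), lwd = 2)

# On the density scale the bar areas add up to 1 (up to rounding)
sum(h$density * diff(h$breaks))
```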

Boxplots

What is a Boxplot?

A boxplot summarizes data using:

  1. Median
  2. Quartiles
  3. Range
  4. Outliers

Boxplots are useful for:

  1. Comparing groups
  2. Identifying outliers
  3. Viewing spread quickly
boxplot(mtcars$mpg)
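
The numbers behind the picture can be inspected with boxplot.stats(), which returns the five values used to draw the box and whiskers plus any points flagged as outliers. A short sketch using mpg:

```r
s <- boxplot.stats(mtcars$mpg)

s$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
s$out    # values drawn as individual outlier points (may be empty)

# The middle value is the median shown as the thick line in the box
s$stats[3] == median(mtcars$mpg)
```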

Boxplots with Labels

boxplot(
  mtcars$mpg,
  main = "Boxplot of Miles per Gallon",
  ylab = "Miles per Gallon",
  col = "lightgreen"
)

Comparing Groups with Boxplots

boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Number of Cylinders",
  ylab = "Miles per Gallon",
  main = "MPG by Cylinder Count",
  col = "orange"
)

boxplot(
  mpg ~ cyl,
  data = mtcars,
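  xlab = "Miles per Gallon",     # horizontal = TRUE puts the numeric variable on the x-axis
  ylab = "Number of Cylinders",  # so the groups move to the y-axis
  main = "MPG by Cylinder Count",
  col = c("orange", "pink", "blue"), # universal: one color per group
  outcol = "red", # boxplot-specific: color of outlier points
  horizontal = TRUE # boxplot-specific: flips the plot, so CHECK YOUR AXES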
) 

# Lower-level functions add to the plot that is already drawn
legend("topright", legend = c("4 cylinders", "6 cylinders", "8 cylinders"), fill = c("orange", "pink", "blue"))

grid()
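
To check what the grouped boxplot shows, the group medians can be computed directly. This sketch uses tapply(), which applies a function to one variable within each level of another.

```r
# Median mpg within each cylinder group
meds <- tapply(mtcars$mpg, mtcars$cyl, median)
meds
```

Each value should line up with the thick line inside the corresponding box.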

Boxplot Practice

Create a boxplot comparing horsepower (hp) across cylinder groups (cyl).

Customize:

  • Title
  • Axis labels
  • Color

boxplot(
  hp ~ cyl,
  data = mtcars,
  xlab = "Number of Cylinders",
  ylab = "Horsepower",
  main = "HP by Cylinder Count",
  col = "orange"
)

boxplot(
  hp ~ cyl,
  data = mtcars,
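  xlab = "Horsepower",           # horizontal = TRUE puts the numeric variable on the x-axis
  ylab = "Number of Cylinders",  # so the groups move to the y-axis
  main = "HP by Cylinder Count",
  col = c("orange", "pink", "blue"), # universal: one color per group
  outcol = "red", # boxplot-specific: color of outlier points
  horizontal = TRUE # boxplot-specific: flips the plot, so CHECK YOUR AXES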
)

Barplots

What is a Barplot?

Barplots are used for categorical data or summarized counts.

They show:

  1. Frequencies
  2. Totals
  3. Group comparisons
# First, count how many cars have each cylinder number.

cyl_counts <- table(mtcars$cyl)
cyl_counts
## 
##  4  6  8 
## 11  7 14
barplot(cyl_counts)
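
Barplots are not limited to raw counts. As a sketch of a summarized version, prop.table() converts the counts to proportions before plotting; the axis label is an illustrative choice.

```r
cyl_counts <- table(mtcars$cyl)

# Convert counts to proportions of all cars
cyl_props <- prop.table(cyl_counts)
cyl_props

barplot(cyl_props, ylab = "Proportion of Cars")
```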

Customizing a Barplot

barplot(
  cyl_counts,
  col = "purple",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars"
)

barplot(
  cyl_counts,
  col = "orange",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars",
  border = "blue",
  lwd = 2,
  cex.main = 2,  # universal: title size
  cex.lab = 1.5, # universal: axis-label size
  las = 2,       # universal (see ?par): tick labels perpendicular to the axis
  space = 0.5    # barplot-specific: space between the bars
  )
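
barplot() also returns the x position of each bar's midpoint, which is useful for adding labels with a lower-level function. A minimal sketch that prints each count above its bar; the extra headroom in ylim is an arbitrary amount.

```r
cyl_counts <- table(mtcars$cyl)
counts <- as.vector(cyl_counts)  # plain numbers for positioning the labels

# Save the bar midpoints that barplot() returns
bp <- barplot(
  cyl_counts,
  col = "orange",
  main = "Number of Cars by Cylinder Count",
  ylim = c(0, max(counts) + 2)  # leave room above the tallest bar
)

# pos = 3 places each label just above the given point
text(x = bp, y = counts, labels = counts, pos = 3)
```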

Barplot Practice

Create a barplot showing how many cars fall into each gear category (gear).

Steps:

  1. Use table()
  2. Use barplot()
  3. Add labels and color

gear_counts <- table(mtcars$gear)

barplot(
  gear_counts,
  col = "orange",
  main = "Number of Cars by Gear Category",
  xlab = "Gears",
  ylab = "Number of Cars",
  border = "blue",
  lwd = 2,
  cex.main = 2,  # universal: title size
  cex.lab = 1.5, # universal: axis-label size
  las = 2,       # universal (see ?par): tick labels perpendicular to the axis
  space = 0.5    # barplot-specific: space between the bars
  )

Summary

You now know how to create:

  1. Histograms → distributions of numeric data
  2. Boxplots → summaries and group comparisons
  3. Barplots → categorical counts

Together with scatterplots and line plots, these give you a powerful toolkit for visualizing data in R.

Homework

In this assignment, you will create and customize three types of plots using base R:

  1. Histogram
  2. Boxplot
  3. Barplot

You will use the built-in mtcars dataset unless otherwise specified.

All plots must include:

Part 1: Histogram (Distribution of a Numeric Variable)

A histogram displays the distribution of a single numeric variable.

Create a histogram of one numeric variable from mtcars (for example: mpg, hp, or wt).

Your histogram must include:

  • Custom color
  • Title
  • X-axis label
  • Custom number of bins using breaks
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
hist(mtcars$hp)

hist(
  mtcars$mpg,
  col = "green",
  main = "Distribution of Miles per Gallon",
  xlab = "Miles per Gallon",
  breaks = 5
)

Questions:

1. Is the distribution symmetric, skewed, or approximately normal?

The distribution is right-skewed: most cars sit at the lower mpg values, and the tail stretches toward the higher values.

2. Are there any noticeable gaps or clusters?

There is a dip between 25 and 30 mpg: only two cars fall in that bin, fewer than in the bins on either side.
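
As a rough numeric check on answers like these, the mean can be compared with the median, and the cars can be counted per bin. This is only a sketch: the bin edges below are an assumption chosen to match 5-unit bins from 10 to 35, and a mean above the median is a hint of a longer right tail rather than a formal test.

```r
mpg_mean <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)

# TRUE suggests the high values pull the mean up (a longer right tail)
mpg_mean > mpg_median

# Count the cars in each 5-mpg bin (assumed edges, for illustration)
bin_counts <- table(cut(mtcars$mpg, breaks = seq(10, 35, by = 5)))
bin_counts
```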


Part 2: Boxplot (Comparing Groups)

A boxplot summarizes a distribution using:

  • Median
  • Quartiles
  • Potential outliers

Create a boxplot comparing a numeric variable across groups.

Example: Compare mpg by number of cylinders (cyl).

Your boxplot must include:

  • Group comparison formula format (y ~ x)
  • Title
  • Axis labels
  • Custom color
boxplot(mtcars$hp)

boxplot(
  hp ~ am,
  data = mtcars,
  xlab = "Transmission Type",
  ylab = "Amount of Horsepower",
  main = "Horsepower Compared to Transmission Type",
  col = "pink",
  outcol= "red"
)
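
The 0 and 1 on the x-axis are easy to misread. According to ?mtcars, am is coded 0 = automatic and 1 = manual. One way to make the plot self-explanatory is to turn am into a labelled factor first. In this sketch, cars and am_label are illustrative names, not part of the dataset.

```r
cars <- mtcars  # work on a copy so mtcars itself stays unchanged

# Replace the 0/1 codes with readable labels (coding taken from ?mtcars)
cars$am_label <- factor(cars$am, levels = c(0, 1),
                        labels = c("Automatic", "Manual"))

boxplot(
  hp ~ am_label,
  data = cars,
  xlab = "Transmission Type",
  ylab = "Horsepower",
  main = "Horsepower by Transmission Type",
  col = "pink"
)

# Median horsepower per group, to compare with the plot
tapply(cars$hp, cars$am_label, median)
```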

Questions

1. Which group has the highest median?

Transmission type 0, which is the automatic transmission (in mtcars, am = 0 is automatic and am = 1 is manual), has the highest median horsepower.

2. Do any groups appear to have greater variability?

The manual transmission group (am = 1) has the wider overall range, from about 50 to about 350 hp including its outliers, compared with about 60 to about 250 hp for the automatic group (am = 0).

3. Are there visible outliers?

Yes, there are two, both in transmission type 1 (manual).


Part 3: Barplot (Categorical Counts)

A barplot displays counts or summarized categorical data.

  1. Use table() to count frequencies of a categorical variable (gear or cyl).
  2. Create a barplot of those counts.
  3. Customize the plot with:
  • Color
  • Title
  • Axis labels
# Step 1: Create a table
am_counts<- table(mtcars$am)


# Step 2: Create the barplot

barplot(am_counts,
        col = "lightblue",
        main = "Number of Cars by Transmission Type",
        xlab = "Transmission Type",
        ylab = "Number of Cars")

# Example:
# barplot(cyl_counts,
#         col = "purple",
#         main = "Number of Cars by Cylinder Count",
#         xlab = "Cylinders",
#         ylab = "Number of Cars")
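
The bars can also be given readable names with the names.arg argument of barplot(). This sketch relies on table() ordering the groups as 0 then 1, and on the coding in ?mtcars (0 = automatic, 1 = manual).

```r
am_counts <- table(mtcars$am)
names(am_counts)  # confirm the order of the groups before relabelling

barplot(
  am_counts,
  names.arg = c("Automatic", "Manual"),  # same order as names(am_counts)
  col = "lightblue",
  main = "Number of Cars by Transmission Type",
  xlab = "Transmission Type",
  ylab = "Number of Cars"
)
```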

Questions

1. Which category has the highest count?

Transmission type 0 (automatic) has the highest count.

2. What does this tell you about the dataset?

This tells me that the dataset contains more cars with transmission type 0 (automatic) than with transmission type 1 (manual).