Boxplots, Histograms, and Barplots in Base R

These plot types help us understand distributions and group comparisons, rather than relationships over time or between variables.


Histograms

What is a Histogram?

A histogram shows the distribution of a single numeric variable.

It helps answer questions like:

  • How are values spread out?
  • Is the data skewed?
  • Are there multiple peaks?

Example Histogram

We will look at miles per gallon (mpg) from mtcars.

hist(mtcars$mpg)
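
The questions above can also be checked with numbers. The following is a minimal sketch rather than part of the original lesson: it assumes the built-in mtcars data and calls hist() with plot = FALSE, which returns the bin edges and counts without drawing anything. Comparing the mean with the median is only a rough hint about skew, not a formal test.

```r
# Compute the histogram without drawing it
h <- hist(mtcars$mpg, plot = FALSE)

h$breaks  # bin edges chosen by R
h$counts  # number of cars that fall in each bin

# Rough skew check: a mean above the median hints at a longer right tail
mean(mtcars$mpg)
median(mtcars$mpg)
```

Several separated high values in h$counts would suggest more than one peak.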

Common options:

  1. col – bar color
  2. main – title
  3. xlab – x-axis label
  4. breaks – suggested number of bins (R may adjust it so the bin edges fall on round values)
hist(
  mtcars$mpg,
  col = "lightblue",
  main = "Distribution of Miles per Gallon",
  xlab = "Miles per Gallon",
  breaks = 10, # histogram specific: suggested number of bins
  probability = TRUE # histogram specific: plot density instead of counts
)
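
A single number passed to breaks is only a suggestion, so one way to control the bins exactly is to pass the bin edges yourself. The sketch below assumes edges every 5 mpg from 10 to 35 (an illustrative choice that covers the range of mtcars$mpg) and overlays a smoothed density curve, which is on the same scale as the bars only because probability = TRUE.

```r
# Explicit bin edges: every bin is exactly 5 mpg wide (illustrative choice)
bin_edges <- seq(10, 35, by = 5)

h <- hist(
  mtcars$mpg,
  breaks = bin_edges,
  probability = TRUE, # bar heights are densities, so bar areas sum to 1
  col = "lightblue",
  main = "Distribution of Miles per Gallon",
  xlab = "Miles per Gallon"
)

# Low-level call: add a smoothed density estimate on top of the bars
lines(density(mtcars$mpg), lwd = 2)

# Total bar area (height times width); this should come out as 1
total_area <- sum(h$density * diff(h$breaks))
total_area
```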

Histogram Practice

  1. Create a histogram of horsepower (hp).

  2. Customize:

  • Color
  • Title
  • X-axis label
mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
# I made it look wonky on purpose, don't hurt me :D
# Using this many bins is a bad idea: it's hard to tell what range each bin covers.
hist(
  mtcars$hp,
  col="red",
  main="Car Horsepower",
  xlab="horsepower",
  breaks=69,
  probability=TRUE
  )
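
To see the effect of the bin count directly, the two versions can be drawn side by side. This is a sketch under the assumption that par(mfrow = ...) is acceptable for the layout; the object names few and many are made up for illustration.

```r
old_par <- par(mfrow = c(1, 2)) # two plots side by side

# Default binning
few <- hist(mtcars$hp, col = "red", main = "Default bins", xlab = "Horsepower")

# Same data with a very large suggested bin count
many <- hist(mtcars$hp, col = "red", breaks = 69, main = "breaks = 69", xlab = "Horsepower")

par(old_par) # restore the previous layout

# Number of bins R actually used in each version
length(few$counts)
length(many$counts)
```

With only 32 cars, most of the narrow bins hold zero or one observation, which is why the second plot looks spiky.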

Boxplots

What is a Boxplot?

A boxplot summarizes data using:

  1. Median
  2. Quartiles
  3. Range
  4. Outliers

Boxplots are useful for:

  1. Comparing groups
  2. Identifying outliers
  3. Viewing spread quickly
boxplot(mtcars$mpg)
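
The numbers behind the picture can be inspected with boxplot.stats(). This is a small sketch, not part of the original lesson. Note that R draws the box at the "hinges", which are close to, but not always identical to, the quartiles reported by quantile().

```r
stats <- boxplot.stats(mtcars$mpg)

# Five values: lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats

# Values beyond the whiskers, which the plot draws as individual points
stats$out
```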

Boxplots with Labels

boxplot(
  mtcars$mpg,
  main = "Boxplot of Miles per Gallon",
  ylab = "Miles per Gallon",
  col = "lightgreen"
)

Comparing Groups with Boxplots

boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Number of Cylinders",
  ylab = "Miles per Gallon",
  main = "MPG by Cylinder Count",
  col = "orange"
)
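
The thick line inside each box is the group median. As a quick cross-check (a sketch using tapply(), which is one of several ways to summarize by group), the same medians can be computed directly:

```r
# Median mpg within each cylinder group
group_medians <- tapply(mtcars$mpg, mtcars$cyl, median)
group_medians

# Number of cars behind each box
table(mtcars$cyl)
```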

boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Miles per Gallon", # with horizontal = TRUE, mpg is on the x-axis
  ylab = "Number of Cylinders",
  main = "MPG by Cylinder Count",
  col = c("orange", "pink", "blue"), # universal
  outcol = "red", # boxplot specific
  horizontal = TRUE # boxplot specific: the axes swap, so CHECK YOUR AXIS LABELS!
) 

# lower level functions
legend("topright", legend = c("4 cylinder", "6 cylinder", "8 cylinder"), fill = c("orange", "pink", "blue"))

grid()
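
Lower-level functions such as legend() and grid() do not start a new plot; they draw on top of the one that already exists. As another sketch of the same idea, abline() can add a reference line. The choice of the overall median as the reference is just an illustration.

```r
# High-level call: creates the plot (and returns the summary statistics)
bp <- boxplot(
  mpg ~ cyl,
  data = mtcars,
  xlab = "Number of Cylinders",
  ylab = "Miles per Gallon",
  main = "MPG by Cylinder Count",
  col = "orange"
)

# Low-level calls: add to the existing plot
overall_median <- median(mtcars$mpg)
abline(h = overall_median, lty = 2) # dashed horizontal reference line
legend("topright", legend = "Overall median", lty = 2)
```

For a horizontal boxplot the reference line would use abline(v = ...) instead, because the numeric variable moves to the x-axis.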

Boxplot Practice

Create a boxplot comparing horsepower (hp) across cylinder groups (cyl).

Customize:

  • Title
  • Axis labels
  • Color

boxplot(
  hp ~ cyl,
  data = mtcars,
  xlab="Horsepower",
  ylab="Cylinder numbers",
  main="Horsepower across cylinders",
  col=c("red","darkblue","gray60"),
  outcol="magenta",
  horizontal=TRUE
)

legend("topright", legend=c("4 cylinder", "6 cylinder", "8 cylinder"), fill=c("red","darkblue","gray60"))

grid()

Barplots

What is a Barplot?

Barplots are used for categorical data or summarized counts.

They show:

  1. Frequencies
  2. Totals
  3. Group comparisons
# First, count how many cars have each cylinder number.

cyl_counts <- table(mtcars$cyl)
cyl_counts
## 
##  4  6  8 
## 11  7 14
barplot(cyl_counts)
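
Barplots are not limited to counts: any one-number-per-group summary works. The sketch below assumes the mean mpg per cylinder group as the summary, chosen only for illustration.

```r
# One summarized value per group: mean mpg for each cylinder count
mean_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)
mean_mpg

barplot(
  mean_mpg,
  main = "Average MPG by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Mean Miles per Gallon"
)
```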

Customizing a Barplot

barplot(
  cyl_counts,
  col = "purple",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars"
)

barplot(
  cyl_counts,
  col = "orange",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars",
  border = "blue",
  lwd= 2,
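  cex.main = 2, # universal
  cex.lab = 1.5, # universal
  las = 2, # universal: axis label orientation (2 = perpendicular to the axis)
  space = 0.5 # barplot specific: gap between bars, as a fraction of bar width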
  )
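
barplot() also returns the x-position of each bar's midpoint, which is handy for adding labels with a lower-level function. The following is a sketch; the extra headroom of 2 in ylim is an arbitrary choice to leave room for the labels.

```r
cyl_counts <- table(mtcars$cyl)

# Save the bar midpoints that barplot() returns
bar_mid <- barplot(
  cyl_counts,
  col = "orange",
  main = "Number of Cars by Cylinder Count",
  xlab = "Cylinders",
  ylab = "Number of Cars",
  ylim = c(0, max(cyl_counts) + 2) # headroom so labels are not cut off
)

# Write each count just above its bar
text(
  x = bar_mid,
  y = as.numeric(cyl_counts),
  labels = as.vector(cyl_counts),
  pos = 3
)
```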

Barplot Practice

Create a barplot showing how many cars fall into each gear category (gear).

Steps:

  1. Use table()
  2. Use barplot()
  3. Add labels and color

gearCounts <- table(mtcars$gear)

barplot(
  gearCounts,
  col="cyan",
  main="Car Gear Count Frequency",
  xlab="Gears",
  ylab="Number of Cars",
  border="magenta",
  lwd=6,
  cex.main=0.6,
  cex.lab=1.5,
  las=1,
  space=3.3
)

Summary

You now know how to create:

  1. Histograms → distributions of numeric data
  2. Boxplots → summaries and group comparisons
  3. Barplots → categorical counts

Together with scatterplots and line plots, these give you a powerful toolkit for visualizing data in R.

Homework

In this assignment, you will create and customize three types of plots using base R:

  1. Histogram
  2. Boxplot
  3. Barplot

You will use the built-in mtcars dataset unless otherwise specified.

All plots must include:

Part 1: Histogram (Distribution of a Numeric Variable)

A histogram displays the distribution of a single numeric variable.

Create a histogram of one numeric variable from mtcars (for example: mpg, hp, or wt).

Your histogram must include:

  • Custom color
  • Title
  • X-axis label
  • Custom number of bins using breaks
# I'm going to make a histogram of weight.
hist(
  mtcars$wt,
  col="darkred",
  main="Weight of Cars",
  xlab="Car weight (1000 lbs)",
  breaks=5,
  probability=FALSE
)

grid()

# Is there a way to manually extend the y-axis? One of the bars goes slightly above the y-axis and it bothers me.
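
One way to do this is the ylim argument of hist(), which sets the lower and upper limits of the y-axis. The sketch below assumes an upper limit of 20, picked only because it sits above the tallest bar; the x-axis label uses the unit given in the mtcars documentation (1000 lbs).

```r
h <- hist(
  mtcars$wt,
  col = "darkred",
  main = "Weight of Cars",
  xlab = "Car weight (1000 lbs)",
  breaks = 5,
  ylim = c(0, 20) # y-axis now runs from 0 to 20
)

max(h$counts) # height of the tallest bar, for choosing a sensible limit
```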

Questions:

  1. Is the distribution symmetric, skewed, or approximately normal? The distribution is definitely skewed.

  2. Are there any noticeable gaps or clusters? There are a LOT of cars in the 3,000 to 4,000 lb range compared to the other groups (wt is recorded in units of 1000 lbs). In comparison, it looks like there’s only one car in the 4,000 to 5,000 lb range.


Part 2: Boxplot (Comparing Groups)

A boxplot summarizes a distribution using:

  • Median
  • Quartiles
  • Potential outliers

Create a boxplot comparing a numeric variable across groups.

Example: Compare mpg by number of cylinders (cyl).

Your boxplot must include:

  • Group comparison formula format (y ~ x)
  • Title
  • Axis labels
  • Custom color
# I'm going to compare weight by number of gears.

boxplot(
  wt ~ gear ,
  data=mtcars,
  horizontal=TRUE,
  main="Car Weights By Gear",
  xlab="Weight (1000 lbs)",
  ylab="Number of gears",
  col= c("gold","gray50","#CD7F32"),
  outcol="red"
)

grid()

Questions

  1. Which group has the highest median? Three-gear cars.

  2. Do any groups appear to have greater variability? Five-gear cars seem to have the highest variability.

  3. Are there visible outliers? Yes: three-gear cars have four outliers.
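
These answers can be double-checked from the statistics that boxplot() computes. This is a sketch, not a required part of the assignment: plot = FALSE returns the numbers without drawing, and IQR() is used here as one possible measure of variability.

```r
# Compute the boxplot statistics without drawing
bp <- boxplot(wt ~ gear, data = mtcars, plot = FALSE)

# Row 3 of bp$stats holds the median of each group
medians <- setNames(bp$stats[3, ], bp$names)
medians

# One measure of spread per group
tapply(mtcars$wt, mtcars$gear, IQR)

# Outlier values and the gear group each one belongs to
bp$out
outlier_groups <- bp$names[bp$group]
outlier_groups
```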


Part 3: Barplot (Categorical Counts)

A barplot displays counts or summarized categorical data.

  1. Use table() to count frequencies of a categorical variable (gear or cyl).
  2. Create a barplot of those counts.
  3. Customize the plot with:
  • Color
  • Title
  • Axis labels
# Step 1: Create a table

# Example:
# cyl_counts <- table(mtcars$cyl)

# Step 2: Create the barplot

# Example:
# barplot(cyl_counts,
#         col = "purple",
#         main = "Number of Cars by Cylinder Count",
#         xlab = "Cylinders",
#         ylab = "Number of Cars")

carbCounts <- table(mtcars$carb)

barplot(carbCounts,
  col="darkred",
  main="Number of Cars by Carburetor Count",
  xlab="Carburetors",
  ylab="Number of Cars",
  cex.main=1.5,
  cex.lab=0.75,
  lwd=1.5,
  space=0.3
)

grid()

Questions

  1. Which category has the highest count? Two-carburetor and four-carburetor cars are tied as the most common types, with 10 cars each.

  2. What does this tell you about the dataset? It’s not evenly distributed; it’s skewed towards certain carburetor counts.
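
The same conclusions can be read from the table itself. A brief sketch follows; the name most_common is made up for illustration.

```r
carb_counts <- table(mtcars$carb)

# Counts from most to least common
sort(carb_counts, decreasing = TRUE)

# Categories that share the highest count
most_common <- names(carb_counts)[carb_counts == max(carb_counts)]
most_common

# Share of the dataset in each category
round(prop.table(carb_counts), 2)
```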