Question: How do you make a boxplot in base R?

Box plots are a visual way of displaying a dataset that lets us easily see the five-number summary of the data (minimum, first quartile, median, third quartile, maximum).

Data

We will use the palmerpenguins package for this example. Run install.packages once, comment it out after a successful installation, and then load the penguins data.

#install.packages("palmerpenguins")
library(palmerpenguins)
## Warning: package 'palmerpenguins' was built under R version 4.1.2
data(penguins)

Summary

First, let's take a summary of the body_mass_g column in the penguins dataset.

summary(penguins$body_mass_g)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    2700    3550    4050    4202    4750    6300       2
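
As a cross-check, base R's fivenum() returns only the five values that a box plot draws. The sketch below assumes palmerpenguins is installed; the variable names (mass, five) are illustrative. Note that fivenum() reports hinges, which can differ slightly from the quartiles that summary() prints.

```r
library(palmerpenguins)
data(penguins, package = "palmerpenguins")

# Body mass values; fivenum() drops NA values by default (na.rm = TRUE)
mass <- penguins$body_mass_g

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(mass)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)
```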

Plot

To make a simple boxplot of the body mass column in base R, use the boxplot() function. This will visualize the summary we found.

boxplot(penguins$body_mass_g,
        ylab = "Body Mass (g)")

The dark line in the middle of the box represents the median (4050 g), while the bottom and top edges of the box represent the first and third quartiles. The lines extending from the box (the whiskers) mark the smallest and largest values that are not outliers. By default, boxplot() lets the whiskers reach at most 1.5 times the interquartile range beyond the box (the range argument changes this multiplier). Any data point beyond that is treated as an outlier and drawn as a circle. The body mass data does not contain any outliers, so here the whiskers end at the minimum and maximum.
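
To check the 1.5 times interquartile range rule numerically, here is a minimal sketch using boxplot.stats(), the function boxplot() calls to compute its statistics. Its coef argument plays the role of range in boxplot(). The value 0.5 is an arbitrary choice for illustration, and the variable names are hypothetical.

```r
library(palmerpenguins)
data(penguins, package = "palmerpenguins")
mass <- penguins$body_mass_g

# Default rule: points farther than 1.5 * IQR from the box count as outliers
default_stats <- boxplot.stats(mass, coef = 1.5)
default_stats$stats        # lower whisker, lower hinge, median, upper hinge, upper whisker
length(default_stats$out)  # number of outliers under the default rule

# Tighter rule: whiskers stop within 0.5 * IQR, so more points are flagged
tight_stats <- boxplot.stats(mass, coef = 0.5)
length(tight_stats$out)

# The same multiplier is passed to boxplot() through its range argument
boxplot(mass, range = 0.5, ylab = "Body Mass (g)")
```

With the tighter multiplier, the lightest and heaviest penguins are drawn as circles beyond the whiskers.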

We can also make the plot horizontal; in that case the label has to move to the x axis (xlab instead of ylab). Color can also be added with the col argument to make the plot more visually appealing.

boxplot(x = penguins$body_mass_g, 
        xlab = "Body Mass (g)",
        horizontal = TRUE,
        col = 3
)
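
A number passed to col is an index into the current palette(), so the exact color can depend on the R version and palette settings. As a sketch, a named color can be used instead; "lightblue" and the main title below are illustrative choices, not part of the original example.

```r
library(palmerpenguins)
data(penguins, package = "palmerpenguins")

# The color that col = 3 currently maps to
palette()[3]

# Horizontal box plot with a named fill color and a title
bp <- boxplot(x = penguins$body_mass_g,
              xlab = "Body Mass (g)",
              main = "Penguin body mass",
              horizontal = TRUE,
              col = "lightblue")

# boxplot() invisibly returns the statistics it drew
bp$stats
```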

Additional Readings

For more information on this topic, see https://r-coder.com/dot-plot-r/

Keywords

1.boxplot 2.box-and-whiskers plot 3.median 4.minimum 5.maximum 6.first quartile 7.third quartile 8.range 9.outlier 10.boxplot() 11.summary()