March 13, 2023

Quartiles and Interquartile Range (IQR)

In statistics, quartiles divide a data set into 4 equal parts. The first, second, and third quartiles are denoted by \(Q_1\), \(Q_2\), and \(Q_3\).

The interquartile range (IQR) is defined as \[\text{IQR} = Q_3 - Q_1\] where \(Q_3\) is the 75th percentile and \(Q_1\) is the 25th percentile.
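As a rough illustration, the definition can be checked in R on the mtcars data set used later in this post. This sketch assumes R's default quantile method (type 7), which can give slightly different quartiles than other conventions.

```r
# Quartiles of miles per gallon in the built-in mtcars data set.
# Assumption: quantile() is left at its default method (type 7).
mpg <- mtcars$mpg
q1 <- unname(quantile(mpg, 0.25))  # 25th percentile
q3 <- unname(quantile(mpg, 0.75))  # 75th percentile
iqr <- q3 - q1                     # IQR = Q3 - Q1

# Base R's IQR() computes the same difference.
stopifnot(isTRUE(all.equal(iqr, IQR(mpg))))
print(c(Q1 = q1, Q3 = q3, IQR = iqr))
```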

The IQR measures the spread of the middle 50% of the data. The median always lies within this range.

Finding each quartile

To find each quartile:

  1. Arrange the data set in ascending order.
  2. Find the median of the ordered data set. This is the second quartile.
  3. Divide the data set into two halves: the lower half of the observations and the upper half of the observations.
  4. Find the median of each half. The median of the lower half is the first quartile, and the median of the upper half is the third quartile.
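The steps above can be sketched in R as follows. The function name `quartiles_by_halves` is hypothetical, and the sketch assumes that when the number of observations is odd, the median is left out of both halves; the steps above do not say which convention to use. Because of that choice, its results can differ from R's built-in `quantile()`.

```r
# Hypothetical helper: quartiles by the "median of each half" method.
# Assumption: with an odd number of observations, the middle value
# is excluded from both halves.
quartiles_by_halves <- function(x) {
  stopifnot(length(x) >= 2)
  x <- sort(x)                     # step 1: ascending order
  n <- length(x)
  half <- n %/% 2                  # size of each half
  lower <- x[seq_len(half)]        # step 3: lower half
  upper <- x[(n - half + 1):n]     # step 3: upper half
  c(Q1 = median(lower),            # step 4: median of the lower half
    Q2 = median(x),                # step 2: median of all the data
    Q3 = median(upper))            # step 4: median of the upper half
}

example_data <- c(7, 1, 3, 9, 5, 11, 13, 15)
result <- quartiles_by_halves(example_data)
print(result)
```

For this example the ordered data are 1, 3, 5, 7, 9, 11, 13, 15, so the lower half is 1, 3, 5, 7 and the upper half is 9, 11, 13, 15.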

Box Plots

A box plot (sometimes referred to as a box and whiskers plot) shows the distribution of a data set and marks its quartiles.

The box in the middle of a box plot spans the IQR, from \(Q_1\) to \(Q_3\), so it contains the middle 50% of the data. The line inside the box marks the median of the data set.

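The end of each “whisker” represents the minimum or maximum value in the data set when there are no outliers. When outliers are present, they are drawn as separate dots, and the whisker stops at the most extreme value that is not an outlier.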

Creating a Box Plot Using ggplot2 and mtcars

library(ggplot2)
# A constant x value ("var") puts every car in a single box
ggplot(data = mtcars) +
  geom_boxplot(mapping = aes(x = "var", y = mpg)) +
  xlab("") + ylab("Miles per Gallon") +
  scale_x_discrete(breaks = NULL) + coord_flip()

Understanding the mtcars Box Plot and Outliers

From the box plot, we can determine that the middle 50% of the cars in mtcars get roughly 15 to 23 miles per gallon. The minimum is about 10 miles per gallon, and the maximum is about 34 miles per gallon.

You’ll notice that there is a dot on the box plot. This dot represents an outlier, an observation that lies far from the other observations in the data set. Recall that each quartile is the median of its half of the data set, so a few extreme values barely move the quartiles. That makes the quartiles and the IQR a stable yardstick for deciding which observations are unusually far from the rest.

Outliers can be determined using the following formulas.

\(\text{Upper Limit} = Q_3 + 1.5 \times \text{IQR}\)

\(\text{Lower Limit} = Q_1 - 1.5 \times \text{IQR}\)

Any observation greater than the upper limit or smaller than the lower limit is considered an outlier.
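To make the rule concrete, here is a minimal sketch that applies both limits to the miles per gallon values in mtcars. It assumes R's default quantile method; the variable names are illustrative only.

```r
# Outlier limits for miles per gallon in mtcars.
# Assumption: quartiles come from quantile() with its default method.
mpg <- mtcars$mpg
q1 <- unname(quantile(mpg, 0.25))
q3 <- unname(quantile(mpg, 0.75))
iqr <- q3 - q1

upper_limit <- q3 + 1.5 * iqr
lower_limit <- q1 - 1.5 * iqr

# An observation is an outlier if it falls outside either limit.
is_outlier <- mpg > upper_limit | mpg < lower_limit
outlier_cars <- rownames(mtcars)[is_outlier]

print(c(lower = lower_limit, upper = upper_limit))
print(outlier_cars)
```

With these limits, only observations above the upper limit or below the lower limit are flagged, which corresponds to the dot drawn on the box plot.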

Further Visualizing Outliers Using the mtcars Data Set

In a box plot of miles per gallon grouped by number of cylinders, the 8-cylinder group is the one that shows outliers: some 8-cylinder cars get extremely low or extremely high miles per gallon compared to the other 8-cylinder cars.
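A plot of this kind can be sketched as follows. The code behind the original figure is not shown, so the grouping by `factor(cyl)`, the axis labels, and the helper `count_outliers` are assumptions made for illustration.

```r
library(ggplot2)

# Assumption: one box per cylinder count, with mpg on the value axis.
p <- ggplot(data = mtcars) +
  geom_boxplot(mapping = aes(x = factor(cyl), y = mpg)) +
  xlab("Cylinders") +
  ylab("Miles per Gallon")
print(p)

# Hypothetical helper: count the observations outside the
# 1.5 * IQR limits within one group.
count_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  limits <- c(q[1] - 1.5 * diff(q), q[2] + 1.5 * diff(q))
  sum(x < limits[1] | x > limits[2])
}

# Apply the helper to the mpg values of each cylinder group.
outliers_per_group <- tapply(mtcars$mpg, mtcars$cyl, count_outliers)
print(outliers_per_group)
```

Note that the limits are computed separately for each group, so an outlier here is unusual relative to cars with the same number of cylinders, not relative to the whole data set.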

Interactive Box Plot

This interactive box plot gives a five-number summary, which consists of the minimum value, \(Q_1\), the median, \(Q_3\), and the maximum value. Additionally, the box plot gives the values of the outliers.
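The same five numbers can also be computed directly in base R. This sketch assumes the default quantile method, so the quartiles may differ slightly from those an interactive plotting library reports.

```r
# Five-number summary of miles per gallon in mtcars.
# Assumption: quantile() is left at its default method (type 7).
mpg <- mtcars$mpg
five_numbers <- quantile(mpg, probs = c(0, 0.25, 0.5, 0.75, 1))
names(five_numbers) <- c("Min", "Q1", "Median", "Q3", "Max")
print(five_numbers)
```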