2024-10-22

Introduction

This is a presentation of my favorite topic in statistics: the interquartile range (IQR) and outliers.

What is IQR

IQR stands for interquartile range. It is calculated with the following formula:

\[ IQR = Q3 - Q1 \]

The IQR is the distance between the first quartile (Q1, the 25th percentile) and the third quartile (Q3, the 75th percentile) of a dataset, so it measures the spread of the middle 50% of the data.
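As a quick check of the formula, here is a minimal R sketch on a small made-up vector (`x` is hypothetical, not one of the datasets used later). It computes Q3 - Q1 with `quantile()` and compares the result with R's built-in `IQR()`. Both use `quantile()`'s default method (`type = 7`).

```r
# Hypothetical example vector: the integers 1 through 9
x <- 1:9

# Quartiles with quantile()'s default method (type = 7)
q1 <- quantile(x, 0.25)
q3 <- quantile(x, 0.75)

# IQR by the formula; unname() drops the "75%" label quantile() attaches
iqr_manual <- unname(q3 - q1)
iqr_manual
## [1] 4

# Built-in equivalent
IQR(x)
## [1] 4
```

Other quartile conventions are available through the `type` argument of both `quantile()` and `IQR()`, and on small datasets they can give slightly different values.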

What are Outliers

An outlier is any data point that falls below the lower bound or above the upper bound, where both bounds are calculated from the quartiles and the IQR.

The lower bound is found with the following formula:

\[ Q1 - (1.5 \times IQR) \]

The upper bound is found with the following formula:

\[ Q3 + (1.5 \times IQR) \]
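Plugging numbers into the two formulas is simple arithmetic. The sketch below wraps it in a small helper, `fences()` (a hypothetical name, not a base R function), and evaluates it with Q1 = 20.5 and Q3 = 33.5, which are the quartiles R's `quantile()` returns for the dataset in Code Example 2.

```r
# Hypothetical helper: outlier bounds computed from Q1 and Q3
# k = 1.5 is the multiplier used in the formulas above
fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# IQR = 33.5 - 20.5 = 13 and 1.5 * 13 = 19.5,
# so lower = 20.5 - 19.5 = 1 and upper = 33.5 + 19.5 = 53
f <- fences(20.5, 33.5)
f
```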

How to Use IQR to Find Outliers

  • First, sort the data and find its median. Then find the median of the lower half of the data (Q1) and the median of the upper half (Q3).

  • Second, calculate the IQR by subtracting Q1 from Q3.

  • Third, multiply the IQR by 1.5.

  • Finally, subtract the result of the third step from Q1 to get the lower bound, and add it to Q3 to get the upper bound.

Any data point less than the lower bound or greater than the upper bound is considered an outlier.
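The steps above can be sketched by hand in R. This is a minimal sketch under one stated assumption: when the number of points is odd, the overall median is left out of both halves (a common textbook convention; some texts include it). `quartiles_by_halves()` is a hypothetical helper, and the data is the vector from Code Example 1.

```r
# Hypothetical helper: Q1 and Q3 as the medians of the lower and upper halves.
# Assumption: for odd n, the overall median is excluded from both halves.
# Requires at least 2 data points.
quartiles_by_halves <- function(x) {
  x <- sort(x)                           # Step 1: order the data
  n <- length(x)
  half <- n %/% 2                        # number of points in each half
  lower_half <- x[1:half]
  upper_half <- x[(n - half + 1):n]
  c(q1 = median(lower_half), q3 = median(upper_half))
}

data_ex1 <- c(5, 6, 25, 3, 4, 7, 5, -2, 3, 6, 10)

q <- quartiles_by_halves(data_ex1)
iqr_hand <- q[["q3"]] - q[["q1"]]        # Step 2: Q3 - Q1
k_iqr <- 1.5 * iqr_hand                  # Step 3: 1.5 * IQR
lower_hand <- q[["q1"]] - k_iqr          # Step 4: lower bound
upper_hand <- q[["q3"]] + k_iqr          # Step 4: upper bound

outliers_hand <- data_ex1[data_ex1 < lower_hand | data_ex1 > upper_hand]
outliers_hand
## [1] 25
```

With this convention Q1 = 3 and Q3 = 7, so the bounds are -3 and 13 and only 25 is flagged. R's `quantile()` uses a different default (type 7), which gives Q1 = 3.5 and Q3 = 6.5 for the same data, so Code Example 1 also flags -2. Both conventions are legitimate; borderline points like -2 can change status depending on which one is used.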

Code Example 1

Here is an example of how to find the IQR and the outliers in R:

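# Dataset for this example
data <- c(5,6,25,3,4,7,5,-2,3,6,10)

# IQR() returns Q3 - Q1 using quantile()'s default method (type 7)
iqr <- IQR(data)

# First and third quartiles
q1 <- quantile(data, 0.25)
q3 <- quantile(data, 0.75)

# Lower and upper bounds: 1.5 * IQR below Q1 and above Q3
lower <- q1 - (1.5 * iqr)
upper <- q3 + (1.5 * iqr)

# Keep only the points that fall outside the bounds
outliers <- data[data < lower | data > upper]
outliers
## [1] 25 -2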
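To see why 25 and -2 are flagged but 10 is not, it can help to print the intermediate values. This sketch recomputes them for the same data (the name `summary_ex1` is just illustrative) and uses `which()` to report the positions of the outliers in the original vector.

```r
data <- c(5,6,25,3,4,7,5,-2,3,6,10)

q <- quantile(data, c(0.25, 0.75))    # default method, type = 7
iqr <- IQR(data)

# Intermediate values: quartiles, IQR, and both bounds
summary_ex1 <- c(q1 = q[[1]], q3 = q[[2]], iqr = iqr,
                 lower = q[[1]] - 1.5 * iqr, upper = q[[2]] + 1.5 * iqr)
summary_ex1
# q1 = 3.5, q3 = 6.5, iqr = 3, lower = -1, upper = 11

# Positions (not values) of the points outside the bounds
which(data < summary_ex1[["lower"]] | data > summary_ex1[["upper"]])
## [1] 3 8
```

Here -2 < -1 and 25 > 11, while 10 stays just inside the upper bound of 11.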

Boxplot Visualization 1

Here is a boxplot that visualizes the dataset from Code Example 1:

boxplot(data, main="Code Example 1 Boxplot", horizontal=TRUE)
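The points drawn individually beyond the whiskers are the ones the boxplot treats as outliers. `boxplot.stats()`, which `boxplot()` uses to compute its statistics, returns them without drawing anything. A minimal sketch, assuming the default `coef = 1.5`: note that the box edges come from Tukey's hinges (`fivenum()`) rather than `quantile()`, which can differ slightly for some sample sizes; for this dataset they agree.

```r
data <- c(5,6,25,3,4,7,5,-2,3,6,10)

# coef = 1.5 applies the same 1.5 * IQR rule as the bounds above
bs <- boxplot.stats(data, coef = 1.5)

bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
## [1]  3.0  3.5  5.0  6.5 10.0
bs$out     # points plotted beyond the whiskers
## [1] 25 -2
```

The whiskers stop at the most extreme data points still inside the bounds (3 and 10), not at the bounds themselves.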

Code Example 2

Here is another example of how to find the IQR and the outliers in R:

data2 <- c(4,23,26,45,32,33,5,78,34,20,21)

iqr2 <- IQR(data2)
iqr2
## [1] 13
q1_2 <- quantile(data2, 0.25)
q3_2 <- quantile(data2, 0.75)

lower2 <- q1_2 - (1.5 * iqr2)
upper2 <- q3_2 + (1.5 * iqr2)

outliers2 <- data2[data2 < lower2 | data2 > upper2]
outliers2
## [1] 78
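Code Examples 1 and 2 repeat the same steps with different variable names. A small function removes that duplication. `find_outliers()` is a hypothetical helper name, and exposing the 1.5 multiplier as the parameter `k` (default 1.5) is an added assumption for flexibility.

```r
# Hypothetical helper: values below Q1 - k*IQR or above Q3 + k*IQR
find_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)   # default method, type = 7
  iqr <- q[2] - q[1]
  x[x < q[1] - k * iqr | x > q[2] + k * iqr]
}

data  <- c(5,6,25,3,4,7,5,-2,3,6,10)
data2 <- c(4,23,26,45,32,33,5,78,34,20,21)

find_outliers(data)
## [1] 25 -2
find_outliers(data2)
## [1] 78
```

The outliers come back in their original order in the data, matching the results of the two examples.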

Boxplot Visualization 2

Here is a boxplot that visualizes the dataset from Code Example 2:

boxplot(data2, main="Code Example 2 Boxplot", horizontal=TRUE)

Comparison

Here are the two boxplots side by side:
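One way to draw such a comparison, sketched here as an assumption rather than the code behind the original figure, is to pass both datasets to `boxplot()` as a named list so they share one axis. The labels "Example 1" and "Example 2" are illustrative.

```r
data  <- c(5,6,25,3,4,7,5,-2,3,6,10)
data2 <- c(4,23,26,45,32,33,5,78,34,20,21)

# A named list draws one box per element on a common axis
b <- boxplot(list("Example 1" = data, "Example 2" = data2),
             main = "Code Examples 1 and 2", horizontal = TRUE)

# boxplot() also returns its statistics invisibly; $out holds the outliers
b$out
## [1] 25 -2 78
```

A shared axis makes the difference in scale between the two datasets visible, which separate plots with their own axes can hide.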

Conclusion

Thank you for learning about my favorite statistics concept!