How Are These Related?

Formulas for Each

  • Mean: \(\displaystyle \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i\)
  • Median: middle sorted value (or average of two middles)
  • p% trimmed mean: sort the data, remove the lowest p% and the highest p% of the values, then average the rest (p is the trimming percentage, not the sample size n)

Example

Data: 10, 11, 12, 12, 13, 40
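- Mean = 98/6 \(\approx\) 16.3
- Median = 12
- 10% trimmed mean = 12, dropping one value from each end (10% of 6 values is 0.6, rounded up to 1) and averaging 11, 12, 12, 13; if the count is rounded down instead, nothing is dropped and the result equals the mean
Adding the outlier 40 raises the mean from 11.6 to about 16.3, while the median stays at 12 and the trimmed mean stays close to 12.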
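
The three summaries can be checked with a short base R sketch. The vector names `x` and `x_no_outlier` follow the plotting code, and `x_no_outlier` is assumed to be the same data without 40. `trimmed_mean_k` is a hypothetical helper written for this illustration, not a base R function. Note that base R's `mean(x, trim = ...)` rounds the number of trimmed values down, so the result depends on the rounding convention.

```r
# Example data; x_no_outlier is assumed to be the same values without 40
x_no_outlier <- c(10, 11, 12, 12, 13)
x <- c(x_no_outlier, 40)

# Hypothetical helper: trimmed mean that drops exactly k values from each end
trimmed_mean_k <- function(v, k) {
  n <- length(v)
  stopifnot(k >= 0, 2 * k < n)   # something must remain after trimming
  s <- sort(v)
  mean(s[(k + 1):(n - k)])
}

mean(x)               # arithmetic mean, 98 / 6
median(x)             # average of the two middle values, 12 and 12
trimmed_mean_k(x, 1)  # average of 11, 12, 12, 13
mean(x, trim = 0.1)   # base R drops floor(0.1 * 6) = 0 values per end
mean(x, trim = 0.2)   # base R drops floor(0.2 * 6) = 1 value per end
```

With only six observations, a small trimming fraction may remove nothing at all under the round-down convention, which is why the helper takes the count k directly.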

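library(plotly)

# Assumed definitions: the example data with and without the outlier 40
x_no_outlier <- c(10, 11, 12, 12, 13)
x <- c(x_no_outlier, 40)

# Side-by-side boxplots to show how the outlier stretches the scale
plot_ly() %>%
  add_boxplot(y = ~x_no_outlier, name = "Without outlier") %>%
  add_boxplot(y = ~x, name = "With outlier") %>%
  layout(
    title = "Boxplots: With and Without Outlier",
    yaxis = list(title = "Value")
  )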

Trimmed Mean Equation

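\[ \bar{x}_{\text{trim}} = \frac{1}{n - 2k} \sum_{i = k + 1}^{n - k} x_{(i)} \]

Here \(x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}\) are the sorted values, \(n\) is the sample size, and \(k\) is the number of values removed from each end (with \(2k < n\)).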
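
As a worked instance of the equation, take the example data (n = 6) and assume one value is trimmed from each end, so k = 1. The sorted values are 10, 11, 12, 12, 13, 40, and the sum runs over the second through fifth of them:

\[ \bar{x}_{\text{trim}} = \frac{1}{6 - 2 \cdot 1} \sum_{i = 2}^{5} x_{(i)} = \frac{11 + 12 + 12 + 13}{4} = \frac{48}{4} = 12 \]

Setting k = 0 recovers the ordinary mean, 98/6 \(\approx\) 16.3.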