- Real data often contain outliers, which can distort simple summaries such as the mean
- Three common ways to estimate the center of the data:
- Mean (\(\bar{x}\))
- Median
- Trimmed mean (sort the data, drop the top and bottom few %, then average the rest)
Data: 10, 11, 12, 12, 13, 40
- Mean = 98 / 6 \(\approx\) 16.3
- Median = (12 + 12) / 2 = 12
- Trimmed mean, dropping one value from each end (10 and 40) = (11 + 12 + 12 + 13) / 4 = 12
Adding the outlier 40 pulls the mean up from 11.6 to about 16.3, while the median and trimmed mean barely move and stay near 12.
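A minimal R sketch of the example above, using base R's `mean()` (with its `trim` argument) and `median()`. One caveat worth knowing: base R drops `floor(n * trim)` values from each end, so with only six values `trim = 0.1` drops nothing and returns the plain mean; `trim = 0.2` is what drops one value from each end here. The variable names `x` and `x_no_outlier` follow the plotting code below; `centers` is just an illustrative name.

```r
# Example data from the slide; 40 is the outlier
x_no_outlier <- c(10, 11, 12, 12, 13)
x <- c(x_no_outlier, 40)

# Three estimates of center for the data with the outlier.
# trim = 0.2 drops floor(6 * 0.2) = 1 value from each end (10 and 40).
centers <- c(
  mean    = mean(x),
  median  = median(x),
  trimmed = mean(x, trim = 0.2)
)
print(round(centers, 2))

# How far each estimate moves when the outlier is added
print(c(
  mean_shift   = mean(x) - mean(x_no_outlier),
  median_shift = median(x) - median(x_no_outlier)
))

# Caution: trim = 0.1 drops floor(6 * 0.1) = 0 values, so it equals the plain mean here
print(mean(x, trim = 0.1))
```

Running it should show a mean of about 16.33, with both the median and the trimmed mean at 12; the median does not move at all when 40 is added.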
library(plotly)
x_no_outlier <- c(10, 11, 12, 12, 13)
x <- c(x_no_outlier, 40)
plot_ly() %>%
  add_boxplot(y = ~x_no_outlier, name = "Without outlier") %>%
  add_boxplot(y = ~x, name = "With outlier") %>%
  layout(
    title = "Boxplots: With and Without Outlier",
    yaxis = list(title = "Value")
  )
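Trimmed mean, dropping \(k\) values from each end of the sorted data \(x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}\):
\[ \bar{x}_{\text{trim}} = \frac{1}{n - 2k} \sum_{i = k + 1}^{n - k} x_{(i)} \]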
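The trimmed-mean formula can be checked directly in R. `trimmed_mean()` below is a hypothetical helper written only for illustration: it sorts the data to get the order statistics \(x_{(1)} \le \dots \le x_{(n)}\), keeps \(x_{(k+1)}, \dots, x_{(n-k)}\), and divides their sum by \(n - 2k\). It assumes \(k\) is a whole number with \(2k < n\). For the six-value example with \(k = 1\) it should agree with base R's `mean(x, trim = 0.2)`.

```r
# Hypothetical helper: trimmed mean that drops k values from each end,
# following the formula term by term
trimmed_mean <- function(x, k) {
  n <- length(x)
  stopifnot(k >= 0, 2 * k < n)        # must keep at least one value
  x_sorted <- sort(x)                  # order statistics x_(1) <= ... <= x_(n)
  kept <- x_sorted[(k + 1):(n - k)]    # x_(k+1), ..., x_(n-k)
  sum(kept) / (n - 2 * k)              # divide by the number of kept values
}

x <- c(10, 11, 12, 12, 13, 40)
print(trimmed_mean(x, k = 1))   # (11 + 12 + 12 + 13) / 4
print(trimmed_mean(x, k = 0))   # no trimming: the ordinary mean
print(all.equal(trimmed_mean(x, k = 1), mean(x, trim = 0.2)))
```

Setting \(k = 0\) recovers the ordinary mean, and as \(k\) grows toward \(n/2\) the estimate approaches the median, which is why the trimmed mean sits between the two.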