2025-11-14

Overview

  • Topic: Mean vs Median
  • Goal: Understand two measures of central tendency
  • Key Motivation
    • Used in exam scores, income, housing prices, sports analytics
    • Mean and median can lead to different interpretations

Definitions: Mean vs Median

Mean (Arithmetic Average)
- Add all values and divide by the number of observations.

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
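As a quick check of the formula, the sketch below computes the mean by hand and compares it with R's built-in `mean()`. The vector `x` is made-up illustration data, not part of the exam-score example.

```r
# Made-up data for illustration only
x <- c(2, 4, 6, 8)

n <- length(x)          # number of observations
x_bar <- sum(x) / n     # add all values, divide by n

x_bar
## [1] 5
isTRUE(all.equal(x_bar, mean(x)))
## [1] TRUE
```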

Median
- The “middle” value after sorting the data.
- If \(n\) is odd: the median is the single middle value.
- If \(n\) is even: the median is the average of the two middle values.
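The odd and even cases can be sketched as a small function. `manual_median()` is a hypothetical helper written for this illustration (in practice R's built-in `median()` does the same job), and the two score vectors are made up.

```r
# Hypothetical helper: median computed by hand, following the rules above
manual_median <- function(x) {
  x <- sort(x)                      # the median is defined on sorted data
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # odd n: the single middle value
  } else {
    mean(x[c(n / 2, n / 2 + 1)])    # even n: average of the two middle values
  }
}

odd_scores  <- c(70, 55, 90)        # made-up data, n = 3
even_scores <- c(70, 55, 90, 60)    # made-up data, n = 4

manual_median(odd_scores)
## [1] 70
manual_median(even_scores)
## [1] 65
```

Both results agree with `median(odd_scores)` and `median(even_scores)`.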

Key Difference
- Mean uses all values and is sensitive to outliers.
- Median is more robust when the data are skewed or contain extreme values.
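A minimal sketch of this sensitivity, using made-up scores: adding one extreme value moves the mean a long way, while the median barely changes. The value 300 is an assumption chosen only to make the effect obvious.

```r
# Made-up scores for illustration only
scores <- c(60, 65, 70, 75, 80)
scores_outlier <- c(scores, 300)    # same data plus one extreme value

mean(scores)
## [1] 70
median(scores)
## [1] 70

mean(scores_outlier)                # pulled up sharply by the outlier
## [1] 108.3333
median(scores_outlier)              # shifts only slightly
## [1] 72.5
```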

Example Data

values <- c(55, 60, 65, 67, 70, 72, 75, 77, 80, 85, 90, 100)
df <- data.frame(value = values)
mean_value <- mean(values)
median_value <- median(values)

mean_value
## [1] 74.66667
median_value
## [1] 73.5

Boxplot

library(ggplot2)  # provides ggplot(), geom_boxplot(), geom_histogram(), labs()

plot_box <- ggplot(df, aes(x = "", y = value)) +
  geom_boxplot(fill = "steelblue") +
  labs(
    title = "Boxplot of Exam Scores",
    x = "",
    y = "Value"
  )

Histogram

plot_hist <- ggplot(df, aes(x = value)) +
  geom_histogram(binwidth = 5, fill = "lightblue", color = "black") +
  labs(
    title = "Histogram of Data",
    x = "Value",
    y = "Count"
  )

Interactive Histogram

library(plotly)  # provides plot_ly() and layout()

plot_interactive <- plot_ly(df, x = ~value, type = "histogram") |>
  layout(
    title = "Interactive Histogram (Plotly)",
    xaxis = list(title = "Value"),
    yaxis = list(title = "Count")
  )

Conclusion

  • Mean and median are both measures of central tendency, but they behave differently.
  • Mean is affected by extreme values because it uses every value in the dataset.
  • Median is more robust and stays close to the majority of the data when distributions are skewed.
  • Always inspect the data visually (boxplots, histograms) before choosing which summary statistic to report.
  • Consider the context:
    • Use median for skewed data like income or property prices.
    • Use mean for roughly symmetric distributions without extreme values, such as heights or typical test scores.