Same mean, different reality

# ----------------------------
# Two groups:
# A = Normal (balanced)
# B = Has an issue (outliers / skew)
# ----------------------------
set.seed(1)

# Group A: normal distribution
group_A <- rnorm(50, mean = 75, sd = 5)

# Group B: same normal distribution, plus a "problem" (a few extreme high outliers)
group_B <- rnorm(50, mean = 75, sd = 5)
group_B[1:5] <- group_B[1:5] + 25   # raise the first 5 scores by 25 (the "issue")

# Make the means equal by shifting every Group B score by the same amount
# (a constant shift moves the center but leaves the shape unchanged)
group_B <- group_B + (mean(group_A) - mean(group_B))

# Check means
mean(group_A)
## [1] 75.50224
mean(group_B)
## [1] 75.50224

# ----------------------------
# Histograms (side-by-side)
# ----------------------------
par(mfrow=c(1,2))

hist(group_A, main="Group A (Normal)", xlab="Scores")
abline(v=mean(group_A), col="red", lwd=2)

hist(group_B, main="Group B (Issue/Outliers)", xlab="Scores")
abline(v=mean(group_B), col="red", lwd=2)

# ----------------------------
# Boxplot
# ----------------------------
par(mfrow=c(1,1))

boxplot(group_A, group_B,
        names=c("A (Normal)", "B (Issue)"),
        main="Same Mean, Different Reality",
        ylab="Scores")

points(c(1,2), c(mean(group_A), mean(group_B)), col="red", pch=19)
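
# ----------------------------
# Numeric check (an illustrative sketch, not part of the original analysis)
# The means match by construction, so compare statistics that react to
# shape: the median, the standard deviation, and the share of scores
# that fall below the mean.
# describe_scores is a hypothetical helper name introduced for this example.
# ----------------------------
describe_scores <- function(x) {
  c(mean       = mean(x),
    median     = median(x),
    sd         = sd(x),
    below_mean = mean(x < mean(x)))   # proportion of scores under the mean
}

# One column per group, rounded for readability
round(sapply(list(A = group_A, B = group_B), describe_scores), 2)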

Interpretation

Although both student groups share the same mean score, their performance patterns are fundamentally different.

In Group A, the scores are spread roughly symmetrically around the mean, which makes the average a reliable summary of performance.

In Group B, five very high scores pull the mean upward, so most students scored below it.

This demonstrates an important principle:

The mean describes the center. It does not describe the shape.

Statistical interpretation requires looking beyond a single number to understand the structure of the data.