# ----------------------------
# Two groups:
# A = Normal (balanced)
# B = Has an issue (outliers / skew)
# ----------------------------
set.seed(1)
# Group A: normal distribution
group_A <- rnorm(50, mean = 75, sd = 5)
# Group B: mostly normal, but with a "problem" (a few extreme high outliers)
group_B <- rnorm(50, mean = 75, sd = 5)
group_B[1:5] <- group_B[1:5] + 25 # add extreme values (the "issue")
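# Equalize the means by shifting every Group B value by the same constant
# (the means then match up to floating-point rounding; the shape is unchanged)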
group_B <- group_B + (mean(group_A) - mean(group_B))
# Check means
mean(group_A)
## [1] 75.50224
mean(group_B)
## [1] 75.50224
# ----------------------------
# Histograms (side-by-side)
# ----------------------------
par(mfrow=c(1,2))
hist(group_A, main="Group A (Normal)", xlab="Scores")
abline(v=mean(group_A), col="red", lwd=2)
hist(group_B, main="Group B (Issue/Outliers)", xlab="Scores")
abline(v=mean(group_B), col="red", lwd=2)
# ----------------------------
# Boxplot
# ----------------------------
par(mfrow=c(1,1))
boxplot(group_A, group_B,
        names = c("A (Normal)", "B (Issue)"),
        main = "Same Mean, Different Reality",
        ylab = "Scores")
points(c(1,2), c(mean(group_A), mean(group_B)), col="red", pch=19)
Although both student groups share the same mean score, their performance patterns are fundamentally different.
In Group A, the scores are spread roughly symmetrically around the mean, which makes the average a reliable summary of typical performance.
In Group B, a few extreme high scores pull the mean upward, so most students scored below it.
This demonstrates an important principle:
The mean describes the center. It does not describe the shape.
Statistical interpretation requires looking beyond a single number to understand the structure of the data.
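
The plots show the difference visually. As a rough numerical companion, the sketch below puts a few shape-sensitive summaries next to the mean. It rebuilds both groups with the same seed so that it runs on its own. The helper `shape_summary()` and the `summary_table` object are illustrative names, not part of the original script, and the skewness here is the simple moment-based version rather than any particular package's estimator.

```r
# Rebuild the two groups exactly as in the script above
set.seed(1)
group_A <- rnorm(50, mean = 75, sd = 5)
group_B <- rnorm(50, mean = 75, sd = 5)
group_B[1:5] <- group_B[1:5] + 25
group_B <- group_B + (mean(group_A) - mean(group_B))

# Hypothetical helper (an assumption for illustration):
# collects summaries that respond to shape, not just to center
shape_summary <- function(x) {
  centered <- x - mean(x)
  c(mean             = mean(x),
    median           = median(x),
    sd               = sd(x),
    share_below_mean = mean(x < mean(x)),   # proportion of scores under the mean
    skewness         = mean(centered^3) / mean(centered^2)^1.5)
}

# One row per group, rounded for readability
summary_table <- round(rbind(A = shape_summary(group_A),
                             B = shape_summary(group_B)), 3)
print(summary_table)
```

If the two groups behave as designed, the `mean` column should match while Group B shows a larger `sd` and a clearly positive `skewness`. Comparing `median` with `mean` in each row is a quick check of whether the average sits where most of the scores are.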