In statistics, a Measure of Central Tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.
It helps us understand the “typical” value in a dataset. The three most common measures are:
1. Mean
2. Median
3. Mode
The mean (often called the average) is the sum of all values divided by the total number of values.
For a sample of size \(n\), the sample mean \(\bar{x}\) is calculated as: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
Where:
* \(\sum\) is the summation symbol.
* \(x_i\) represents each individual value.
* \(n\) is the total number of observations.
Scenario: A teacher wants to find the average score of 5 students in a math quiz. Scores: 85, 90, 88, 76, 92.
# Define the dataset
scores <- c(85, 90, 88, 76, 92)
# Calculate mean
mean_score <- mean(scores)
print(paste("The average score is:", mean_score))
## [1] "The average score is: 86.2"
* Pros: Uses every value in the dataset; mathematically stable.
* Cons: Highly sensitive to outliers (extreme values).
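To illustrate the sensitivity to outliers noted above, the following minimal sketch appends one extreme value to the quiz scores and recomputes the mean. The extra score of 5 is a hypothetical value chosen for illustration; it is not part of the original dataset.

```r
# Quiz scores from the teacher scenario
scores <- c(85, 90, 88, 76, 92)

# Hypothetical outlier (assumption for illustration): one student scores 5
scores_with_outlier <- c(scores, 5)

mean_original <- mean(scores)
mean_with_outlier <- mean(scores_with_outlier)

# A single extreme value pulls the mean down by more than 13 points
print(mean_original)                  # 86.2
print(round(mean_with_outlier, 2))    # 72.67
```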
The median is the middle value in a dataset when the values are arranged in ascending or descending order. When the dataset has an even number of values, the median is the average of the two middle values.
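As a small sketch of this definition, the example below finds the median by sorting the values and picking the middle position, then compares the result with R's built-in `median()`. The helper name `manual_median` is hypothetical, and the four-value vector is an assumed variation of the quiz scores, used to show the even-length case where the two middle values are averaged.

```r
# Hypothetical helper: median via explicit sorting
manual_median <- function(v) {
  s <- sort(v)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # odd count: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2    # even count: average the two middle values
  }
}

odd_scores  <- c(85, 90, 88, 76, 92)  # quiz scores (5 values)
even_scores <- c(85, 90, 88, 76)      # assumed 4-value variation

print(manual_median(odd_scores))      # 88
print(manual_median(even_scores))     # 86.5
print(median(even_scores))            # 86.5, matches the built-in
```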
Imagine 5 people earn \(\$30k, \$35k, \$40k, \$45k,\) and \(\$1,000k\) (a millionaire).
* Mean: \(\$230k\) (Does not represent the “typical” person).
* Median: \(\$40k\) (A much better representation of the group).
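The income figures above can be checked with a short sketch. Incomes are stored in thousands of dollars; the variable name `incomes_k` is an assumption made for this example.

```r
# Incomes in thousands of dollars, taken from the example above
incomes_k <- c(30, 35, 40, 45, 1000)

mean_income <- mean(incomes_k)      # 230: pulled up by the millionaire
median_income <- median(incomes_k)  # 40: the middle earner

print(paste("Mean income (k$):", mean_income))
print(paste("Median income (k$):", median_income))
```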
The mode is the value that appears most frequently in a dataset. A dataset can be:
* Unimodal: One mode.
* Bimodal: Two modes.
* Multimodal: Three or more modes.
A shoe store wants to know which shoe size to stock most. If they sold sizes [7, 8, 8, 8, 9, 10, 11], the mode is 8.
Note: R does not have a built-in function for the statistical mode (the mode() function in R returns an object's storage mode, such as "numeric" or "character"). We often create a custom function.
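# Return the most frequent value in v (the first one found if there is a tie)
get_mode <- function(v) {
  uniqv <- unique(v)
  # Count occurrences of each unique value and pick the largest count
  uniqv[which.max(tabulate(match(v, uniqv)))]
}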
shoe_sizes <- c(7, 8, 8, 8, 9, 10, 11)
result <- get_mode(shoe_sizes)
print(paste("The mode of shoe sizes is:", result))
## [1] "The mode of shoe sizes is: 8"
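The get_mode() function above returns a single value, so on a bimodal dataset it reports only the first mode it encounters. A possible extension, sketched here with a hypothetical helper `get_all_modes()` and made-up example data, returns every value tied for the highest frequency. It also works on categorical data.

```r
# Hypothetical helper: return every value tied for the highest count
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))
  uniqv[counts == max(counts)]
}

# Made-up bimodal data: sizes 8 and 10 each appear three times
bimodal_sizes <- c(7, 8, 8, 8, 9, 10, 10, 10, 11)
print(get_all_modes(bimodal_sizes))   # 8 10

# Made-up categorical data
colors <- c("red", "blue", "red", "green")
print(get_all_modes(colors))          # "red"
```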
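The relationship between these measures depends on the skewness of the data. In a symmetric distribution the mean, median, and mode roughly coincide. In a right-skewed distribution the long right tail typically pulls the mean above the median, and in a left-skewed distribution the mean typically falls below the median. The following example plots a right-skewed sample: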
# Generating a right-skewed dataset
set.seed(123)
data <- rchisq(1000, df = 5)
hist(data, col="skyblue", main="Right Skewed Distribution", xlab="Values")
abline(v = mean(data), col = "red", lwd = 2, lty = 1)
abline(v = median(data), col = "blue", lwd = 2, lty = 2)
legend("topright", legend=c("Mean", "Median"), col=c("red", "blue"), lty=1:2, lwd=2)

| Measure | Best Used For… | Sensitive to Outliers? |
|---|---|---|
| Mean | Continuous data, symmetrical distributions | Yes |
| Median | Skewed data (Income, House Prices) | No |
| Mode | Categorical data (Colors, Brands) | No |
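
Returning to the right-skewed example, the sketch below regenerates the same sample and compares its mean and median numerically, which is consistent with the table's suggestion to prefer the median for skewed data. The variable name `skewed` is an assumption; the seed and distribution are taken from the plotting code above.

```r
# Same seed and distribution as the plotting example
set.seed(123)
skewed <- rchisq(1000, df = 5)

sample_mean <- mean(skewed)
sample_median <- median(skewed)

# In a right-skewed sample the long right tail pulls the mean above the median
print(sample_mean > sample_median)
print(round(c(mean = sample_mean, median = sample_median), 2))
```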