A Measure of Central Tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. In statistics, we use these measures to summarize a large dataset into a single representative figure.
The three most common measures are:

1. The Mean
2. The Median
3. The Mode
The mean (or average) is the sum of all observations divided by the total number of observations.
For a sample of size \(n\), the mean (\(\bar{x}\)) is calculated as:
\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
Where:

- \(\sum\) = the Greek letter ‘sigma’, meaning “sum of”
- \(x_i\) = each individual value in the dataset
- \(n\) = the total number of values
Suppose we have the test scores of 5 students: 75, 82, 90, 65, and 88.
```r
scores <- c(75, 82, 90, 65, 88)
mean_score <- mean(scores)
print(paste("The Mean score is:", mean_score))
## [1] "The Mean score is: 80"
```
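To connect `mean()` back to the formula above, the sketch below computes the numerator \(\sum x_i\) and the denominator \(n\) explicitly with base R's `sum()` and `length()`. The names `total`, `n`, and `manual_mean` are just illustrative.

```r
scores <- c(75, 82, 90, 65, 88)

total <- sum(scores)      # numerator: sum of all x_i
n <- length(scores)       # denominator: number of observations
manual_mean <- total / n  # x-bar = (sum of x_i) / n

# The hand-computed value should match R's built-in mean()
print(c(manual = manual_mean, builtin = mean(scores)))
```

Here `total` is 400 and `n` is 5, so both values equal 80.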
A professor uses the mean to determine the average performance of a class. If the mean is very low, the professor might consider “curving” the grades or re-teaching a specific topic.
The median is the middle value when a dataset is ordered from smallest to largest.
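The position of the median in the sorted data is: \[Pos = \frac{n + 1}{2}\]
When \(n\) is even, this position falls halfway between two observations, and the median is the average of those two middle values.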
```r
# Odd number of observations
data_odd <- c(10, 3, 5, 8, 2) # Sorted: 2, 3, 5, 8, 10
median(data_odd)
## [1] 5

# Even number of observations
data_even <- c(10, 3, 5, 8, 2, 12) # Sorted: 2, 3, 5, 8, 10, 12 -> (5+8)/2
median(data_even)
## [1] 6.5
```
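The position rule can also be applied by hand. The sketch below is a hypothetical helper, `median_by_position()`, written only to mirror \(Pos = \frac{n + 1}{2}\). It is not a replacement for `median()`. For odd \(n\) the position is a whole number. For even \(n\) it ends in .5, so the helper averages the values on either side.

```r
# Hypothetical helper that applies Pos = (n + 1) / 2 to the sorted data
median_by_position <- function(x) {
  x <- sort(x)            # the position rule only applies to ordered data
  n <- length(x)
  pos <- (n + 1) / 2      # whole number for odd n, ends in .5 for even n
  lower <- x[floor(pos)]  # value at or just below the middle position
  upper <- x[ceiling(pos)] # value at or just above the middle position
  (lower + upper) / 2     # odd n: lower == upper; even n: mean of the two
}

median_by_position(c(10, 3, 5, 8, 2))      # 5, same as median()
median_by_position(c(10, 3, 5, 8, 2, 12))  # 6.5, same as median()
```

This sketch assumes a non-empty numeric vector. Note that `sort()` silently drops `NA` values, whereas `median()` returns `NA` by default when data are missing.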
The median is the preferred measure for house prices. A single $10,000,000 mansion would drastically inflate the Mean price of a neighborhood, so the Median gives a more realistic picture of what a “typical” home costs.
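The mode is the value that occurs most frequently in a dataset. A dataset can have one mode, several modes (when values tie for the highest count), or no meaningful mode (when every value appears only once).
Note: base R has no built-in function for the statistical mode (the base function `mode()` returns an object's storage mode, such as `"numeric"`, not its most frequent value), so we often use a frequency table.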
```r
colors <- c("Red", "Blue", "Red", "Green", "Red", "Blue")
table(colors) # Shows 'Red' is the mode
## colors
##  Blue Green   Red
##     2     1     3
```
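The frequency table can be turned into a small reusable function. The sketch below uses a hypothetical helper, `stat_mode()`, built only on base R's `table()` and `max()`. It returns every value tied for the highest count, so it also handles datasets with more than one mode.

```r
# Hypothetical helper: returns every value tied for the highest frequency
stat_mode <- function(x) {
  counts <- table(x)                    # frequency of each distinct value
  names(counts)[counts == max(counts)]  # all values with the top count
}

colors <- c("Red", "Blue", "Red", "Green", "Red", "Blue")
stat_mode(colors)             # "Red"
stat_mode(c(1, 1, 2, 2, 3))   # "1" "2" (two modes)
```

Because `table()` stores the distinct values as names, numeric modes come back as character strings. Wrap the result in `as.numeric()` when the input is numeric.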
A shoe store manager uses the Mode to decide which shoe size to stock in the largest quantity. Knowing that the “average” (Mean) shoe size is 8.4 does not help with ordering (you can’t buy size 8.4), but knowing that size 9 is the Mode tells the manager which size to order most.
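The shape of the distribution, in particular its skewness, determines which measure is most appropriate. For a unimodal distribution, the three measures typically relate as follows: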
| Shape of Distribution | Relationship |
|---|---|
| Symmetrical | Mean = Median = Mode |
| Right Skewed (Positive) | Mean > Median > Mode |
| Left Skewed (Negative) | Mean < Median < Mode |
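The table is a rule of thumb for unimodal distributions, not a guarantee, and some skewed distributions break it. As a quick sanity check, the sketch below uses a small made-up, right-skewed sample (the values in `x` are hypothetical). It finds the mode with `table()` and `which.max()`, which returns the first value if several are tied.

```r
# Hypothetical right-skewed sample: mostly small values plus one long-tail value
x <- c(2, 2, 2, 3, 4, 5, 14)

counts <- table(x)
mode_x <- as.numeric(names(counts)[which.max(counts)])  # most frequent value

cat("Mean:", mean(x), "| Median:", median(x), "| Mode:", mode_x, "\n")
mean(x) > median(x) && median(x) > mode_x  # TRUE for this sample
```

Here the mean is 32/7 (about 4.57), the median is 3, and the mode is 2. That matches the right-skewed row of the table.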
Watch how the Mean moves much more than the Median when we add an outlier (an extreme value).
```r
# Original data
salary <- c(30, 32, 35, 38, 40) # in thousands
cat("Original Mean:", mean(salary), "| Original Median:", median(salary), "\n")
## Original Mean: 35 | Original Median: 35

# Adding a 'Millionaire' outlier
salary_with_outlier <- c(30, 32, 35, 38, 40, 1000)
cat("New Mean:", mean(salary_with_outlier), "| New Median:", median(salary_with_outlier))
## New Mean: 195.8333 | New Median: 36.5
```
Dataset: 12, 15, 12, 18, 22, 25, 12, 90.
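This dataset appears to be intended for practice. One way to check hand calculations is to compute all three measures in base R, as in the sketch below. The names `dataset` and `mode_value` are illustrative.

```r
dataset <- c(12, 15, 12, 18, 22, 25, 12, 90)

counts <- table(dataset)
mode_value <- as.numeric(names(counts)[counts == max(counts)])  # all tied modes

cat("Mean:", mean(dataset), "\n")      # 206 / 8
cat("Median:", median(dataset), "\n")  # average of the two middle sorted values
cat("Mode:", mode_value, "\n")         # most frequent value
```

The single large value 90 pulls the mean (25.75) well above the median (16.5). The median in turn sits above the mode (12), which matches the right-skewed row of the table.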