A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set. In statistics, it is often referred to as a “summary statistic.”
The three most common measures are:

1. Mean
2. Median
3. Mode
The mean is the sum of all values divided by the number of values. It is the most widely used measure of central tendency.

For a sample, the mean (\(\bar{x}\)) is:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

where \(x_i\) is the \(i\)-th value and \(n\) is the number of values.
```r
# Define the dataset
data <- c(5, 8, 12, 15, 10)
# Calculate mean
sample_mean <- mean(data)
print(paste("The mean is:", sample_mean))
## [1] "The mean is: 10"
```
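To connect the formula to the code, the same mean can be computed term by term with base R's `sum()` (the numerator \(\sum x_i\)) and `length()` (the denominator \(n\)). This is a minimal sketch; the names `total`, `n`, and `manual_mean` are illustrative and not part of any package.

```r
# Same dataset as above
data <- c(5, 8, 12, 15, 10)

# Numerator of the formula: 5 + 8 + 12 + 15 + 10 = 50
total <- sum(data)
# Denominator: the number of values, n = 5
n <- length(data)

# Sample mean: 50 / 5
manual_mean <- total / n
print(manual_mean)
## [1] 10

# Agrees with the built-in mean()
print(all.equal(manual_mean, mean(data)))
## [1] TRUE
```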
Real-Life Application (Academic Grading): A teacher calculates the average score on a class exam to gauge the overall performance level.
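The median is the middle value in a dataset when the values are arranged in order. If the number of values is even, there is no single middle value, so the median is the average of the two middle values.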
```r
# Dataset with an odd number of values (sorted: 2, 3, 8, 10, 15)
data_odd <- c(3, 10, 2, 8, 15)
print(paste("Median (Odd):", median(data_odd)))
## [1] "Median (Odd): 8"
# Dataset with an even number of values (sorted: 2, 3, 8, 10, 15, 20)
# The two middle values are 8 and 10, so the median is (8 + 10) / 2
data_even <- c(3, 10, 2, 8, 15, 20)
print(paste("Median (Even):", median(data_even)))
## [1] "Median (Even): 9"
```
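The definition can also be followed step by step: sort the values, then take the middle one (odd count) or average the two middle ones (even count). The sketch below uses a hypothetical helper, `manual_median()`, and assumes the input has no missing values; in practice the built-in `median()` is the one to use.

```r
# Hypothetical helper that follows the definition of the median directly
manual_median <- function(v) {
  s <- sort(v)                        # arrange the values in order
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                    # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2     # even n: average of the two middle values
  }
}

print(manual_median(c(3, 10, 2, 8, 15)))      # sorted: 2, 3, 8, 10, 15
## [1] 8
print(manual_median(c(3, 10, 2, 8, 15, 20)))  # sorted: 2, 3, 8, 10, 15, 20
## [1] 9
```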
Real-Life Application (Real Estate): Median home prices are used to avoid the distortion caused by a few extremely expensive mansions (outliers).
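The mode is the value that appears most frequently. A dataset can have one mode, several modes (when values tie for the highest frequency), or no meaningful mode (when every value appears exactly once).

Note: Base R has no built-in function for the statistical mode. The built-in `mode()` function returns an object's storage mode (for example, `"numeric"`), not its most frequent value, so we write a custom function instead.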
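```r
# Custom function to calculate the mode
get_mode <- function(v) {
  uniqv <- unique(v)  # distinct values, in order of first appearance
  # match() maps each value to its position in uniqv, tabulate() counts each
  # position, and which.max() returns the first position with the highest count
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
# Dataset
data_mode <- c(2, 4, 4, 6, 8, 9)
print(paste("The mode is:", get_mode(data_mode)))
## [1] "The mode is: 4"
```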
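Because `which.max()` returns only the first maximum, `get_mode()` silently drops the other values when there is a tie. The sketch below adds a hypothetical variant, `get_all_modes()`, that returns every value tied for the highest count, and also confirms that base R's `mode()` reports a storage type rather than a frequency.

```r
# get_mode() as defined above, repeated so this block runs on its own
get_mode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical helper: return every value tied for the highest frequency
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))  # frequency of each distinct value
  uniqv[counts == max(counts)]
}

tied <- c(1, 1, 2, 2, 3)     # 1 and 2 each appear twice
print(get_mode(tied))        # keeps only the first tied value
## [1] 1
print(get_all_modes(tied))   # keeps both tied values
## [1] 1 2

# Base R's mode() reports the storage mode, not the most frequent value
print(mode(tied))
## [1] "numeric"
```

Note that when every value appears once, `get_all_modes()` returns the whole input, which is one way of signalling that the data has no single mode.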
Real-Life Application (Inventory): A shoe store tracks the “mode” of shoe sizes sold to know which size to restock most often.
| Data Type | Best Measure | Why? |
|---|---|---|
| Nominal (Categories) | Mode | You cannot average “Colors” or “Names.” |
| Ordinal (Ranked) | Median | Distances between ranks are not necessarily equal. |
| Interval/Ratio (No Outliers) | Mean | Most precise; uses all data. |
| Interval/Ratio (With Outliers) | Median | Not affected by extreme values. |
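The first two rows of the table can be illustrated in R. This is a minimal sketch on made-up data: the color survey and satisfaction ratings are hypothetical. For ordinal data it takes the median of the integer ranks of an ordered factor and maps the result back to a level; this works cleanly here because the count is odd (with an even count the median rank can fall between two levels).

```r
# Custom mode function from above, repeated so this block runs on its own
get_mode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Nominal data (hypothetical favorite colors): only the mode is meaningful
colors <- c("red", "blue", "blue", "green", "blue", "red")
print(get_mode(colors))
## [1] "blue"

# Ordinal data (hypothetical satisfaction ratings): use the median rank
levels_ord <- c("Low", "Medium", "High")
ratings <- factor(c("Low", "High", "Medium", "High", "Medium"),
                  levels = levels_ord, ordered = TRUE)
median_rank <- median(as.integer(ratings))  # ranks 1, 3, 2, 3, 2 -> median 2
print(levels_ord[median_rank])
## [1] "Medium"
```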
A startup has 5 employees with salaries: $40k, $42k, $45k, $48k, and $250k (CEO).
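```r
# Salaries in thousands of dollars; the last value is the CEO
salaries <- c(40, 42, 45, 48, 250)
# Calculate both measures
mean_sal <- mean(salaries)
med_sal <- median(salaries)
# sep = "" avoids the extra spaces cat() inserts by default
cat("Mean Salary: $", mean_sal, "k\n", sep = "")
## Mean Salary: $85k
cat("Median Salary: $", med_sal, "k\n", sep = "")
## Median Salary: $45k
```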
Conclusion: The mean ($85k) is misleading because of the CEO’s high salary. The median ($45k) better represents the “typical” employee’s earnings.
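To quantify how much a single outlier moves each measure, the sketch below recomputes both with and without the CEO's salary. The `staff_only`, `shift_mean`, and `shift_median` names are introduced here for illustration only.

```r
# Salaries in $k; the fifth value is the CEO
salaries <- c(40, 42, 45, 48, 250)
staff_only <- salaries[-5]  # drop the CEO: 40, 42, 45, 48

# How far does adding the CEO move each measure?
shift_mean <- mean(salaries) - mean(staff_only)        # 85 - 43.75
shift_median <- median(salaries) - median(staff_only)  # 45 - 43.5
cat(sprintf("Mean shifts by $%.2fk; median shifts by $%.2fk\n",
            shift_mean, shift_median))
## Mean shifts by $41.25k; median shifts by $1.50k
```

One extreme value nearly doubles the mean but barely moves the median, which is why the table above recommends the median for interval/ratio data with outliers.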