In statistics, a measure of central tendency is a summary measure that describes a whole data set with a single value representing the middle or center of its distribution.
It is often called a “location” measure because it indicates where the data are concentrated. The three most common measures are:

1. The mean
2. The median
3. The mode

The mean (or average) is the most popular and well-known measure of central tendency. It is calculated by summing all the values in a data set and dividing by the total number of values.
For a sample of \(n\) values, \(x_1, x_2, \dots, x_n\), the sample mean \(\bar{x}\) is:
\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
Where:

- \(\sum\): Summation symbol.
- \(x_i\): Each individual value.
- \(n\): Total number of observations.

Scenario: A teacher wants to find the average score of 5 students in a mini-quiz. Scores: 70, 85, 80, 90, 75.
Calculation: \[\bar{x} = \frac{70 + 85 + 80 + 90 + 75}{5} = \frac{400}{5} = 80\]
```r
scores <- c(70, 85, 80, 90, 75)
mean_score <- mean(scores)
print(paste("The mean score is:", mean_score))
## [1] "The mean score is: 80"
```
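To connect the formula to the built-in function, here is a minimal sketch that applies \(\bar{x} = \sum x_i / n\) by hand and checks the result against `mean()`. The variable names `n` and `manual_mean` are introduced here for illustration only.

```r
# Scores from the quiz scenario
scores <- c(70, 85, 80, 90, 75)

# Apply the formula directly: sum of the values divided by their count
n <- length(scores)
manual_mean <- sum(scores) / n

print(manual_mean)
## [1] 80

# all.equal() compares numbers while tolerating floating-point rounding
print(isTRUE(all.equal(manual_mean, mean(scores))))
## [1] TRUE
```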
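The median is the middle value of a data set arranged in numerical order (ascending or descending). It splits the data into two equal halves: with an odd number of values it is the single middle value, and with an even number it is the average of the two middle values.
Scenario: Summarizing household incomes in a neighborhood without letting one billionaire on the block distort the result. Incomes (in thousands): $45, $50, $52, $55, $700. The values are already sorted, so the median is the third of the five: $52 thousand.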
```r
incomes <- c(45, 50, 52, 55, 700)
median_income <- median(incomes)
print(paste("The median income is:", median_income))
## [1] "The median income is: 52"
```
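The sketch below shows one way the median could be computed by hand, and why it is preferred for this income data. `manual_median` is a hypothetical helper written for this example, not a base R function; in practice `median()` does the same job.

```r
incomes <- c(45, 50, 52, 55, 700)

# Hypothetical helper: sort the values, then pick the middle position(s)
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                  # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2   # even n: average of the two middle values
  }
}

print(manual_median(incomes))
## [1] 52

# The mean is dragged far above every ordinary household by the 700
print(mean(incomes))
## [1] 180.4

# Even-length case: drop the outlier and the median is the average of 50 and 52
print(manual_median(c(45, 50, 52, 55)))
## [1] 51
```

The mean of 180.4 exceeds four of the five incomes, while the median of 52 sits among them.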
The mode is the value that appears most frequently in a data set. A data set can have one mode (unimodal), two modes (bimodal), or many modes (multimodal).
Scenario: A shoe store owner wants to know which shoe size to stock most heavily. Sizes sold: 7, 8, 8, 9, 10, 10, 10, 11.
Note: R has no built-in function for the statistical mode (the base function `mode()` returns an object's storage mode instead), so we use a custom approach that tabulates the unique values.
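```r
shoe_sizes <- c(7, 8, 8, 9, 10, 10, 10, 11)
get_mode <- function(v) {
  uniqv <- unique(v)
  # Count how often each unique value occurs and return the most frequent one.
  # which.max() returns the first maximum, so ties resolve to the first mode found.
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
print(paste("The mode shoe size is:", get_mode(shoe_sizes)))
## [1] "The mode shoe size is: 10"
```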
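Because a data set can be bimodal or multimodal, a function that returns a single value can hide ties. The following is a minimal sketch of a variant that returns every mode; `get_all_modes` is a hypothetical name introduced here, and the second input vector is made up for illustration.

```r
# Hypothetical helper: return all values that share the highest frequency
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))  # frequency of each unique value
  uniqv[counts == max(counts)]         # keep every value tied for the maximum
}

shoe_sizes <- c(7, 8, 8, 9, 10, 10, 10, 11)
print(get_all_modes(shoe_sizes))
## [1] 10

# Made-up bimodal example: 1 and 2 both appear twice
print(get_all_modes(c(1, 1, 2, 2, 3)))
## [1] 1 2
```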
| Measure | Best Used For… | Sensitivity to Outliers |
|---|---|---|
| Mean | Continuous data with a symmetrical distribution (e.g., Height). | High (Easily skewed) |
| Median | Skewed data or data with outliers (e.g., Salaries). | Low (Robust) |
| Mode | Categorical/Nominal data (e.g., Most popular car color). | Low |
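
For categorical data, the mean and median are undefined, so the mode is the only one of the three that applies. As a rough sketch, `table()` can count the categories directly; the `car_colors` vector below is made-up data for illustration only.

```r
# Made-up sample of observed car colors
car_colors <- c("white", "black", "white", "red", "silver", "white", "black")

# table() counts how many times each category occurs
color_counts <- table(car_colors)

# Keep the name(s) of the category with the highest count
most_common <- names(color_counts)[color_counts == max(color_counts)]
print(most_common)
## [1] "white"
```
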
Consider the following dataset representing the number of hours 7 students spent studying: 2, 3, 3, 4, 5, 8, 20.
Observation: The mean (about 6.43) is higher than the median (4) because it is pulled up by the outlier (20 hours). In this case, the median gives a more accurate picture of “typical” study time.
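
The comparison can be reproduced in R. This sketch also recomputes both measures with the 20-hour value removed, to show how much each one moves; the variable name `without_outlier` is introduced here for illustration.

```r
hours <- c(2, 3, 3, 4, 5, 8, 20)

print(round(mean(hours), 2))
## [1] 6.43
print(median(hours))
## [1] 4

# Drop the 20-hour outlier and recompute both measures
without_outlier <- hours[hours < 20]

print(round(mean(without_outlier), 2))
## [1] 4.17
print(median(without_outlier))
## [1] 3.5
```

Removing one value shifts the mean by more than two hours but the median by only half an hour, which is what the table means by the median being robust.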