In statistics, a measure of central tendency is a summary statistic that describes a whole set of data with a single value representing the middle or center of its distribution.
It is often referred to as the “average” of the data. The three most common measures are:

1. Mean
2. Median
3. Mode
The Mean is the sum of all values divided by the total number of values. It is the most common measure of central tendency.
For a sample of size \(n\), the sample mean \(\bar{x}\) is calculated as: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
Where:

- \(\sum\) = Summation symbol
- \(x_i\) = Each individual value in the dataset
- \(n\) = Total number of observations
Scenario: A teacher wants to find the average score of 5 students in a math quiz. Scores: 85, 90, 78, 92, 88.
Manual Calculation: \[\bar{x} = \frac{85 + 90 + 78 + 92 + 88}{5} = \frac{433}{5} = 86.6\]
scores <- c(85, 90, 78, 92, 88)
mean_score <- mean(scores)
print(paste("The mean score is:", mean_score))
## [1] "The mean score is: 86.6"
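To connect the formula to the code, the mean can also be computed by hand with `sum()` and `length()`. This is a minimal sketch; the variable name `manual_mean` is introduced here only for illustration.

```r
# Scores from the quiz scenario
scores <- c(85, 90, 78, 92, 88)

# Apply the formula directly: sum of the values divided by the number of values
manual_mean <- sum(scores) / length(scores)
print(manual_mean)
## [1] 86.6

# all.equal() compares with a small tolerance for floating-point rounding
print(all.equal(manual_mean, mean(scores)))
## [1] TRUE
```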
The Median is the middle value in a dataset when the values are arranged in ascending or descending order. It splits the data into two equal halves.
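The definition can be written compactly. As an assumption for this sketch, let \(x_{(k)}\) denote the \(k\)-th smallest value after sorting (a notation not used elsewhere in this text): \[\text{Median} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases}\]
When \(n\) is even there is no single middle value, so the two central values are averaged.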
Scenario: Analyzing household incomes in a neighborhood where one very high earner (an outlier) would distort the mean. Incomes (in thousands): 45, 50, 52, 55, 600.
Manual Calculation: The data is already sorted and \(n = 5\) (odd), so the median is the \(\left(\frac{5+1}{2}\right)^{\text{th}} = 3^{\text{rd}}\) term. Median = 52. (Note: The mean would be 160.4, which is not representative of a typical household in the neighborhood!)
incomes <- c(45, 50, 52, 55, 600)
median_income <- median(incomes)
print(paste("The median income is:", median_income))
## [1] "The median income is: 52"
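The built-in `median()` also handles datasets with an even number of values, where the two middle values are averaged. The sketch below spells out that logic; `manual_median` is a hypothetical helper written for illustration, not a base R function, and the four-value dataset is an illustrative variation of the scenario.

```r
# Hypothetical helper: median by sorting and picking the middle value(s)
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2    # even n: average of the two middle values
  }
}

incomes <- c(45, 50, 52, 55, 600)
print(manual_median(incomes))        # odd case, agrees with median(incomes)
## [1] 52

# Illustrative variation: drop the outlier so that n = 4 (even)
print(manual_median(c(45, 50, 52, 55)))
## [1] 51
```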
The Mode is the value that appears most frequently in a dataset. A dataset can have one mode (unimodal), two modes (bimodal), or many (multimodal).
There is no specific algebraic formula for the mode; it is identified by frequency counting: \[\text{Mode} = \text{Value with } \max(\text{frequency})\]
Scenario: A shoe store manager wants to know which shoe size to stock most. Sizes sold: 7, 8, 8, 9, 10, 10, 10, 11, 12.
Manual Calculation:

- 7 occurs 1 time
- 8 occurs 2 times
- 9 occurs 1 time
- 10 occurs 3 times
- 11 occurs 1 time
- 12 occurs 1 time

Mode = 10.
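The frequency count can be reproduced in R with `table()`, which tallies how often each distinct value occurs. This is a minimal sketch; the names `sizes_sold` and `size_counts` are chosen here for illustration.

```r
sizes_sold <- c(7, 8, 8, 9, 10, 10, 10, 11, 12)

# table() counts how many times each distinct value occurs
size_counts <- table(sizes_sold)
print(size_counts)

# The name attached to the largest count is the most frequent size
# (table names are character strings, so the result is "10", not 10)
print(names(size_counts)[which.max(size_counts)])
## [1] "10"
```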
Note: R does not have a built-in function for the statistical mode (the mode() function returns an object's storage mode, such as "numeric"). We use a custom function instead.
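get_mode <- function(v) {
  # Distinct values, in order of first appearance
  uniqv <- unique(v)
  # Count occurrences of each distinct value and return the most frequent one
  # (if several values tie, only the first is returned)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}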
shoe_sizes <- c(7, 8, 8, 9, 10, 10, 10, 11, 12)
result <- get_mode(shoe_sizes)
print(paste("The mode of shoe sizes is:", result))
## [1] "The mode of shoe sizes is: 10"
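Because `which.max()` returns only the first maximum, `get_mode()` reports a single value even when a dataset is bimodal or multimodal. One possible variant that returns every tied value is sketched below; `get_all_modes` and the `bimodal_sizes` data are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: return every value tied for the highest frequency
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))   # frequency of each distinct value
  uniqv[counts == max(counts)]          # keep all values with the top frequency
}

# Illustrative data in which sizes 8 and 10 each occur three times
bimodal_sizes <- c(7, 8, 8, 8, 9, 10, 10, 10, 11)
print(get_all_modes(bimodal_sizes))
```

For the bimodal data this returns both 8 and 10; for the original shoe sizes it returns only 10.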
| Measure | When to use? | Sensitivity to Outliers |
|---|---|---|
| Mean | Symmetric data, no extreme outliers. | High (Very sensitive) |
| Median | Skewed data or data with outliers (e.g., Salary). | Low (Robust) |
| Mode | Categorical/Nominal data (e.g., Color, Gender). | Low |
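
For categorical data the mean and median are not defined, which is why the mode is the usual choice there. The sketch below uses made-up color data (`car_colors` is a hypothetical variable) and counts categories with `table()`.

```r
# Illustrative nominal data (made up for this sketch)
car_colors <- c("red", "blue", "red", "green", "blue", "red")

# mean(car_colors) would return NA with a warning, since the values are not numeric
color_counts <- table(car_colors)

# The most frequent category is the mode
print(names(color_counts)[which.max(color_counts)])
## [1] "red"
```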
Consider the data 12, 15, 12, 18, 20, 100. The value 100 (outlier) pulls the mean away from the center compared to the median.
practice_data <- c(12, 15, 12, 18, 20, 100)
print(paste("Mean:", mean(practice_data)))
## [1] "Mean: 29.5"
print(paste("Median:", median(practice_data)))
## [1] "Median: 16.5"
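To see how much of that gap comes from the single outlier, both measures can be recomputed with the value 100 removed. This is a minimal sketch; `without_outlier` is a name introduced here for illustration.

```r
practice_data <- c(12, 15, 12, 18, 20, 100)

# Drop the outlier (100) and recompute both measures
without_outlier <- practice_data[practice_data != 100]

print(paste("Mean without outlier:", mean(without_outlier)))
## [1] "Mean without outlier: 15.4"
print(paste("Median without outlier:", median(without_outlier)))
## [1] "Median without outlier: 15"
```

Removing one value moves the mean from 29.5 to 15.4 but the median only from 16.5 to 15, which illustrates the robustness of the median.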