In statistics, a measure of central tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.
It is often referred to as a “summary statistic.” The three most common measures are:

1. Mean
2. Median
3. Mode
The mean (or average) is the most popular and well-known measure of central tendency. It is calculated by summing all the values in a data set and dividing by the number of values.
For a sample of size \(n\), the sample mean (\(\bar{x}\)) is calculated as: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
For a population of size \(N\), the population mean (\(\mu\)) is: \[\mu = \frac{\sum_{i=1}^{N} X_i}{N}\]
Where:

* \(\sum\) = summation symbol
* \(x_i\) (or \(X_i\) for a population) = each individual value
* \(n\) or \(N\) = total number of observations
Data: A student’s test scores are 85, 90, 78, 92, and 88.
Calculation: \[\bar{x} = \frac{85 + 90 + 78 + 92 + 88}{5} = \frac{433}{5} = 86.6\]
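The same arithmetic can be checked in R. The sketch below applies the sample-mean formula directly with `sum()` and `length()`, then compares the result with the built-in `mean()`. The variable names (`scores`, `manual_mean`, `builtin_mean`) are illustrative choices, not part of the original example.

```r
# Test scores from the example above
scores <- c(85, 90, 78, 92, 88)

# Apply the formula directly: sum of the values divided by their count
manual_mean <- sum(scores) / length(scores)

# Built-in equivalent
builtin_mean <- mean(scores)

cat("Manual mean:", manual_mean, "\n")
cat("Built-in mean:", builtin_mean, "\n")
```

Both lines should report 86.6, matching the hand calculation.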
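The median is the middle value in a data set that has been arranged in ascending or descending order. It splits the data into two equal halves. When the number of values is even, there is no single middle value, so the median is the average of the two middle values.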
Case A (Odd): Scores: 10, 15, 20.

* Sorted: 10, 15, 20. Median = 15.

Case B (Even): Scores: 10, 15, 20, 100.

* Sorted: 10, 15, 20, 100.
* Calculation: \(\frac{15 + 20}{2} = 17.5\).
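The odd and even cases can be expressed as a small R function. This is a minimal sketch of the sort-and-pick-the-middle rule; `manual_median` is a hypothetical helper written for illustration, and in practice the built-in `median()` does the same job.

```r
# Median computed by hand: sort, then pick or average the middle value(s).
# manual_median is a hypothetical helper, not a base R function.
manual_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    # Odd count: the single middle value
    x[(n + 1) / 2]
  } else {
    # Even count: average of the two middle values
    (x[n / 2] + x[n / 2 + 1]) / 2
  }
}

manual_median(c(10, 15, 20))       # Case A: 15
manual_median(c(10, 15, 20, 100))  # Case B: 17.5
```

In Case B the value 100 barely moves the median, whereas the mean of the same four scores is 36.25.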
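The mode is the value that appears most frequently in a data set. A data set can have one mode, several modes, or, when every value occurs equally often, no meaningful mode.

Data: 2, 4, 4, 6, 7, 8, 4, 10

* Mode: 4 (it appears three times).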
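Counting frequencies is easy to do in R with `table()`. The sketch below is one possible approach: `all_modes` is a hypothetical helper that returns every value tied for the highest count, and the colour vector is made-up data used only to show that the idea carries over to categorical values. Note that the helper returns the modes as character strings, because it reads them from the names of the frequency table.

```r
# all_modes is a hypothetical helper: it returns every value tied for the
# highest frequency, so data with several modes are reported in full.
all_modes <- function(v) {
  counts <- table(v)
  names(counts)[as.vector(counts) == max(counts)]
}

values <- c(2, 4, 4, 6, 7, 8, 4, 10)
table(values)      # frequency of each distinct value
all_modes(values)  # "4"

# Made-up categorical data with two modes
colours <- c("red", "blue", "red", "green", "blue")
all_modes(colours)  # "blue" "red"
```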
| Feature | Mean | Median | Mode |
|---|---|---|---|
| Data Type | Quantitative | Quantitative / Ordinal | All types |
| Sensitivity to Outliers | High | Low | Low |
| Use Case | Symmetric data | Skewed data | Categorical data |
Below is how we calculate these measures using R.
# Create a sample dataset of employee salaries (in thousands)
salaries <- c(45, 50, 52, 55, 58, 60, 62, 150) # 150 is an outlier

# Calculate Mean
mean_val <- mean(salaries)

# Calculate Median
med_val <- median(salaries)

# Calculate Mode (base R's mode() returns the storage type, not the statistical mode)
get_mode <- function(v) {
  uniqv <- unique(v)
  # Count occurrences of each unique value; ties resolve to the first value seen
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
mode_val <- get_mode(salaries)

# Display Results
cat("Mean Salary:", mean_val, "\n")
cat("Median Salary:", med_val, "\n")
cat("Mode Salary:", mode_val, "\n")
## Mean Salary: 66.5
## Median Salary: 56.5
## Mode Salary: 45
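Observation: Notice how the mean ($66.5k) is higher than the median ($56.5k) because of the $150k outlier. In this case, the median is a better representation of the “typical” salary. The mode is not informative here: every salary occurs exactly once, so get_mode simply returns the first value (45).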
Imagine you are analyzing the number of goals scored by a football
team in 10 matches: 0, 1, 1, 2, 2, 2, 3, 3, 4, 12
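One way to explore this data set is to compute all three measures in R. This is a sketch rather than a prescribed solution; the variable names are illustrative, and the mode is taken from the frequency table on the assumption that a single most frequent value exists (which holds for these ten matches).

```r
# Goals scored in the 10 matches listed above
goals <- c(0, 1, 1, 2, 2, 2, 3, 3, 4, 12)

goals_mean <- mean(goals)      # 30 / 10 = 3
goals_median <- median(goals)  # average of the 5th and 6th sorted values = 2

# Most frequent value, read from the frequency table
goal_counts <- table(goals)
goals_mode <- as.numeric(names(goal_counts)[which.max(goal_counts)])

cat("Mean:", goals_mean, " Median:", goals_median, " Mode:", goals_mode, "\n")
```

The single 12-goal match pulls the mean up to 3, while the median and the mode both stay at 2.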