In statistics, a measure of central tendency is a single value that describes a dataset by identifying its central position. These measures are sometimes called “measures of central location.”
The three most common measures are:

1. The Mean
2. The Median
3. The Mode
The mean is the “average” value. It is calculated by summing all observations and dividing by the total number of observations.
For a sample of size \(n\): \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
Where:

- \(\bar{x}\) = Sample mean
- \(\sum\) = Summation symbol
- \(x_i\) = Each individual value
- \(n\) = Number of values in the sample
Scenario: A tech company tracks the hours worked by 5 employees in a single day: 8, 9, 7, 10, and 6 hours.
Calculation: \[\bar{x} = \frac{8 + 9 + 7 + 10 + 6}{5} = \frac{40}{5} = 8 \text{ hours}\]
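The same calculation can be checked in R. This is a minimal sketch, assuming the five values are stored in a vector named `hours` (a name chosen here only for illustration). It applies the formula directly and compares the result with the built-in `mean()`.

```r
# Hours worked by the 5 employees in the scenario above
hours <- c(8, 9, 7, 10, 6)

# Apply the formula directly: sum of observations / number of observations
manual_mean <- sum(hours) / length(hours)

# Built-in equivalent
builtin_mean <- mean(hours)

print(manual_mean)   # 8
print(builtin_mean)  # 8
```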
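The median is the middle value in a dataset when the values are arranged in ascending or descending order. It is the “50th percentile.” With an odd number of values it is the single middle value; with an even number of values it is the average of the two middle values.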
Scenario: The weekly salaries of 5 people at a company are $500, $550, $600, and $650 for four workers, plus $10,000 for the CEO.
In this case, the median ($600) is a better representation of central tendency than the mean ($2,460), because the mean is skewed by the CEO’s high salary (an outlier).
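The salary scenario can be reproduced in R with the built-in `median()` and `mean()` functions. This is a small sketch; the variable names (`salaries`, `sal_median`, `sal_mean`, `even_median`) are illustrative and not part of the original material.

```r
# Weekly salaries: four workers plus the CEO (the outlier)
salaries <- c(500, 550, 600, 650, 10000)

sal_median <- median(salaries)  # middle of the five sorted values
sal_mean   <- mean(salaries)    # pulled upward by the CEO's salary

print(sal_median)  # 600
print(sal_mean)    # 2460

# With an even number of values, median() averages the two middle values
even_median <- median(c(500, 550, 600, 650))
print(even_median)  # 575
```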
The mode is the value that appears most frequently in a dataset. A dataset can have more than one mode when several values tie for the highest frequency.
There is no specific algebraic formula for the mode; it is determined by frequency (\(f\)): \[\text{Mode} = \text{Value with maximum } f(x)\]
Scenario: A shoe store sells sizes 7, 8, 8, 9, 10, 8, 11.

- Size 8 appears 3 times.
- Mode: 8.
# Note: base R's mode() returns an object's storage mode, not the statistical mode,
# so we build a simple frequency table instead.
shoe_sizes <- c(7, 8, 8, 9, 10, 8, 11)
freq_table <- table(shoe_sizes)
mode_val <- names(freq_table)[which.max(freq_table)]
print(paste("The mode shoe size is:", mode_val))
## [1] "The mode shoe size is: 8"
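One limitation of the `which.max()` approach is that it reports only the first value when several values tie for the highest frequency. The sketch below shows one possible way to return every tied value. The helper name `all_modes` is hypothetical and not a base R function; it returns the modes as character strings because they are taken from the names of the frequency table.

```r
# Hypothetical helper (not part of base R): returns every value
# tied for the highest frequency, as character strings
all_modes <- function(x) {
  freq_table <- table(x)
  names(freq_table)[freq_table == max(freq_table)]
}

shoe_modes  <- all_modes(c(7, 8, 8, 9, 10, 8, 11))          # one mode
tied_modes  <- all_modes(c(7, 7, 8, 8, 9))                  # two modes (bimodal)
color_modes <- all_modes(c("red", "blue", "red", "green"))  # categorical data

print(shoe_modes)   # "8"
print(tied_modes)   # "7" "8"
print(color_modes)  # "red"
```

For numeric data, wrapping the result in `as.numeric()` converts the modes back to numbers.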
| Measure | Best for… | Sensitive to Outliers? |
|---|---|---|
| Mean | Symmetric data / Normal distribution | Yes |
| Median | Skewed data (e.g., Income, House prices) | No |
| Mode | Categorical data (e.g., Favorite color) | No |
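The “Sensitive to Outliers?” column can be illustrated with a quick experiment. This is a sketch under stated assumptions: the extreme value `100` and the variable names are made up for illustration. It appends one outlier to the shoe-size data and recomputes each measure.

```r
# Shoe sizes from the mode example, then the same data plus one
# made-up extreme value (100) acting as an outlier
x         <- c(7, 8, 8, 9, 10, 8, 11)
x_outlier <- c(x, 100)

# Mean: shifts substantially
mean_before <- mean(x)          # about 8.71
mean_after  <- mean(x_outlier)  # 20.125

# Median: moves only slightly
median_before <- median(x)          # 8
median_after  <- median(x_outlier)  # 8.5

# Mode: unchanged (taken from the frequency table, so it is a string)
mode_before <- names(which.max(table(x)))          # "8"
mode_after  <- names(which.max(table(x_outlier)))  # "8"

print(c(mean_before, mean_after))
print(c(median_before, median_after))
print(c(mode_before, mode_after))
```

The median is robust rather than perfectly fixed: a single outlier can still nudge it, but far less than it moves the mean.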
# Generating a random skewed dataset
set.seed(123)
data <- rgamma(100, shape = 2, scale = 2)
# Plotting
hist(data, col = "lightblue", main = "Distribution of Data", xlab = "Values")
abline(v = mean(data), col = "red", lwd = 2, lty = 1) # Mean in Red
abline(v = median(data), col = "blue", lwd = 2, lty = 2) # Median in Blue
legend("topright", legend = c("Mean", "Median"), col = c("red", "blue"), lty = 1:2)
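To put numbers behind the two vertical lines in the histogram, the sketch below prints the sample mean and median of the same simulated data and compares them with the theoretical values for this gamma distribution. The variable names are illustrative. The sample values depend on the seed, so no specific output is quoted for them.

```r
# Same simulated right-skewed data as in the plot above
set.seed(123)
data <- rgamma(100, shape = 2, scale = 2)

# Sample statistics (these depend on the seed and the sample drawn)
sample_mean   <- mean(data)
sample_median <- median(data)

# Theoretical values for a gamma distribution with shape = 2, scale = 2
theory_mean   <- 2 * 2                              # shape * scale = 4
theory_median <- qgamma(0.5, shape = 2, scale = 2)  # about 3.36

cat("Sample mean:       ", sample_mean, "\n")
cat("Sample median:     ", sample_median, "\n")
cat("Theoretical mean:  ", theory_mean, "\n")
cat("Theoretical median:", theory_median, "\n")
```

For a right-skewed distribution like this one, the theoretical mean lies above the theoretical median because the long right tail pulls the mean upward.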