A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures indicate where most values in a distribution fall and are also referred to as the “location” of the data.
The three most common measures are:

1. The Mean
2. The Median
3. The Mode
The mean (or average) is the sum of all values divided by the total number of values.
For a sample of size \(n\), the sample mean \(\bar{x}\) is calculated as:
\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}\]
Where:

- \(\sum\): Symbol for summation.
- \(x_i\): Each individual value in the dataset.
- \(n\): The number of observations.
Imagine a small tech startup with five employees earning the following annual salaries (in thousands of dollars): 50, 60, 65, 70, and 75.
salaries <- c(50, 60, 65, 70, 75)
mean_salary <- mean(salaries)
print(paste("The mean salary is:", mean_salary))
## [1] "The mean salary is: 64"
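As a quick check on the formula, the same value can be reproduced by hand. This is only a minimal sketch; the name `manual_mean` is introduced here for illustration and is not part of the original example.

```r
# Salaries from the example above (in thousands of dollars)
salaries <- c(50, 60, 65, 70, 75)

# Apply the formula directly: sum of all values divided by n
n <- length(salaries)
manual_mean <- sum(salaries) / n

# Compare the hand-computed value against the built-in mean()
manual_mean
all.equal(manual_mean, mean(salaries))
```

Both approaches should agree; in practice `mean()` is preferable because it also handles options such as `na.rm = TRUE` for missing values.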
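The median is the middle value in a dataset when the values are arranged in ascending or descending order. With an odd number of values it is the single middle value; with an even number it is the average of the two middle values.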
Consider the same salaries: 50, 60, 65, 70, 75. The middle value is 65. If we add a CEO earning 300, the set becomes: 50, 60, 65, 70, 75, 300. The two middle values are now 65 and 70, so the median is their average, 67.5.
salaries_with_ceo <- c(50, 60, 65, 70, 75, 300)
# Mean vs Median
mean_val <- mean(salaries_with_ceo)
median_val <- median(salaries_with_ceo)
cat("Mean with CEO:", mean_val, "\n")
## Mean with CEO: 103.3333
cat("Median with CEO:", median_val)
## Median with CEO: 67.5
Note: The median is “robust” to outliers: adding the CEO moves it only from 65 to 67.5, whereas the mean is pulled from 64 up to about 103.3 by that single high salary.
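The sort-and-pick rule behind `median()` can also be sketched by hand, assuming the usual convention that an even-sized dataset uses the average of its two middle values. The helper name `manual_median` is hypothetical and exists only for this illustration.

```r
# Hypothetical helper (not part of base R): median by the sort-and-pick rule
manual_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    # Odd n: the single middle value
    x[(n + 1) / 2]
  } else {
    # Even n: average of the two middle values
    (x[n / 2] + x[n / 2 + 1]) / 2
  }
}

manual_median(c(50, 60, 65, 70, 75))       # odd number of values
manual_median(c(50, 60, 65, 70, 75, 300))  # even number of values
```

The results can be compared with `median()` on the same vectors; the sketch ignores missing values, which the built-in function can handle through `na.rm`.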
The mode is the value that appears most frequently in a dataset. A distribution can be unimodal (one mode), bimodal (two modes), or multimodal (more than two modes).
A shoe store tracks the sizes sold in one hour: 7, 8, 8, 9, 10, 10, 10, 11. The mode is 10 because it appears three times.
Base R does not have a built-in function for the statistical mode (the built-in `mode()` function reports an object's storage type instead), so we create a custom function:
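get_mode <- function(v) {
  # Distinct values, in order of first appearance
  uniqv <- unique(v)
  # Count each distinct value and return the most frequent one
  # (on ties, which.max() keeps only the first)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}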
shoe_sizes <- c(7, 8, 8, 9, 10, 10, 10, 11)
print(paste("The mode of shoe sizes is:", get_mode(shoe_sizes)))
## [1] "The mode of shoe sizes is: 10"
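Because `which.max()` returns only the first maximum, `get_mode()` reports a single value even when several values are tied. For bimodal or multimodal data, a variant that returns every tied value may be more useful. The sketch below is one possible approach; `get_all_modes` is a hypothetical name, and the sample vector is invented for illustration.

```r
# Hypothetical helper: return every value tied for the highest frequency
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))
  uniqv[counts == max(counts)]
}

# Illustrative data (not from the shoe store example):
# sizes 8 and 10 each appear twice, so both are modes
bimodal_sizes <- c(7, 8, 8, 9, 10, 10, 11)
get_all_modes(bimodal_sizes)
```

On unimodal data such as the shoe store sizes, this variant returns the same single value as `get_mode()`.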
| Measure | Best for… | Sensitivity to Outliers |
|---|---|---|
| Mean | Symmetric data, Normal distributions | High (Very Sensitive) |
| Median | Skewed data (e.g., Income, Home prices) | Low (Robust) |
| Mode | Categorical data (e.g., Favorite color) | Low |
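
The last row of the table can be illustrated with a short sketch: labels such as colors cannot be averaged or ordered, but the most frequent category can still be found. The responses below are invented for illustration, and the variable names are assumptions rather than part of the original examples.

```r
# Invented survey responses (illustrative only)
favorite_colors <- c("blue", "red", "blue", "green", "blue", "red")

# table() counts each category; which.max() picks the most frequent one
color_counts <- table(favorite_colors)
modal_color <- names(which.max(color_counts))
modal_color
```

As with `get_mode()`, `which.max()` keeps only the first category when counts are tied.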
The following vector represents the daily number of customers at a
local cafe over 10 days:
34, 45, 40, 38, 50, 45, 55, 120, 42, 41.