A measure of central tendency is a single value that describes a set of data by identifying the central position within it. Such measures are often referred to as “summary statistics.”
The three most common measures are:

1. The Mean
2. The Median
3. The Mode
The mean (or average) is the sum of all values divided by the total number of values. It is the most common measure of central tendency.
For a sample of size \(n\), the sample mean \(\bar{x}\) is calculated as:
\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}\]
Where:

- \(\sum\): Sigma notation (summation)
- \(x_i\): The value of each individual observation
- \(n\): The total number of observations
Scenario: A tech startup tracks the daily hours worked by a small team of 5 developers: 8, 9, 7, 12, and 8 hours.
Calculation: \[\bar{x} = \frac{8 + 9 + 7 + 12 + 8}{5} = \frac{44}{5} = 8.8 \text{ hours}\]
hours_worked <- c(8, 9, 7, 12, 8)
mean_val <- mean(hours_worked)
print(paste("The mean hours worked is:", mean_val))
## [1] "The mean hours worked is: 8.8"
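As a cross-check on the formula above, the mean can also be computed directly as the sum of the observations divided by their count. This is a minimal sketch; `manual_mean` is a name introduced here purely for illustration.

```r
hours_worked <- c(8, 9, 7, 12, 8)

# Sum of all observations divided by the number of observations
manual_mean <- sum(hours_worked) / length(hours_worked)
print(manual_mean)

# all.equal() compares the two results within a small numerical tolerance
print(isTRUE(all.equal(manual_mean, mean(hours_worked))))
```

Both approaches should agree; `mean()` is generally preferable in practice because it also offers options such as `na.rm = TRUE` for missing values.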
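The median is the middle value in a data set when the values are arranged in ascending or descending order. When the data set has an even number of values, the median is the average of the two middle values. It is a “robust” measure because extreme outliers have little or no effect on it.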
Scenario: Real estate prices in a neighborhood. Suppose five houses sold for: $250k, $270k, $310k, $320k, and $1.2M (a mansion).
house_prices <- c(250000, 270000, 310000, 320000, 1200000)
median_val <- median(house_prices)
print(paste("The median house price is:", median_val))
## [1] "The median house price is: 310000"
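To see why the median is called robust, it can help to compare it with the mean on the same prices. The sketch below also shows the even-length case by dropping the mansion. The variable names `mean_price`, `median_price`, `prices_without_mansion`, and `median_even` are introduced here for illustration only.

```r
house_prices <- c(250000, 270000, 310000, 320000, 1200000)

# The single mansion pulls the mean well above the typical sale price
mean_price <- mean(house_prices)      # 2350000 / 5 = 470000
median_price <- median(house_prices)  # middle of the five sorted values = 310000

# With an even number of values, median() averages the two middle values
prices_without_mansion <- c(250000, 270000, 310000, 320000)
median_even <- median(prices_without_mansion)  # (270000 + 310000) / 2 = 290000

print(c(mean = mean_price, median = median_price, median_even = median_even))
```

The mean of $470k exceeds four of the five sale prices, while the median of $310k stays close to what a typical house sold for.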
The mode is the value that appears most frequently in a data set. A data set can have one mode (unimodal), two modes (bimodal), or more than two modes (multimodal).
\[\text{Mode} = \text{Value with the highest frequency } (f_i)\]
Scenario: A shoe store owner wants to know which size to restock most. The sizes of the last 10 pairs sold were: 7, 8, 8, 9, 9, 9, 10, 11, 9, 8.
R does not have a built-in function for the statistical mode (the built-in mode() function returns an object's storage mode, such as “numeric”), but we can create one:
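get_mode <- function(v) {
  uniqv <- unique(v)  # distinct values, in order of first appearance
  # Count how often each distinct value occurs; which.max() returns the first maximum,
  # so only one value is returned even when several are tied
  uniqv[which.max(tabulate(match(v, uniqv)))]
}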
shoe_sizes <- c(7, 8, 8, 9, 9, 9, 10, 11, 9, 8)
mode_val <- get_mode(shoe_sizes)
print(paste("The mode shoe size is:", mode_val))
## [1] "The mode shoe size is: 9"
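Because a data set can be bimodal or multimodal, a variant that returns every value tied for the highest frequency may be useful. The following is one possible sketch, not a standard R function; `get_all_modes` is a hypothetical helper name, and it assumes a non-empty input vector.

```r
# Hypothetical helper: returns all values that share the highest frequency
get_all_modes <- function(v) {
  uniqv <- unique(v)                   # distinct values, in order of first appearance
  counts <- tabulate(match(v, uniqv))  # frequency of each distinct value
  uniqv[counts == max(counts)]         # keep every value that reaches the maximum
}

shoe_sizes <- c(7, 8, 8, 9, 9, 9, 10, 11, 9, 8)
print(get_all_modes(shoe_sizes))        # a single mode: 9

bimodal_example <- c(1, 1, 2, 2, 3)
print(get_all_modes(bimodal_example))   # two modes: 1 and 2
```

For the shoe sizes the result matches the single mode found above; for the second vector both tied values are returned.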
| Measure | Best Used For | Sensitivity to Outliers |
|---|---|---|
| Mean | Symmetric data, Normal distributions | Highly Sensitive |
| Median | Skewed data (e.g., Income, Prices) | Robust (largely unaffected) |
| Mode | Categorical data (e.g., Color, Size) | Robust |
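The mode is the only one of the three measures that applies to categorical data, since category labels cannot be summed or sorted numerically. A minimal sketch using `table()` follows; the `shirt_colors` data are made up for illustration.

```r
# Hypothetical categorical data
shirt_colors <- c("blue", "red", "blue", "green", "blue", "red")

# table() counts how often each category occurs
color_counts <- table(shirt_colors)
print(color_counts)

# The name of the category with the highest count is the mode
color_mode <- names(color_counts)[which.max(color_counts)]
print(color_mode)  # "blue"
```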
End of Module I