A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures indicate where most values in a distribution fall and are also referred to as the “location” of the data.
In this module, we will cover the three primary measures:
1. The Mean
2. The Median
3. The Mode
The mean (or average) is the most common measure of central tendency. It is the sum of all observations divided by the total number of observations.
For a sample of size \(n\), the sample mean \(\bar{x}\) is calculated as:
\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}\]
Where:
- \(\sum\): Summation symbol
- \(x_i\): Each individual value
- \(n\): Total number of values
# Example dataset: Exam scores
scores <- c(85, 90, 78, 92, 71, 88, 95)
# Calculate mean
mean_score <- mean(scores)
print(paste("The mean score is:", round(mean_score, 2)))
## [1] "The mean score is: 85.57"
Corporate Salaries: If a small startup has 5 employees earning $40k, $45k, $50k, $52k, and $100k, the mean salary is $57,400, which is more than four of the five employees earn. The single high salary pulls the mean upward.
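To make the outlier effect concrete, the salary figures above can be run through R. This is a minimal sketch: the variable names and the "drop the top salary" comparison are illustrative additions, not part of the original example.

```r
# Salaries from the startup example, in dollars
salaries <- c(40000, 45000, 50000, 52000, 100000)

mean_salary <- mean(salaries)      # pulled upward by the 100k salary
median_salary <- median(salaries)  # middle value of the sorted data

# Hypothetical comparison: drop the highest salary and recompute the mean
mean_without_top <- mean(salaries[salaries < max(salaries)])

print(paste("Mean:", mean_salary))
print(paste("Median:", median_salary))
print(paste("Mean without the top salary:", mean_without_top))
```

The mean is 57,400 while the median is 50,000. Removing the single $100k salary lowers the mean to 46,750, which shows how much one value contributes.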
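The median is the middle value in a dataset when the values are arranged in ascending or descending order. It splits the data into two equal halves. When the number of observations is odd, the median is the single middle value; when it is even, the median is the average of the two middle values.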
# Dataset
heights <- c(160, 165, 170, 175, 180, 185)
# Calculate median
med_height <- median(heights)
print(paste("The median height is:", med_height))
## [1] "The median height is: 172.5"
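The result of 172.5 comes from averaging the two middle heights, 170 and 175. The sketch below spells out that logic by hand. `manual_median` is a hypothetical helper written for illustration, and it assumes a numeric vector with no missing values; in practice `median()` is the function to use.

```r
# Hypothetical helper: compute the median by sorting and picking the middle
manual_median <- function(x) {
  x_sorted <- sort(x)
  n <- length(x_sorted)
  if (n %% 2 == 1) {
    # Odd count: the single middle value
    x_sorted[(n + 1) / 2]
  } else {
    # Even count: average of the two middle values
    (x_sorted[n / 2] + x_sorted[n / 2 + 1]) / 2
  }
}

heights <- c(160, 165, 170, 175, 180, 185)
manual_median(heights)          # even count: averages 170 and 175
manual_median(c(heights, 190))  # odd count: the middle value is 175
```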
Real Estate: When reporting “Median Home Prices,” economists typically prefer the median over the mean because it is not pulled upward by a few multimillion-dollar mansions. It represents what a “typical” buyer might pay.
The mode is the value that appears most frequently in a dataset. Depending on how many values share the highest frequency, a dataset can be:
- Unimodal: One mode
- Bimodal: Two modes
- Multimodal: More than two modes
Base R does not have a built-in function for the statistical mode of a numeric vector (the built-in mode() reports an object's storage type instead), so we often use a custom function or the table() function.
# Example dataset: Shoe sizes
shoe_sizes <- c(7, 8, 8, 9, 10, 10, 10, 11, 12)
# Using table to find the frequency
freq_table <- table(shoe_sizes)
# which.max() returns the position of the first maximum only, so ties are not reported
mode_value <- names(freq_table)[which.max(freq_table)]
print(paste("The mode shoe size is:", mode_value))
## [1] "The mode shoe size is: 10"
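Because `which.max()` reports only the first maximum, the approach above returns a single value even when the data are bimodal or multimodal. One possible custom function is sketched below. `find_modes` is a hypothetical name, and the second shoe-size vector is made-up data used only to show a tie.

```r
# Hypothetical helper: return every value tied for the highest frequency
find_modes <- function(x) {
  freq_table <- table(x)
  modes <- names(freq_table)[freq_table == max(freq_table)]
  # table() stores values as character names; convert back for numeric input
  if (is.numeric(x)) as.numeric(modes) else modes
}

find_modes(c(7, 8, 8, 9, 10, 10, 10, 11, 12))  # one mode: 10
find_modes(c(7, 8, 8, 8, 9, 10, 10, 10, 11))   # two modes: 8 and 10
```

Note that when every value occurs exactly once, this sketch returns all values, so the caller may want to treat that case as "no mode".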
Inventory Management: A shoe store manager uses the mode to decide which size to stock the most. If size 10 is the mode, they will order more of that size than any other.
| Measure | Best Used For | Sensitivity to Outliers |
|---|---|---|
| Mean | Continuous data with a symmetrical distribution | High (Strongly affected) |
| Median | Skewed data or data with outliers | Low (Robust) |
| Mode | Categorical (Nominal) data | Low |
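The last row of the table can be illustrated with nominal data, where neither a mean nor a median can be computed. The survey responses below are invented for this sketch.

```r
# Hypothetical survey responses (nominal data, so mean and median do not apply)
transport <- c("bus", "car", "bike", "car", "bus", "car", "walk")

# Count each category and pick the most frequent one
freq_table <- table(transport)
mode_transport <- names(freq_table)[which.max(freq_table)]
print(paste("The most common response is:", mode_transport))
```

Here "car" appears three times, more than any other category, so it is the mode.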
As a rule of thumb for unimodal distributions, the shape of the distribution determines how the three measures are ordered:
- Right Skew (Positive): Mean > Median > Mode
- Left Skew (Negative): Mean < Median < Mode
- Symmetric: Mean \(\approx\) Median \(\approx\) Mode
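The right-skew ordering can be checked on a small example. The `wait_times` vector is hypothetical data chosen so that most values are small and a few are large. It illustrates the pattern for one dataset and does not prove that the ordering always holds.

```r
# Hypothetical right-skewed data: most values are small, a few are large
wait_times <- c(1, 2, 2, 2, 3, 3, 4, 5, 9, 15)

freq_table <- table(wait_times)
mode_wait <- as.numeric(names(freq_table)[which.max(freq_table)])
median_wait <- median(wait_times)
mean_wait <- mean(wait_times)

# For this dataset the ordering is mean > median > mode
c(mode = mode_wait, median = median_wait, mean = mean_wait)
```

For these values the mode is 2, the median is 3, and the mean is 4.6. The long right tail drags the mean furthest from the bulk of the data.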
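Sometimes, certain values in a dataset contribute more to the final average than others. The weighted mean accounts for this by multiplying each value by its weight and dividing the total by the sum of the weights: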
\[\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}\]
Where \(w_i\) is the weight assigned to value \(x_i\).
A student’s final grade is based on:
- Homework (20%): 95
- Midterm (30%): 80
- Final Exam (50%): 85
grades <- c(95, 80, 85)
weights <- c(0.20, 0.30, 0.50)
weighted_avg <- weighted.mean(grades, weights)
print(paste("The weighted final grade is:", weighted_avg))
## [1] "The weighted final grade is: 85.5"
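The same result can be reproduced by applying the weighted-mean formula directly. In this sketch the weights are written as percentage points rather than proportions, an assumed variation meant to show that dividing by the sum of the weights makes their scale irrelevant.

```r
grades <- c(95, 80, 85)
# Assumed variation: weights given as percentage points instead of proportions
weights_pct <- c(20, 30, 50)

# Apply the formula directly: sum of weight * value over sum of weights
manual_weighted <- sum(weights_pct * grades) / sum(weights_pct)

# weighted.mean() normalizes the weights in the same way
builtin_weighted <- weighted.mean(grades, weights_pct)

print(paste("Manual:", manual_weighted, "Built-in:", builtin_weighted))
```

Both approaches give 85.5, matching the result obtained with weights of 0.20, 0.30, and 0.50.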
c(10, 12, 12, 15, 20, 25, 100).