What does a “typical” data point look like? In statistics, a measure of central tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.
The three most common measures are:
1. Mean
2. Median
3. Mode
The mean is the sum of all values divided by the total number of values. It is the most common measure of central tendency.
For a sample mean (\(\bar{x}\)): \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
# Sample data: Exam scores of 10 students
scores <- c(85, 90, 88, 76, 92, 85, 80, 89, 94, 85)
# Calculate Mean
mean_score <- mean(scores)
print(paste("The mean score is:", mean_score))
## [1] "The mean score is: 86.4"
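The formula above can be checked by hand against `mean()`. The following is a minimal sketch that reuses the exam scores; the names `n` and `manual_mean` are introduced here for illustration only.

```r
# Exam scores from the example above
scores <- c(85, 90, 88, 76, 92, 85, 80, 89, 94, 85)

# Apply the formula directly: sum of all values divided by the number of values
n <- length(scores)
manual_mean <- sum(scores) / n

# Compare with the built-in function, allowing for floating-point rounding
same_result <- isTRUE(all.equal(manual_mean, mean(scores)))
print(same_result)
```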
A teacher uses the mean to determine the overall performance of a class. If the mean score is low, the teacher might decide to review the material again.
Pros: Uses every value in the dataset.
Cons: Highly sensitive to outliers (extreme values).
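To see how strongly a single extreme value can pull the mean, here is a small sketch. It assumes a hypothetical eleventh student who missed the exam and was recorded with a score of 0; that value is not part of the original data.

```r
# Exam scores from the example above
scores <- c(85, 90, 88, 76, 92, 85, 80, 89, 94, 85)

# Hypothetical outlier: one extra student recorded with a score of 0
scores_with_outlier <- c(scores, 0)

mean_before <- mean(scores)               # 86.4
mean_after  <- mean(scores_with_outlier)  # 864 / 11, roughly 78.5

# A single extreme value shifts the mean by almost 8 points
print(mean_before - mean_after)
```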
The median is the middle value when a data set is ordered from least to greatest. If there is an even number of observations, the median is the average of the two middle numbers.
# Using the same scores data
median_score <- median(scores)
print(paste("The median score is:", median_score))
## [1] "The median score is: 86.5"
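The definition can also be followed step by step: sort the values, then take the middle one (or average the two middle ones). The sketch below does this for the same scores; `sorted_scores` and `manual_median` are illustrative names, not part of the original code.

```r
# Exam scores from the example above
scores <- c(85, 90, 88, 76, 92, 85, 80, 89, 94, 85)

# Step 1: order the values from least to greatest
sorted_scores <- sort(scores)
n <- length(sorted_scores)

# Step 2: pick the middle value, or average the two middle values when n is even
if (n %% 2 == 0) {
  manual_median <- mean(sorted_scores[c(n / 2, n / 2 + 1)])
} else {
  manual_median <- sorted_scores[(n + 1) / 2]
}

# With 10 scores, the two middle values are 85 and 88, so the median is 86.5
print(manual_median == median(scores))
```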
If you are looking at housing prices in a neighborhood where 9 houses cost $300k each and 1 mansion costs $10 million, the mean would be $1.27 million (misleading!), but the median would remain $300k. This is why household income reports typically use the median.
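The housing example can be reproduced in a few lines. This sketch builds the ten prices described above; the variable names are illustrative.

```r
# Nine houses at $300k and one mansion at $10 million
prices <- c(rep(300000, 9), 10000000)

mean_price   <- mean(prices)    # 12.7 million / 10 = 1.27 million
median_price <- median(prices)  # both middle values are 300k

# The mean is more than four times the price of a typical house here
print(mean_price / median_price)
```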
The mode is the value that appears most often in a data set. A data set can have one mode, more than one mode (bimodal/multimodal), or no mode at all.
Note: R does not have a standard built-in function for the statistical mode (the base function `mode()` reports how an object is stored, not its most frequent value), so we create a simple custom function.
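# Custom function to find the mode
# If several values tie for most frequent, only the first one encountered is returned
get_mode <- function(x) {
  uniqx <- unique(x)                           # distinct values, in order of first appearance
  uniqx[which.max(tabulate(match(x, uniqx)))]  # value with the highest count
}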
# Calculate Mode
mode_score <- get_mode(scores)
print(paste("The mode score is:", mode_score))
## [1] "The mode score is: 85"
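Because a data set can have more than one mode, a variant that returns every tied value may be useful. The function `get_modes` below is a hypothetical helper sketched for illustration, and the bimodal vector is made-up data.

```r
# Hypothetical helper: return all values that share the highest frequency
get_modes <- function(x) {
  uniqx <- unique(x)
  counts <- tabulate(match(x, uniqx))
  uniqx[counts == max(counts)]
}

# Exam scores from the example above: a single mode
scores <- c(85, 90, 88, 76, 92, 85, 80, 89, 94, 85)
single_mode <- get_modes(scores)

# Made-up bimodal data: 1 and 2 both appear twice
two_modes <- get_modes(c(1, 1, 2, 2, 3))

print(single_mode)
print(two_modes)
```

Note that when every value occurs exactly once, this sketch returns all of the values; by the usual convention such a data set is described as having no mode.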
A shoe store manager needs to know which shoe size is the Mode. Knowing the “average” shoe size (e.g., 8.34) is useless for ordering stock, but knowing that Size 9 sells the most (the mode) is vital.
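For categorical data like this, base R's `table()` gives the frequency of each category directly. The sales log below is invented purely for illustration; only the approach is the point.

```r
# Hypothetical sales log: one entry per pair sold (made-up data)
sizes_sold <- c("8", "9", "9", "10", "9", "8", "7", "9", "10", "9")

# Count how often each size was sold
size_counts <- table(sizes_sold)

# The mode is the category with the highest count
top_size <- names(size_counts)[which.max(size_counts)]

# The arithmetic average is not a size the store can actually order
average_size <- mean(as.numeric(sizes_sold))

print(top_size)
print(average_size)
```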
| Measure | Best for… | Sensitive to Outliers? |
|---|---|---|
| Mean | Symmetric data (Normal distribution) | Yes (High) |
| Median | Skewed data (Income, House prices) | No |
| Mode | Categorical data (Colors, Brands, Sizes) | No |
Imagine you are a Data Analyst for a streaming service. You have the
following data representing the number of minutes 10 users spent
watching a show:
minutes <- c(22, 25, 22, 28, 21, 23, 150, 22, 24, 26)
# Your calculations
avg_min <- mean(minutes)
med_min <- median(minutes)
# Plotting to visualize
hist(minutes, col="skyblue", main="Distribution of Watch Time", xlab="Minutes")
abline(v = avg_min, col="red", lwd=2, lty=2) # Mean in red
abline(v = med_min, col="blue", lwd=2) # Median in blue
legend("topright", legend=c("Mean", "Median"), col=c("red", "blue"), lty=c(2,1), lwd=2)

In the next module, we will discuss Measures of Dispersion (Range, Variance, and Standard Deviation) to see how spread out our data is around these centers.