A measure of central tendency is a single value that summarizes a data set by identifying its central position. In statistics, the three most common measures are the mean, the median, and the mode.
The mean (or average) is the sum of all observations divided by the total number of observations.
For a sample of size \(n\), the sample mean \(\bar{x}\) is: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}\]
where:
* \(\sum\) is the summation symbol.
* \(x_i\) is the \(i\)-th individual value.
* \(n\) is the total number of values in the sample.
Scenario: A tech startup tracks the number of hours five employees worked in a day: 8, 9, 7, 10, and 6 hours.
# Data vector
work_hours <- c(8, 9, 7, 10, 6)
# Calculating Mean
mean_val <- mean(work_hours)
print(paste("The mean work hours are:", mean_val))
## [1] "The mean work hours are: 8"
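To connect the built-in `mean()` call back to the formula, the following minimal sketch computes the same quantity step by step with `sum()` and `length()`. The variable names `total_hours`, `n`, and `manual_mean` are illustrative choices, not part of the original example.

```r
# Hours worked by the five employees (same data as in the scenario)
work_hours <- c(8, 9, 7, 10, 6)

# Numerator of the formula: the sum of all observations
total_hours <- sum(work_hours)

# Denominator of the formula: the number of observations n
n <- length(work_hours)

# Sample mean as the sum divided by n
manual_mean <- total_hours / n

# Check that the manual result agrees with the built-in mean()
print(isTRUE(all.equal(manual_mean, mean(work_hours))))
```

Here the sum is 40 and \(n = 5\), so the manual calculation gives 8, matching `mean(work_hours)`.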
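The median is the middle value in a data set when the numbers are arranged in ascending or descending order; with an even number of values, it is the average of the two middle values. Because it depends on the position of the values rather than their magnitude, it is robust to outliers.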
Scenario: Monthly house rents in a neighborhood: $1200, $1250, $1300, $1400, and $5000 (an outlier).
rents <- c(1200, 1250, 1300, 1400, 5000)
# Mean vs Median comparison
print(paste("Mean Rent:", mean(rents)))
## [1] "Mean Rent: 2030"
print(paste("Median Rent:", median(rents)))
## [1] "Median Rent: 1300"
Observation: The $5000 rent pulls the mean up to $2030, which is higher than four of the five rents, while the median of $1300 stays representative of a “typical” rent.
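To make the definition concrete, here is a minimal sketch of how a median could be computed by hand, including the even-length case where the two middle values are averaged. The helper `manual_median()` is a hypothetical name introduced for illustration; in practice the built-in `median()` is the natural choice.

```r
# Hypothetical helper (not part of base R): median via sorting
manual_median <- function(x) {
  sorted <- sort(x)
  n <- length(sorted)
  if (n %% 2 == 1) {
    # Odd n: take the single middle value
    sorted[(n + 1) / 2]
  } else {
    # Even n: average the two middle values
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  }
}

rents <- c(1200, 1250, 1300, 1400, 5000)

# Five values: the third sorted value is the median
odd_result <- manual_median(rents)

# Four values (outlier dropped): average of the second and third
even_result <- manual_median(rents[1:4])

print(odd_result)
print(even_result)
```

With the five rents the helper returns 1300, the same as `median(rents)`; with the first four rents it returns 1275, the average of 1250 and 1300.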
The mode is the value that appears most frequently in a data set. A set can be unimodal (one mode), bimodal (two modes), or multimodal.
Scenario: A shoe store records the sizes of sneakers sold in an hour: 7, 8, 8, 9, 10, 8, 11.
Note: R does not have a standard built-in function for the statistical mode (the base function `mode()` returns an object's storage mode instead), so we use the `table` function or a custom function.
shoe_sizes <- c(7, 8, 8, 9, 10, 8, 11)
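# Using table to count how often each size occurs
freq_table <- table(shoe_sizes)
# which.max() returns the first maximum, so a tie would report only one mode;
# names() gives the size as a character string
mode_val <- names(freq_table)[which.max(freq_table)]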
print(freq_table)
## shoe_sizes
##  7  8  9 10 11
##  1  3  1  1  1
print(paste("The Mode shoe size is:", mode_val))
## [1] "The Mode shoe size is: 8"
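The `which.max()` approach reports only one value even when several are tied for the highest frequency. As a sketch of the "custom function" option, the hypothetical helper `find_modes()` below returns every tied value and keeps the original numeric type. The name and the second data set `bimodal_sizes` are assumptions made for illustration.

```r
# Hypothetical helper: returns every value tied for the highest frequency
find_modes <- function(x) {
  unique_vals <- unique(x)
  # Count occurrences of each unique value, in order of first appearance
  counts <- tabulate(match(x, unique_vals))
  unique_vals[counts == max(counts)]
}

# Unimodal example: the shoe sizes from the scenario
shoe_sizes <- c(7, 8, 8, 9, 10, 8, 11)
single_mode <- find_modes(shoe_sizes)

# Bimodal example (illustrative data): sizes 8 and 9 each appear twice
bimodal_sizes <- c(7, 8, 8, 9, 9, 10)
two_modes <- find_modes(bimodal_sizes)

print(single_mode)
print(two_modes)
```

For the shoe sizes this returns 8; for the illustrative bimodal vector it returns both 8 and 9.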
| Measure | Best Used For… | Sensitivity to Outliers |
|---|---|---|
| Mean | Continuous data with a symmetric distribution (e.g., Height). | Highly Sensitive |
| Median | Skewed data or data with outliers (e.g., Income). | Resistant |
| Mode | Categorical/Nominal data (e.g., Favorite color). | Resistant |
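The last row of the table can be illustrated with a short sketch: for nominal data there is no meaningful sum or ordering, so the mode is the only one of the three measures that applies. The survey responses below are made-up illustrative data, not part of the original scenarios.

```r
# Hypothetical survey responses (illustrative data)
favorite_colors <- c("blue", "red", "blue", "green", "blue", "red")

# Frequency of each category
color_counts <- table(favorite_colors)

# The category with the highest count is the mode
modal_color <- names(color_counts)[which.max(color_counts)]

print(color_counts)
print(paste("The most common favorite color is:", modal_color))
```

Calling `mean(favorite_colors)` on this character vector would return `NA` with a warning, which is why the mean is not listed for categorical data.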
Let’s visualize where these measures sit on a distribution using a generated dataset of exam scores.
# Generate random data
set.seed(123)
scores <- rgamma(100, shape = 2, scale = 10) # Right-skewed distribution
m_mean <- mean(scores)
m_median <- median(scores)
# Plotting
hist(scores, col="lightblue", main="Distribution of Exam Scores", xlab="Score")
abline(v = m_mean, col = "red", lwd = 2, lty = 1)
abline(v = m_median, col = "blue", lwd = 2, lty = 2)
legend("topright", legend = c("Mean", "Median"),
       col = c("red", "blue"), lwd = 2, lty = 1:2)
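As a complement to the plot, the population values for the distribution being simulated can be computed directly. This sketch assumes the same gamma parameters as the simulation (shape 2, scale 10) and relies on two standard facts: the mean of a gamma distribution is shape times scale, and its median is the 0.5 quantile, available through `qgamma()`. The variable names are illustrative.

```r
# Parameters of the gamma distribution used in the simulation
shape <- 2
scale <- 10

# Population mean of a gamma distribution: shape * scale
theoretical_mean <- shape * scale

# Population median: the 50th percentile of the distribution
theoretical_median <- qgamma(0.5, shape = shape, scale = scale)

print(theoretical_mean)
print(round(theoretical_median, 2))
```

The population mean is 20 and the population median is roughly 16.78, so the mean lies to the right of the median, as expected for a right-skewed distribution. The sample values from `scores` will differ somewhat from these because of random variation.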
Use $...$ for inline math and $$...$$ for block equations.
The rmarkdown package can be installed with install.packages("rmarkdown").