A Measure of Central Tendency is a single value that identifies the central position within a dataset. It “summarizes” the data by finding a typical value around which the data points cluster.
In this module, we will cover the three most common measures:
1. The Arithmetic Mean
2. The Median
3. The Mode
The Mean is the most common measure of central tendency, often referred to as the “average.”
For a sample of \(n\) observations (\(x_1, x_2, \dots, x_n\)): \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
For a population of \(N\) observations: \[\mu = \frac{\sum_{i=1}^{N} X_i}{N}\]
# Dataset: Weekly hours spent studying by 8 students
study_hours <- c(12, 15, 14, 18, 20, 15, 10, 22)
# Calculating the mean using base R
mean_val <- mean(study_hours)
print(paste("The mean study hours are:", mean_val))
## [1] "The mean study hours are: 15.75"
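To connect the built-in function back to the sample-mean formula, the short sketch below computes the same value by hand as the sum of the observations divided by \(n\). The variable name `manual_mean` is introduced here for illustration only.

```r
# Weekly study hours from the example above
study_hours <- c(12, 15, 14, 18, 20, 15, 10, 22)

# Sample mean from the definition: sum of observations divided by n
n <- length(study_hours)
manual_mean <- sum(study_hours) / n   # 126 / 8

print(manual_mean)
# Check that the hand calculation agrees with base R's mean()
print(all.equal(manual_mean, mean(study_hours)))
```

Both approaches give 15.75; in practice `mean()` is preferred because it also offers options such as `na.rm = TRUE` for missing values.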
In Marketing, analysts use the mean to calculate “Average Revenue Per User” (ARPU). This helps businesses understand how much money they generate on average from each customer to decide their advertising budget.
The Median is the middle value when a dataset is arranged in order; with an even number of observations, it is the average of the two middle values. Unlike the mean, it is robust, meaning it is not heavily influenced by extreme outliers.
# Dataset: Salaries of 5 employees (in thousands)
# Note the extreme outlier (250) representing the CEO
salaries <- c(45, 50, 60, 55, 250)
# Compare Mean vs Median
cat("Mean Salary:", mean(salaries), "\n")
## Mean Salary: 92
cat("Median Salary:", median(salaries), "\n")
## Median Salary: 55
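As a rough illustration of what "the middle value" means in practice, the sketch below sorts the data and picks the center by position. The helper `manual_median()` is hypothetical (it is not part of base R) and assumes a numeric vector with no missing values.

```r
# Hypothetical helper: median computed from the definition
manual_median <- function(x) {
  sorted <- sort(x)
  n <- length(sorted)
  if (n %% 2 == 1) {
    # Odd n: the single middle value
    sorted[(n + 1) / 2]
  } else {
    # Even n: the average of the two middle values
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  }
}

salaries <- c(45, 50, 60, 55, 250)
print(manual_median(salaries))            # odd n: 55
print(manual_median(c(45, 50, 60, 55)))   # even n: (50 + 55) / 2 = 52.5
```

For real work the built-in `median()` does the same job; the helper only makes the sorting step visible.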
In Real Estate, the “Median House Price” is preferred over the mean. If a neighborhood has five houses priced at $200k and one mansion priced at $5 million, the mean would suggest the “average” house costs $1 million, which is misleading. The median stays at $200k, accurately reflecting what a typical homebuyer pays.
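The arithmetic in the house price example can be checked with a few lines of R. The vector name `house_prices` is made up for this sketch, and prices are expressed in thousands of dollars.

```r
# Five houses at $200k and one mansion at $5 million (in thousands)
house_prices <- c(rep(200, 5), 5000)

# Mean: (5 * 200 + 5000) / 6 = 1000, i.e. $1 million
print(mean(house_prices))

# Median: the two middle values of the sorted data are both 200
print(median(house_prices))
```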
The Mode is the value that appears most frequently in a dataset.
R does not have a built-in function for the statistical mode (the function mode() in R actually returns the storage type of an object). We use the table() function instead.
# Dataset: Popular car colors sold
colors <- c("Red", "Blue", "Blue", "White", "Red", "Blue", "Silver")
# Create a frequency table and find the max
color_table <- table(colors)
modal_value <- names(color_table)[which.max(color_table)]
print(color_table)
## colors
##   Blue    Red Silver  White
##      3      2      1      1
print(paste("The mode is:", modal_value))
## [1] "The mode is: Blue"
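One limitation worth noting: `which.max()` returns only the first maximum, so a dataset with two equally frequent values would report just one of them. A possible workaround is sketched below; `all_modes()` is a hypothetical helper, not a base R function, and it returns the modes as character strings because it reads them from the table's names.

```r
# Hypothetical helper: every value tied for the highest frequency
all_modes <- function(x) {
  freq <- table(x)
  names(freq)[freq == max(freq)]
}

colors <- c("Red", "Blue", "Blue", "White", "Red", "Blue", "Silver")
print(all_modes(colors))   # a single mode: "Blue"

# A bimodal example: "Blue" and "Red" both appear twice
print(all_modes(c("Red", "Blue", "Blue", "Red", "White")))
```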
A Retail Manager uses the mode to determine which shoe size to stock. Knowing the “average” shoe size is 8.4 is not helpful for ordering, but knowing that size 9 is the Mode (most frequently sold) tells the manager exactly what to buy.
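To make the shoe size scenario concrete, here is a small sketch using made-up sales data. The ten values in `shoe_sizes` are hypothetical and were chosen only so that the mean comes out to 8.4 and the mode to 9, matching the figures in the paragraph above.

```r
# Hypothetical sizes of ten pairs sold (illustrative data only)
shoe_sizes <- c(9, 9, 9, 9, 8, 8, 8, 7, 7, 10)

# The mean is a size that no customer can actually buy
print(mean(shoe_sizes))

# The mode is the size sold most often
size_table <- table(shoe_sizes)
modal_size <- as.numeric(names(size_table)[which.max(size_table)])
print(modal_size)
```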
| Feature | Mean | Median | Mode |
|---|---|---|---|
| Best Used For | Continuous data (No outliers) | Skewed data / Outliers | Categorical data |
| Sensitive to Outliers? | Yes | No | No |
| Formula | Sum / Count | Middle Value | Most Frequent |
When a distribution is “skewed” (stretched to one side), these three measures typically separate.
- Symmetric: Mean = Median = Mode
- Right Skewed: Mean > Median > Mode
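The right-skewed ordering can be seen on a small hand-built example. The vector `right_skewed` below is illustrative data, not from any real study; a few large values stretch the right tail and pull the mean above the median and the mode.

```r
# Illustrative right-skewed data: most values are small, a few are large
right_skewed <- c(1, 2, 2, 2, 3, 3, 4, 5, 9, 14)

skew_mean   <- mean(right_skewed)     # 45 / 10 = 4.5
skew_median <- median(right_skewed)   # average of 5th and 6th sorted values = 3

# Mode via a frequency table, as in the car color example
skew_table <- table(right_skewed)
skew_mode  <- as.numeric(names(skew_table)[which.max(skew_table)])   # 2

cat("Mean:", skew_mean, " Median:", skew_median, " Mode:", skew_mode, "\n")
```

This ordering is a common pattern rather than a strict law, so it is worth computing all three measures on real data instead of assuming it holds.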
In summary, Module I introduces how to find the “center” of data. Choosing between Mean, Median, and Mode depends entirely on the type of data you have and whether your data has outliers.
End of Module I Notes