In statistics, a measure of central tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.
It is often referred to as the “typical” value of the dataset. The three most common measures are:

1. Mean
2. Median
3. Mode
The mean (or average) is the most popular and well-known measure of central tendency. It is calculated by summing all the values in a dataset and dividing by the total number of values.
For a sample of size \(n\), the sample mean \(\bar{x}\) is given by:
\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
Where:

* \(\bar{x}\) = Sample mean
* \(x_i\) = The \(i^{th}\) value in the dataset
* \(n\) = Number of observations
Scenario: A teacher wants to find the average score of 5 students in a math quiz. Scores: 70, 85, 80, 90, 75.
Calculation: \[\bar{x} = \frac{70 + 85 + 80 + 90 + 75}{5} = \frac{400}{5} = 80\]
```r
scores <- c(70, 85, 80, 90, 75)
mean_score <- mean(scores)
print(paste("The Mean Score is:", mean_score))
## [1] "The Mean Score is: 80"
```
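To tie the formula to the code, the following sketch recomputes the quiz average step by step with `sum()` and `length()` and checks the result against `mean()`. The names `n`, `total`, and `manual_mean` are illustrative and do not appear in the original notes.

```r
# Sample mean computed directly from the formula: (sum of x_i) / n
scores <- c(70, 85, 80, 90, 75)

n <- length(scores)        # number of observations
total <- sum(scores)       # sum of all x_i
manual_mean <- total / n   # x-bar

print(total)               # 400
print(manual_mean)         # 80

# The manual result should agree with the built-in mean()
print(isTRUE(all.equal(manual_mean, mean(scores))))  # TRUE
```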
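The median is the middle value in a distribution when the values are arranged in ascending or descending order. It splits the data into two equal halves. When the number of values is odd, the median is the single middle value; when it is even, the median is the average of the two middle values.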
Scenario: Monthly salaries of 5 employees: $2,000, $2,500, $3,000, $3,500, $10,000. (Notice that $10,000 is an outlier).
Calculation: The values are already sorted. Since \(n=5\) (odd), the middle value is the 3rd one. Median = $3,000.
Note: The mean would be $4,200, which is higher than four of the five salaries and so does not represent the “typical” employee well. The single outlier pulls the mean upward, which is why the median is preferred for skewed data.
```r
salaries <- c(2000, 2500, 3000, 3500, 10000)
median_salary <- median(salaries)
print(paste("The Median Salary is:", median_salary))
## [1] "The Median Salary is: 3000"
```
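As a rough illustration of the sorting logic, the sketch below computes the median by hand, including the even-\(n\) case, and then compares how far the mean and the median move when the outlier is dropped. The helper `manual_median()` and the vector `no_outlier` are hypothetical names introduced here; in practice the built-in `median()` does the same job.

```r
# Hypothetical helper (not part of base R): sort, then pick the middle
manual_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                   # odd n: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2    # even n: average of the two middle values
  }
}

salaries <- c(2000, 2500, 3000, 3500, 10000)
no_outlier <- salaries[salaries != 10000]   # same data without the $10,000 salary

print(manual_median(salaries))     # 3000 (n = 5, odd)
print(manual_median(no_outlier))   # 2750 (n = 4, even: (2500 + 3000) / 2)

# Dropping the outlier shifts the mean far more than the median
print(mean(salaries))              # 4200
print(mean(no_outlier))            # 2750
```

In this example, removing one salary moves the mean by $1,450 but the median by only $250.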
The mode is the value that appears most frequently in a dataset. A dataset can have one mode (unimodal), two modes (bimodal), or several modes (multimodal).
There is no specific algebraic formula for the mode; it is identified by frequency counting: \[\text{Mode} = \text{Value with highest frequency}\]
Scenario: A shoe store tracks the sizes sold in one hour: 7, 8, 8, 9, 10, 8, 11.
Calculation:

* 7: 1 time
* 8: 3 times
* 9: 1 time
* 10: 1 time
* 11: 1 time
Mode = 8.
Note: R does not have a standard built-in function for the statistical mode (base R's mode() reports an object's storage type, not its most frequent value), so we create a simple function.
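```r
get_mode <- function(v) {
  uniqv <- unique(v)  # distinct values, in order of first appearance
  # Count how often each distinct value occurs and return the most frequent one;
  # if several values tie, which.max() picks the first of them.
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

shoe_sizes <- c(7, 8, 8, 9, 10, 8, 11)
mode_size <- get_mode(shoe_sizes)
print(paste("The Mode Shoe Size is:", mode_size))
## [1] "The Mode Shoe Size is: 8"
```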
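Because a dataset can be bimodal or multimodal, a function that returns a single value may hide ties. The sketch below is one possible variant that returns every value sharing the highest frequency; `get_all_modes()` and the sample vectors other than the shoe sizes are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: return all values tied for the highest frequency
get_all_modes <- function(v) {
  counts <- table(v)                           # frequency of each distinct value
  top <- names(counts)[counts == max(counts)]  # names of the most frequent values
  # table() stores the values as character names; convert back for numeric input
  if (is.numeric(v)) as.numeric(top) else top
}

print(get_all_modes(c(7, 8, 8, 9, 10, 8, 11)))                   # 8
print(get_all_modes(c(1, 2, 2, 3, 3, 4)))                        # 2 3 (bimodal)
print(get_all_modes(c("White", "Black", "White", "Silver")))     # "White"
```

The last call shows that the same counting approach works for categorical data, where the mean and median are not defined.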
| Measure | Best Used For | Sensitivity to Outliers |
|---|---|---|
| Mean | Continuous data, symmetric distribution | High (Outliers pull the mean) |
| Median | Skewed data (e.g., income, house prices) | Low (Robust to outliers) |
| Mode | Categorical data (e.g., most popular car color) | Low |
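In a normal distribution (or any symmetric, unimodal distribution), Mean = Median = Mode. For skewed unimodal distributions, the usual ordering, which is a rule of thumb rather than a guarantee, is:

* Right-skewed distribution (positive skew): Mean > Median > Mode
* Left-skewed distribution (negative skew): Mean < Median < Mode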
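The ordering under skew can be checked on a small example. The sketch below uses made-up data (not taken from these notes) with a long right tail, plus its mirror image for the left-skewed case; `first_mode()` and `center_summary()` are hypothetical helpers defined only for this illustration.

```r
# Hypothetical helper: most frequent value (the smallest one wins on ties)
first_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

# Hypothetical helper: the three measures side by side
center_summary <- function(x) {
  c(mean = mean(x), median = median(x), mode = first_mode(x))
}

# Made-up right-skewed data: mostly small values with a long tail to the right
right_skewed <- c(1, 1, 1, 2, 2, 3, 4, 5, 9, 12)
# Mirror image of the same data, so the long tail is on the left
left_skewed <- 13 - right_skewed

print(center_summary(right_skewed))   # mean 4, median 2.5, mode 1
print(center_summary(left_skewed))    # mean 9, median 10.5, mode 12
```

Here the mean is dragged furthest toward the tail, the mode stays at the peak, and the median falls in between.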
* 12, 15, 18, 22, 30.
* 3, 10, 2, 8, 15, 12. (Remember to sort first!)
* Red, Blue, Red, Green, Red, Blue.

End of Module I Notes