1. Introduction

In statistics, a Measure of Central Tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.

It is often referred to as a “location” measure because it tells us where the data are centered. The three most common measures are:

  1. The Mean
  2. The Median
  3. The Mode


2. The Arithmetic Mean

The mean (or average) is the most popular and well-known measure of central tendency. It is calculated by summing all the values in a data set and dividing by the total number of values.

Mathematical Formula

For a sample of \(n\) values, \(x_1, x_2, \dots, x_n\), the sample mean \(\bar{x}\) is:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

Where:

  • \(\sum\): Summation symbol.
  • \(x_i\): Each individual value.
  • \(n\): Total number of observations.

Real-Life Example

Scenario: A teacher wants to find the average score of 5 students in a mini-quiz. Scores: 70, 85, 80, 90, 75.

Calculation: \[\bar{x} = \frac{70 + 85 + 80 + 90 + 75}{5} = \frac{400}{5} = 80\]

R Implementation

scores <- c(70, 85, 80, 90, 75)
mean_score <- mean(scores)
print(paste("The mean score is:", mean_score))
## [1] "The mean score is: 80"
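
As a quick check on the formula, the same result can be obtained by summing the values and dividing by their count directly. This is only a minimal sketch; `manual_mean` is an illustrative name and not part of the original example.

```r
scores <- c(70, 85, 80, 90, 75)

# Apply the formula directly: sum of the values divided by their count
manual_mean <- sum(scores) / length(scores)
print(manual_mean)
## [1] 80

# Confirm that it agrees with R's built-in mean()
print(isTRUE(all.equal(manual_mean, mean(scores))))
## [1] TRUE
```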

3. The Median

The median is the middle value in a data set that has been arranged in numerical order (ascending or descending). It splits the data into two equal halves.

Mathematical Formula

  1. Sort the data from smallest to largest.
  2. If \(n\) is odd, the median is the value at position: \[\text{Median} = \left( \frac{n+1}{2} \right)^{th} \text{term}\]
  3. If \(n\) is even, the median is the average of the two middle terms: \[\text{Median} = \frac{(\frac{n}{2})^{th} \text{term} + (\frac{n}{2} + 1)^{th} \text{term}}{2}\]
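
The three steps above can be sketched as a small function. `manual_median` is a hypothetical helper written for illustration, and it assumes a non-empty numeric vector with no missing values; in practice R's built-in `median()` does the same job.

```r
# Hypothetical helper that follows the three steps above
manual_median <- function(x) {
  x <- sort(x)                        # Step 1: sort from smallest to largest
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                    # Step 2: odd n, take the middle term
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2     # Step 3: even n, average the two middle terms
  }
}

print(manual_median(c(7, 3, 9)))      # odd number of values
## [1] 7
print(manual_median(c(4, 1, 3, 2)))   # even number of values
## [1] 2.5
```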

Real-Life Example

Scenario: Comparing household incomes in a neighborhood to avoid the influence of one billionaire living on the block. Incomes (in thousands): $45, $50, $52, $55, $700.

  • Mean: $180.4 (This is misleading due to the $700 outlier).
  • Median: $52 (A much better representation of the “typical” neighbor).

R Implementation

incomes <- c(45, 50, 52, 55, 700)
median_income <- median(incomes)
print(paste("The median income is:", median_income))
## [1] "The median income is: 52"
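
To see how strongly the single large value drives the mean, one option is to recompute both measures with that value removed. This is a rough sketch; `without_outlier` is an illustrative name, and dropping the value is done here only for comparison, not as a recommended way to clean data.

```r
incomes <- c(45, 50, 52, 55, 700)
without_outlier <- incomes[incomes != 700]   # drop the 700 value for comparison

# Mean with and without the outlier
print(mean(incomes))
print(mean(without_outlier))

# Median with and without the outlier
print(median(incomes))
print(median(without_outlier))
```

Removing the outlier drops the mean from 180.4 to 50.5, while the median only moves from 52 to 51.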

4. The Mode

The mode is the value that appears most frequently in a data set. A data set can have one mode (unimodal), two modes (bimodal), or many modes (multimodal).

Real-Life Example

Scenario: A shoe store owner wants to know which shoe size to stock most heavily. Sizes sold: 7, 8, 8, 9, 10, 10, 10, 11.

  • Mode: 10 (Because it occurred 3 times).

R Implementation

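Note: R does not have a built-in function for the statistical mode (the built-in mode() reports an object's storage mode instead), so we use a custom helper that counts how often each unique value occurs. If several values are tied for the highest count, this helper returns only the first one it encounters.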

shoe_sizes <- c(7, 8, 8, 9, 10, 10, 10, 11)

get_mode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

print(paste("The mode shoe size is:", get_mode(shoe_sizes)))
## [1] "The mode shoe size is: 10"
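
For bimodal or multimodal data, a variant that returns every value tied for the highest count may be more useful. The sketch below assumes a hypothetical helper named `get_modes`; it is one possible approach, not a standard R function.

```r
# Hypothetical helper: returns all values tied for the highest frequency
get_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))   # how often each unique value occurs
  uniqv[counts == max(counts)]
}

print(get_modes(c(7, 8, 8, 9, 10, 10, 10, 11)))   # unimodal
## [1] 10
print(get_modes(c(1, 2, 2, 3, 3)))                # bimodal
## [1] 2 3
```

Note that if every value occurs exactly once, this helper returns all of them, so the result should be read with the frequencies in mind.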

5. Comparison: When to use which?

| Measure | Best Used For… | Sensitivity to Outliers |
|---------|----------------|-------------------------|
| Mean | Continuous data with a symmetrical distribution (e.g., Height). | High (Easily skewed) |
| Median | Skewed data or data with outliers (e.g., Salaries). | Low (Robust) |
| Mode | Categorical/Nominal data (e.g., Most popular car color). | Low |
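
For categorical data, where a mean or median is not defined, the mode can be read off a frequency table. The sketch below uses made-up car colors purely for illustration; `car_colors` and `color_counts` are hypothetical names.

```r
# Hypothetical survey data, made up for illustration
car_colors <- c("white", "black", "white", "red", "silver", "white", "black")

color_counts <- table(car_colors)   # frequency of each category
print(color_counts)

# The category with the highest count is the mode
print(names(color_counts)[which.max(color_counts)])
## [1] "white"
```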

6. Summary Exercise

Consider the following dataset representing the number of hours 7 students spent studying: 2, 3, 3, 4, 5, 8, 20.

  1. Calculate the Mean: \((2+3+3+4+5+8+20)/7 = 45/7 \approx 6.43\) hours.
  2. Find the Median: The 4th value is 4.
  3. Find the Mode: The most frequent value is 3.

Observation: The mean (about 6.43) is higher than the median (4) because it is being pulled up by the outlier (20 hours). In this case, the median provides a more accurate picture of “typical” study time.
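
The exercise can be checked in R with a few lines. This is a minimal sketch; `study_hours` and `counts` are illustrative names, and the mode is taken from a frequency table, which assumes a single most frequent value.

```r
study_hours <- c(2, 3, 3, 4, 5, 8, 20)

print(round(mean(study_hours), 2))   # mean, rounded to two decimals
## [1] 6.43
print(median(study_hours))           # median: the 4th of 7 sorted values
## [1] 4

# Mode: the most frequent value, read off a frequency table
counts <- table(study_hours)
print(as.numeric(names(counts)[which.max(counts)]))
## [1] 3
```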