1. Introduction

A Measure of Central Tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. In statistics, we use these measures to summarize a large dataset into a single representative figure.

The three most common measures are:

  1. The Mean
  2. The Median
  3. The Mode


2. The Arithmetic Mean

The mean (or average) is the sum of all observations divided by the total number of observations.

Mathematical Formula

For a sample of size \(n\), the mean (\(\bar{x}\)) is calculated as:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

Where:

  • \(\sum\) = the Greek letter ‘sigma’, meaning “sum of”
  • \(x_i\) = each individual value in the dataset
  • \(n\) = the total number of values
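The summation formula maps directly onto R's `sum()` and `length()` functions. The sketch below defines a hypothetical helper, `manual_mean()`, that computes \(\bar{x}\) exactly as written above and compares it with the built-in `mean()`; it assumes a numeric vector without missing values.

```r
# Hypothetical helper: the sample mean written out from the formula
manual_mean <- function(x) {
  n <- length(x)      # n = total number of values
  total <- sum(x)     # sum of x_i for i = 1..n
  total / n           # x-bar = (sum of x_i) / n
}

scores <- c(75, 82, 90, 65, 88)
manual_mean(scores)
## [1] 80
all.equal(manual_mean(scores), mean(scores))
## [1] TRUE
```

In everyday work the built-in `mean()` is the one to use; the helper only makes each term of the formula visible.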

R Example

Suppose we have the test scores of 5 students: 75, 82, 90, 65, and 88.

scores <- c(75, 82, 90, 65, 88)
mean_score <- mean(scores)
print(paste("The Mean score is:", mean_score))
## [1] "The Mean score is: 80"

Real-Life Example: Academic Grading

A professor uses the mean to determine the average performance of a class. If the mean is very low, the professor might consider “curving” the grades or re-teaching a specific topic.


3. The Median

The median is the middle value when a dataset is ordered from smallest to largest.

How to Calculate

  1. Arrange data in ascending order.
  2. If \(n\) is odd, the median is the middle number.
  3. If \(n\) is even, the median is the average of the two middle numbers.
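The three steps translate almost line for line into R. This is a minimal sketch using a hypothetical helper, `manual_median()`; it assumes a non-empty numeric vector without `NA` values (note that `sort()` silently drops `NA`s by default). In practice the built-in `median()` does the same job.

```r
# Hypothetical helper that follows the three steps literally
manual_median <- function(x) {
  x <- sort(x)                      # Step 1: arrange in ascending order
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # Step 2: odd n, take the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2   # Step 3: even n, average the two middle values
  }
}

manual_median(c(10, 3, 5, 8, 2))
## [1] 5
manual_median(c(10, 3, 5, 8, 2, 12))
## [1] 6.5
```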

Mathematical Formula (Position)

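The position of the median in the sorted data is: \[Pos = \frac{n + 1}{2}\]

When \(n\) is even, \(Pos\) falls halfway between two positions, and the median is the average of the values on either side of it.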
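As a worked check, the position formula can be applied to the two datasets from the R example, writing \(x_{(k)}\) for the \(k\)-th smallest value (a notation introduced here only for this illustration):

\[
\begin{aligned}
n = 5 &: \quad Pos = \frac{5 + 1}{2} = 3, \quad \text{Median} = x_{(3)} = 5 \\
n = 6 &: \quad Pos = \frac{6 + 1}{2} = 3.5, \quad \text{Median} = \frac{x_{(3)} + x_{(4)}}{2} = \frac{5 + 8}{2} = 6.5
\end{aligned}
\]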

R Example

# Odd number of observations
data_odd <- c(10, 3, 5, 8, 2) # Sorted: 2, 3, 5, 8, 10
median(data_odd)
## [1] 5
# Even number of observations
data_even <- c(10, 3, 5, 8, 2, 12) # Sorted: 2, 3, 5, 8, 10, 12 -> (5+8)/2
median(data_even)
## [1] 6.5

Real-Life Example: Real Estate

The median is the preferred measure for house prices. Because a single $10,000,000 mansion would drastically inflate the Mean price of a neighborhood, the Median gives a more realistic view of what a “typical” home in that neighborhood costs.


4. The Mode

The mode is the value that occurs most frequently in a dataset.

  • Unimodal: One mode.
  • Bimodal: Two modes.
  • Multimodal: More than two modes.

Note: R has no built-in function for the statistical mode (the base function `mode()` returns an object's storage mode, such as "numeric"), so we often use a frequency table instead.

R Example

colors <- c("Red", "Blue", "Red", "Green", "Red", "Blue")
table(colors) # Shows 'Red' is the mode
## colors
##  Blue Green   Red 
##     2     1     3
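The table only displays the counts, so one more step is needed to pull out the mode itself. A minimal sketch, using a hypothetical helper named `find_modes()`: it returns every value tied for the highest count, so it also covers the bimodal and multimodal cases listed above. It assumes the input has no `NA` values (`table()` ignores them by default), and because `table()` stores values as names, numeric input comes back as character strings and would need `as.numeric()`.

```r
# Hypothetical helper: return every value tied for the highest count,
# so unimodal data gives one value and bimodal/multimodal data gives several
find_modes <- function(x) {
  counts <- table(x)                     # frequency of each distinct value
  names(counts)[counts == max(counts)]   # keep all values with the top count
}

colors <- c("Red", "Blue", "Red", "Green", "Red", "Blue")
find_modes(colors)
## [1] "Red"

find_modes(c("S", "M", "M", "L", "L"))   # bimodal example
## [1] "L" "M"
```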

Real-Life Example: Inventory Management

A shoe store manager uses the Mode to decide which shoe size to stock in the largest quantity. Knowing that the “average” (Mean) shoe size is 8.4 is useless (you can’t buy a size 8.4), but knowing that size 9 is the Mode helps with purchasing decisions.


5. Comparing Mean, Median, and Mode

The shape of the distribution, in particular its skewness, determines which measure is most appropriate.

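Shape of Distribution        Relationship
Symmetrical (unimodal)       Mean = Median = Mode
Right Skewed (Positive)      Mean > Median > Mode
Left Skewed (Negative)       Mean < Median < Mode

These orderings are a rule of thumb for unimodal distributions; they usually hold but can fail, particularly for discrete data.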
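To see the right-skewed row in action, here is a small illustrative sketch on a hypothetical sample with a long right tail (the values are made up for demonstration). The mode is taken as the most frequent value via `which.max()`, which assumes there is a single mode.

```r
# Hypothetical right-skewed sample: most values are small, one value sits far out in the tail
x <- c(1, 2, 2, 2, 3, 3, 4, 5, 9)

sample_mean   <- mean(x)     # pulled toward the long right tail
sample_median <- median(x)   # the 5th of 9 sorted values
counts        <- table(x)
sample_mode   <- as.numeric(names(counts)[which.max(counts)])  # most frequent value

cat("Mean:", sample_mean, "| Median:", sample_median, "| Mode:", sample_mode, "\n")
## Mean: 3.444444 | Median: 3 | Mode: 2
```

The single value of 9 drags the Mean above the Median, which in turn sits above the Mode, matching the "Right Skewed" row.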

Visualizing the Impact of Outliers

Watch how the Mean moves much more than the Median when we add an outlier (an extreme value).

# Original data
salary <- c(30, 32, 35, 38, 40) # in thousands
cat("Original Mean:", mean(salary), "| Original Median:", median(salary), "\n")
## Original Mean: 35 | Original Median: 35
# Adding a 'Millionaire' outlier
salary_with_outlier <- c(30, 32, 35, 38, 40, 1000)
cat("New Mean:", mean(salary_with_outlier), "| New Median:", median(salary_with_outlier), "\n")
## New Mean: 195.8333 | New Median: 36.5

6. Summary for Students

  • Use the Mean for symmetric data without outliers.
  • Use the Median when the data has outliers or is skewed (like income).
  • Use the Mode for categorical data (names, colors, sizes).

7. Practice Exercise

Dataset: 12, 15, 12, 18, 22, 25, 12, 90.

  1. Calculate the mean.
  2. Calculate the median.
  3. Identify the mode.
  4. Which measure describes this data best? Why?