1. Introduction

In statistics, a measure of central tendency is a single value that describes a set of data by identifying the central position within it. For this reason, measures of central tendency are sometimes called measures of central location.

The three main measures are:

  1. Mean (The Arithmetic Average)
  2. Median (The Middle Value)
  3. Mode (The Most Frequent Value)

2. The Arithmetic Mean

2.1 Definition

The mean is the most common measure of central tendency. It is the sum of all values divided by the number of observations.

2.2 Mathematical Formula

Suppose a sample \(x\) contains \(n\) values: \(x_1, x_2, \ldots, x_n\).

The Sample Mean (\(\bar{x}\)): \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

For a whole population of \(N\) values \(X_1, X_2, \ldots, X_N\), the Population Mean (\(\mu\)) is: \[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} \]

2.3 Manual Example

Scenario: A student receives the following quiz scores: 85, 90, 75, 92, 88.

\[ \bar{x} = \frac{85 + 90 + 75 + 92 + 88}{5} \] \[ \bar{x} = \frac{430}{5} = 86 \]
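The formula maps directly onto R: `sum()` gives the numerator \(\sum x_i\) and `length()` gives \(n\). A minimal sketch that checks the hand calculation against the built-in `mean()` (`manual_mean` is just an illustrative variable name):

```r
# Quiz scores from the manual example
quiz_scores <- c(85, 90, 75, 92, 88)

# Apply the formula: sum of the values divided by the number of values
manual_mean <- sum(quiz_scores) / length(quiz_scores)

print(manual_mean)                                 # 86
print(all.equal(manual_mean, mean(quiz_scores)))   # TRUE
```

The same computation serves for a sample mean or a population mean; only the notation (\(\bar{x}, n\) versus \(\mu, N\)) differs.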

2.4 Real-Life Applications

  • Economics: Calculating Per Capita Income (Total Income / Total Population).
  • Education: Calculating Grade Point Average (GPA).

2.5 R Example

# Create a vector of quiz scores
quiz_scores <- c(85, 90, 75, 92, 88)

# Calculate the mean
mean_score <- mean(quiz_scores)

print(paste("The mean score is:", mean_score))
## [1] "The mean score is: 86"

3. The Median

3.1 Definition

The median is the middle score for a set of data that has been arranged in order of magnitude (sorted). The median is less affected by outliers (extremely high or low values) than the mean.

3.2 Mathematical Logic

First, sort the data in ascending order: \(x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}\).

If \(n\) is odd: The median is the value at position: \[ \frac{n+1}{2} \]

If \(n\) is even: The median is the average of the two middle values at positions: \[ \frac{n}{2} \quad \text{and} \quad \frac{n}{2} + 1 \]
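The odd/even rule can be written out as a small function. This is a sketch for illustration only: `manual_median()` is a hypothetical helper name, the input vectors are toy data, and in practice the built-in `median()` should be used.

```r
# Hypothetical helper for illustration; in practice use the built-in median().
manual_median <- function(x) {
  x_sorted <- sort(x)   # x_(1) <= x_(2) <= ... <= x_(n)
  n <- length(x_sorted)
  if (n %% 2 == 1) {
    # Odd n: the value at position (n + 1) / 2
    x_sorted[(n + 1) / 2]
  } else {
    # Even n: average of the values at positions n/2 and n/2 + 1
    (x_sorted[n / 2] + x_sorted[n / 2 + 1]) / 2
  }
}

print(manual_median(c(7, 1, 3)))     # 3
print(manual_median(c(4, 1, 3, 2)))  # 2.5
```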

3.3 Manual Example

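Scenario (odd number of values): Family ages: 10, 50, 12, 45, 15.

  1. Sort: 10, 12, 15, 45, 50.
  2. With \(n = 5\), the median is at position \((5+1)/2 = 3\), so the median is 15.

Scenario (even number of values): Test scores: 10, 20, 30, 40 (already sorted).

  1. With \(n = 4\), the middle two values are at positions 2 and 3: 20 and 30.
  2. Average them: \((20+30)/2 = 25\).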
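Both manual results can be checked with R's built-in `median()`, which sorts the data internally, so the vectors can be passed in their original order. The variable names below are chosen for this example.

```r
# Odd number of values: family ages
family_ages <- c(10, 50, 12, 45, 15)
print(sort(family_ages))    # 10 12 15 45 50
print(median(family_ages))  # 15

# Even number of values: test scores
test_scores <- c(10, 20, 30, 40)
print(median(test_scores))  # 25
```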

3.4 Real-Life Applications

  • Real Estate: Median home prices are reported instead of the mean because a single multi-million-dollar mansion can pull the mean far above the price of a typical house.
  • Salaries: Median household income is used to represent the standard of living because a few extremely high incomes would inflate the mean.

3.5 R Example

# Create a vector of home prices (in thousands)
# Note the outlier (2500)
home_prices <- c(300, 350, 280, 320, 2500, 290)

# Calculate Mean and Median to see the difference
mean_price <- mean(home_prices)
median_price <- median(home_prices)

print(paste("Mean Price:", round(mean_price, 2)))
## [1] "Mean Price: 673.33"
print(paste("Median Price:", median_price))
## [1] "Median Price: 310"

Note: The median (310) represents the “typical” house better than the mean (673.33), which the single 2500 outlier pulls far above every other price.


4. The Mode

4.1 Definition

The mode is the value that appears most frequently in a data set. A data set may have one mode (unimodal), two modes (bimodal), more than two (multimodal), or no mode at all (for example, when every value occurs exactly once).

4.2 Formula

There is no algebraic formula for the mode in raw data; it is determined by counting frequencies.
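Counting frequencies is straightforward to sketch in R. The helper below, `all_modes()`, is a hypothetical name (not a base R function) that returns every value tied for the highest count, so it covers the unimodal, bimodal and no-mode cases. It assumes that "no mode" means no value repeats; other conventions exist.

```r
# Hypothetical helper (not part of base R): return all values tied for the
# highest frequency. Assumption: if no value repeats, the data has no mode
# and an empty vector is returned.
all_modes <- function(v) {
  counts <- table(v)                       # frequency of each distinct value
  if (max(counts) == 1) {
    return(v[0])                           # empty vector of the same type
  }
  modes <- names(counts)[counts == max(counts)]
  # table() keeps values as character names; convert back for numeric input
  if (is.numeric(v)) as.numeric(modes) else modes
}

print(all_modes(c(1, 2, 2, 3)))     # 2    (unimodal)
print(all_modes(c(1, 1, 2, 2, 3)))  # 1 2  (bimodal)
print(all_modes(c(1, 2, 3)))        # numeric(0): no mode
```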

4.3 Manual Example

Scenario: Shoe sizes sold today: 8, 9, 8, 10, 7, 8, 11.

  • 7: 1 time
  • 8: 3 times
  • 9: 1 time
  • 10: 1 time
  • 11: 1 time

The mode is 8.
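The hand tally can be checked in R. This sketch counts with `table()` and picks the largest count with `which.max()`; `sizes_today` is simply a name chosen for this example.

```r
# Shoe sizes from the manual example
sizes_today <- c(8, 9, 8, 10, 7, 8, 11)

# Count how often each size occurs
print(table(sizes_today))

# which.max() returns the position of the (first) largest count;
# its name is the corresponding size
print(names(which.max(table(sizes_today))))  # "8"
```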

4.4 Real-Life Applications

  • Retail/Inventory: A shoe store manager orders more stock based on the Mode (most popular size), not the average size.
  • Voting: In a plurality election, the candidate with the most votes (the mode) wins.

4.5 R Example

R has no built-in function for the statistical mode: the base function mode() returns an object's storage mode (for example, "numeric"), not its most frequent value. Instead, we either write a custom function or count frequencies with table().

# Shoe sizes: the manual example plus two more sales (a 9 and an 8),
# so size 8 now appears 4 times
shoe_sizes <- c(8, 9, 8, 10, 7, 8, 11, 9, 8)

# Using table to find frequencies
freq_table <- table(shoe_sizes)
print(freq_table)
## shoe_sizes
##  7  8  9 10 11 
##  1  4  2  1  1
# Sort the frequencies in decreasing order; the first name is the mode
sorted_freq <- sort(freq_table, decreasing = TRUE)
print(paste("The most common shoe size (Mode) is:", names(sorted_freq)[1]))
## [1] "The most common shoe size (Mode) is: 8"
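Two practical details about the code above: base R's `mode()` describes how an object is stored rather than its most frequent value, and the table-based answer is a character string (a name taken from the table), so it needs `as.numeric()` before any arithmetic. A short sketch (`top_size` and `top_size_num` are illustrative names):

```r
shoe_sizes <- c(8, 9, 8, 10, 7, 8, 11, 9, 8)

# Base R's mode() describes storage, not frequency
print(mode(shoe_sizes))        # "numeric"

# The table-based answer comes back as a character string (a name) ...
top_size <- names(sort(table(shoe_sizes), decreasing = TRUE))[1]
print(class(top_size))         # "character"

# ... so convert it before doing arithmetic with it
top_size_num <- as.numeric(top_size)
print(top_size_num + 0.5)      # 8.5
```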

5. Summary Comparison

| Measure | Definition | Best Used When… |
|---|---|---|
| Mean | Arithmetic average | Data is symmetric (e.g., normally distributed) and has no outliers. |
| Median | Middle point | Data is skewed or has outliers (e.g., income, property prices). |
| Mode | Most frequent value | Data is categorical (nominal), or you need the “most popular” item. |
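For nominal data only the mode applies, because categories have no order and no arithmetic. A minimal sketch with made-up survey answers (`fav_color` and `color_counts` are illustrative names, and the data are hypothetical):

```r
# Hypothetical survey answers (nominal data)
fav_color <- c("red", "blue", "red", "green", "blue", "red")

# mean() is not meaningful here: for a character vector it warns and returns NA.
# The mode still works: count each category and take the most frequent one.
color_counts <- table(fav_color)
print(color_counts)
print(names(which.max(color_counts)))  # "red"
```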

6. Practice Problem

Consider the following dataset representing the number of hours 10 students studied for an exam:

\[ \text{Data} = \{2, 5, 3, 2, 10, 4, 2, 5, 1, 6\} \]

Calculate the Mean, Median, and Mode using R.

hours <- c(2, 5, 3, 2, 10, 4, 2, 5, 1, 6)

# Calculations
calc_mean <- mean(hours)
calc_median <- median(hours)

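# Custom function for the mode (base R's mode() reports storage type instead)
get_mode <- function(v) {
  uniqv <- unique(v)                    # distinct values, in order of first appearance
  counts <- tabulate(match(v, uniqv))   # how often each distinct value occurs
  uniqv[which.max(counts)]              # value with the highest count (first one if tied)
}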
calc_mode <- get_mode(hours)

# Output
data.frame(
  Metric = c("Mean", "Median", "Mode"),
  Value = c(calc_mean, calc_median, calc_mode)
)
##   Metric Value
## 1   Mean   4.0
## 2 Median   3.5
## 3   Mode   2.0
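One caveat about `get_mode()`: `which.max()` returns the first maximum, so when several values tie for the highest count, the function returns whichever tied value appears first in the data. A short sketch with toy vectors illustrating this:

```r
# Same helper as in the practice problem
get_mode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# 1 and 2 both appear twice; the result depends on which comes first
print(get_mode(c(1, 1, 2, 2)))  # 1
print(get_mode(c(2, 2, 1, 1)))  # 2
```

If all tied values are needed, compare every count against the maximum instead of using `which.max()`.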
