1. Introduction

A measure of central tendency is a single value that describes a set of data by identifying the central position within it. In statistics, such a value is often referred to as a “summary statistic.”

The three most common measures are the mean, the median, and the mode.

2. The Arithmetic Mean

The mean is the sum of all values divided by the total number of values. It is the most common measure of central tendency.

Mathematical Formula

For a sample, the mean ($\bar{x}$) is: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

R Example

# Define the dataset
data <- c(5, 8, 12, 15, 10)

# Calculate mean
sample_mean <- mean(data)
print(paste("The mean is:", sample_mean))

## [1] "The mean is: 10"
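As a quick check on the formula, the same result can be reproduced by hand. The sketch below reuses the dataset from the example; the names `n` and `manual_mean` are introduced here only for illustration.

```r
# Same dataset as in the example above
data <- c(5, 8, 12, 15, 10)

# Apply the formula directly: sum of the values divided by their count
n <- length(data)
manual_mean <- sum(data) / n

print(manual_mean)
## [1] 10

# The hand-computed value should agree with the built-in mean()
print(isTRUE(all.equal(manual_mean, mean(data))))
## [1] TRUE
```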

Real-Life Application: Academic Grading: A teacher calculates the average score of a class exam to determine the overall performance level.

3. The Median

The median is the middle value in a dataset when the values are arranged in order.

Calculation Logic

Arrange data in ascending order.
If $n$ is odd, the median is the middle number.
If $n$ is even, the median is the average of the two middle numbers.

R Example

# Dataset with an odd number of values
data_odd <- c(3, 10, 2, 8, 15)
print(paste("Median (Odd):", median(data_odd)))

## [1] "Median (Odd): 8"

# Dataset with an even number of values
data_even <- c(3, 10, 2, 8, 15, 20)
print(paste("Median (Even):", median(data_even)))

## [1] "Median (Even): 9"
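The calculation logic can also be written out step by step. The function below is a minimal sketch, not a replacement for `median()`: `manual_median` is a hypothetical helper name, and it assumes a non-empty numeric vector with no missing values.

```r
# manual_median is a hypothetical helper, written only to illustrate the steps
manual_median <- function(x) {
  sorted <- sort(x)                      # Step 1: arrange in ascending order
  n <- length(sorted)
  if (n %% 2 == 1) {
    sorted[(n + 1) / 2]                  # odd n: the single middle value
  } else {
    mean(sorted[c(n / 2, n / 2 + 1)])    # even n: average of the two middle values
  }
}

print(manual_median(c(3, 10, 2, 8, 15)))
## [1] 8

print(manual_median(c(3, 10, 2, 8, 15, 20)))
## [1] 9
```

Both results match the output of `median()` on the same data.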

Real-Life Application: Real Estate: Median home prices are used to avoid the distortion caused by a few extremely expensive mansions (outliers).

4. The Mode

The mode is the value that appears most frequently.

Note: R does not have a built-in function for the statistical mode; the base function mode() returns an object's storage type instead. We can create a custom function to find it.

R Example

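# Custom function to calculate the mode
# (if several values tie, the first one encountered is returned)
get_mode <- function(v) {
  uniqv <- unique(v)                           # distinct values, in order of first appearance
  uniqv[which.max(tabulate(match(v, uniqv)))]  # value with the highest count
}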

# Dataset
data_mode <- c(2, 4, 4, 6, 8, 9)
print(paste("The mode is:", get_mode(data_mode)))

## [1] "The mode is: 4"
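Because `which.max()` returns only the first maximum, `get_mode` reports a single value even when several values tie for the highest frequency. A possible variant that returns every tied value is sketched below; `get_all_modes` is a hypothetical name, and the approach is one option among several.

```r
# get_all_modes is a hypothetical variant that keeps every value tied for the top count
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))  # frequency of each distinct value
  uniqv[counts == max(counts)]         # all values that reach the maximum frequency
}

# Bimodal dataset: 4 and 6 each appear twice
print(get_all_modes(c(2, 4, 4, 6, 6, 9)))
## [1] 4 6

# With a single mode, the result matches get_mode
print(get_all_modes(c(2, 4, 4, 6, 8, 9)))
## [1] 4
```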

Real-Life Application: Inventory: A shoe store tracks the “mode” of shoe sizes sold to know which size to restock most often.

5. Comparison of Measures

Data Type	Best Measure	Why?
Nominal (Categories)	Mode	You cannot average “Colors” or “Names.”
Ordinal (Ranked)	Median	Distance between ranks is not equal.
Interval/Ratio (No Outliers)	Mean	Most precise; uses all data.
Interval/Ratio (With Outliers)	Median	Not affected by extreme values.
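The first two rows of the table can be illustrated with a short sketch. The data and variable names below are made up for illustration. The ordinal example assumes an odd number of observations, so the middle rank corresponds to a single level.

```r
# Nominal data: only the mode is meaningful
colors <- c("red", "blue", "red", "green", "blue", "red")
color_counts <- table(colors)
nominal_mode <- names(color_counts)[which.max(color_counts)]
print(nominal_mode)
## [1] "red"

# Ordinal data: take the median of the rank positions, then map it back to a label
rating_levels <- c("low", "medium", "high")
ratings <- factor(c("low", "high", "medium", "medium", "high"),
                  levels = rating_levels, ordered = TRUE)
median_rank <- median(as.integer(ratings))
ordinal_median <- rating_levels[median_rank]
print(ordinal_median)
## [1] "medium"
```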

6. Practice Exercise: The Impact of Outliers

A startup has 5 employees with salaries: $40k, $42k, $45k, $48k, and $250k (CEO).

salaries <- c(40, 42, 45, 48, 250)

# Calculate both
mean_sal <- mean(salaries)
med_sal <- median(salaries)

cat("Mean Salary: $", mean_sal, "k\n", sep = "")

## Mean Salary: $85k

cat("Median Salary: $", med_sal, "k\n", sep = "")

## Median Salary: $45k

Conclusion: The mean ($85k) is misleading because the CEO’s high salary pulls it well above what any other employee earns. The median ($45k) better represents the “typical” employee’s earnings.
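One way to see the size of the effect is to recompute both measures without the outlier. The sketch below uses the salary figures from the exercise; the names `staff_only`, `mean_shift`, and `median_shift` are introduced here for illustration.

```r
salaries <- c(40, 42, 45, 48, 250)

# Drop the CEO's salary (the single outlier in this dataset)
staff_only <- salaries[salaries < 250]

# How far does each measure move when the outlier is removed?
mean_shift <- mean(salaries) - mean(staff_only)
median_shift <- median(salaries) - median(staff_only)

cat("Mean shift: ", mean_shift, "k\n", sep = "")
## Mean shift: 41.25k

cat("Median shift: ", median_shift, "k\n", sep = "")
## Median shift: 1.5k
```

Removing one value moves the mean by more than $41k but the median by only $1.5k, which is why the median is the more robust choice here.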

Module 1: Measures of Central Tendency

Statistics Department

2025-12-28
