1. Introduction

A measure of central tendency is a single value that describes a set of data by identifying the central position within it. Measures of central tendency are one kind of “summary statistic.”

The three most common measures are:

1. The Mean
2. The Median
3. The Mode

2. The Arithmetic Mean

The mean (or average) is the sum of all values divided by the total number of values. It is the most common measure of central tendency.

Mathematical Formula

For a sample of size $n$, the sample mean $\bar{x}$ is calculated as:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}\]

Where:

- $\sum$: sigma notation (summation over $i = 1, \dots, n$)
- $x_i$: the value of the $i$-th observation
- $n$: the total number of observations
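The formula can also be evaluated term by term in R. The following is a minimal sketch, assuming the same five values as the developer-hours example in this section. It computes $\sum x_i$ with `sum()` and $n$ with `length()`, then checks the result against the built-in `mean()`. The variable names `x`, `n`, `total` and `x_bar` are illustrative only.

```r
# Illustrative data: the developer hours from the example in this section
x <- c(8, 9, 7, 12, 8)

n     <- length(x)   # n: number of observations
total <- sum(x)      # sum of x_i for i = 1..n
x_bar <- total / n   # sample mean, following the formula above

print(x_bar)

# Should agree with R's built-in mean() (up to floating-point tolerance)
stopifnot(isTRUE(all.equal(x_bar, mean(x))))
```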

Real-Life Example

Scenario: A tech startup tracks the daily hours worked by a small team of 5 developers: 8, 9, 7, 12, and 8 hours.

Calculation: \[\bar{x} = \frac{8 + 9 + 7 + 12 + 8}{5} = \frac{44}{5} = 8.8 \text{ hours}\]

R Implementation

hours_worked <- c(8, 9, 7, 12, 8)
mean_val <- mean(hours_worked)
print(paste("The mean hours worked is:", mean_val))

## [1] "The mean hours worked is: 8.8"

3. The Median

The median is the middle value of a data set once the values are arranged in ascending (or descending) order. It is called a “robust” measure because extreme outliers have little or no effect on it: making the largest value even larger, for instance, leaves the median unchanged.

Mathematical Formula

Arrange the data in order: $x_{(1)} \leq x_{(2)} \leq \dots \leq x_{(n)}$
If $n$ is odd: \[\text{Median} = x_{\left(\frac{n+1}{2}\right)}\]
If $n$ is even: \[\text{Median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}\]
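The odd/even rule above can be written out directly in R. This is a sketch, not how R's own `median()` is implemented. `median_by_rule` is a hypothetical helper name, and the function assumes a non-empty numeric vector (note that `sort()` silently drops `NA` values). The six-value vector is hypothetical data, chosen to exercise the even case.

```r
# Hypothetical helper that applies the odd/even median rule directly
median_by_rule <- function(x) {
  x <- sort(x)                      # x_(1) <= x_(2) <= ... <= x_(n)
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # odd n: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2   # even n: average of the two middle values
  }
}

odd_case  <- c(8, 9, 7, 12, 8)      # n = 5
even_case <- c(8, 9, 7, 12, 8, 10)  # n = 6, hypothetical extra value

median_by_rule(odd_case)    # sorted: 7 8 8 9 12 -> 8
median_by_rule(even_case)   # sorted: 7 8 8 9 10 12 -> (8 + 9) / 2 = 8.5

# Both should match R's built-in median()
stopifnot(median_by_rule(odd_case) == median(odd_case),
          median_by_rule(even_case) == median(even_case))
```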

Real-Life Example

Scenario: Real estate prices in a neighborhood. Suppose five houses sold for $250k, $270k, $310k, $320k, and $1.2M (a mansion).

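Ordered data (in thousands of dollars): 250, 270, 310, 320, 1200
Median: $310,000 (the 3rd of the 5 ordered values).
Note: The mean would be $470,000 (2,350 / 5 = 470, in thousands), which is misleading because the $1.2M outlier pulls it above four of the five sale prices. The median provides a better “typical” price.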

R Implementation

house_prices <- c(250000, 270000, 310000, 320000, 1200000)
median_val <- median(house_prices)
print(paste("The median house price is:", median_val))

## [1] "The median house price is: 310000"
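The note about the $1.2M outlier can be checked numerically. The sketch below compares `mean()` and `median()` on the house prices, then on a hypothetical what-if vector (`prices_bigger_outlier`, not from the source) in which the mansion sells for $5M instead. The mean moves sharply while the median does not change, which is what “robust” means here.

```r
house_prices <- c(250000, 270000, 310000, 320000, 1200000)

mean(house_prices)    # pulled upward by the $1.2M sale: 470000
median(house_prices)  # 310000

# Hypothetical what-if: make the mansion even more expensive
prices_bigger_outlier <- c(250000, 270000, 310000, 320000, 5000000)

mean(prices_bigger_outlier)    # rises sharply: 1230000
median(prices_bigger_outlier)  # unchanged: 310000
```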

4. The Mode

The mode is the value that appears most frequently in a data set. A data set can have one mode (unimodal), two modes (bimodal), or multiple modes (multimodal).

Mathematical Formula

\[\text{Mode} = \text{the value } x_i \text{ with the highest frequency } f_i\]

where $f_i$ is the frequency (number of occurrences) of the distinct value $x_i$.

Real-Life Example

Scenario: A shoe store owner wants to know which size to restock most. The sizes of the last 10 pairs sold were: 7, 8, 8, 9, 9, 9, 10, 11, 9, 8.

Frequency of 7: 1
Frequency of 8: 3
Frequency of 9: 4
Frequency of 10: 1
Frequency of 11: 1
Mode: Size 9 (it sold the most).
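The frequency counts above can be produced with base R's `table()`, which counts how often each distinct value occurs. This is a minimal sketch; `freq` is an illustrative name. Because `table()` stores the values as character names, the mode is converted back to a number at the end.

```r
shoe_sizes <- c(7, 8, 8, 9, 9, 9, 10, 11, 9, 8)

# table() counts how often each distinct value occurs (sorted by value)
freq <- table(shoe_sizes)
print(freq)

# The mode is the name of the cell with the largest count;
# names are stored as character, so convert back to numeric
as.numeric(names(freq)[which.max(freq)])
```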

R Implementation

R has no built-in function for the statistical mode (the base function `mode()` returns an object's storage mode, such as "numeric", instead), but we can create one:

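get_mode <- function(v) {
  uniqv  <- unique(v)                  # distinct values, in order of first appearance
  counts <- tabulate(match(v, uniqv))  # frequency of each distinct value
  uniqv[which.max(counts)]             # most frequent value; on a tie, the first one seen
}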

shoe_sizes <- c(7, 8, 8, 9, 9, 9, 10, 11, 9, 8)
mode_val <- get_mode(shoe_sizes)
print(paste("The mode shoe size is:", mode_val))

## [1] "The mode shoe size is: 9"
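Because `which.max()` returns only the first maximum, `get_mode()` reports a single value even for bimodal or multimodal data. The sketch below is a hypothetical variant, `get_all_modes`, that keeps every value tied for the highest frequency. The bimodal vector is illustrative data. If every value occurs exactly once, this function returns all of them, which is usually read as “no mode.”

```r
# Hypothetical helper: return every value that ties for the highest frequency
get_all_modes <- function(v) {
  uniqv  <- unique(v)                  # distinct values in order of first appearance
  counts <- tabulate(match(v, uniqv))  # frequency of each distinct value
  uniqv[counts == max(counts)]         # all values with the maximum frequency
}

get_all_modes(c(7, 8, 8, 9, 9, 9, 10, 11, 9, 8))  # unimodal: 9
get_all_modes(c(1, 1, 2, 3, 3))                    # bimodal: 1 3
```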

5. Summary: When to Use Which?

| Measure | Best Used For | Sensitivity to Outliers |
|---|---|---|
| Mean | Symmetric data (e.g., normal distributions) | Highly sensitive |
| Median | Skewed data (e.g., income, prices) | Robust (resistant to outliers) |
| Mode | Categorical data (e.g., color, size) | Robust |
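The table's claim that the mode suits categorical data can be illustrated in R. The same unique/tabulate approach used in `get_mode()` works on character vectors, while `mean()` is not defined for them: R emits a warning and returns `NA`. The shirt-color vector is hypothetical data, and `suppressWarnings()` is used only to keep the output tidy.

```r
# Illustrative categorical data (hypothetical shirt colors sold)
colors <- c("red", "blue", "red", "green", "red", "blue")

# Same approach as get_mode(): most frequent distinct value
uniq_colors <- unique(colors)
uniq_colors[which.max(tabulate(match(colors, uniq_colors)))]   # "red"

# mean() is not defined for character data: R warns and returns NA
suppressWarnings(mean(colors))
```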

Visualizing Skewness

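For a unimodal distribution, the three measures are typically ordered as follows (a rule of thumb, not a guarantee):

Symmetric (Normal): Mean $\approx$ Median $\approx$ Mode
Right skewed (positive): Mode < Median < Mean
Left skewed (negative): Mean < Median < Mode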
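The right-skewed ordering can be checked on a small example. The vector below is hypothetical data with most values small and a long tail of larger ones. The mode is computed with the same unique/tabulate approach as `get_mode()`; `mode_x`, `median_x` and `mean_x` are illustrative names.

```r
# Hypothetical right-skewed data: most values are small, a few are large
x <- c(1, 2, 2, 2, 3, 3, 4, 5, 10, 20)

uniq     <- unique(x)
mode_x   <- uniq[which.max(tabulate(match(x, uniq)))]  # 2
median_x <- median(x)                                  # (3 + 3) / 2 = 3
mean_x   <- mean(x)                                    # 52 / 10 = 5.2

c(mode = mode_x, median = median_x, mean = mean_x)
mode_x < median_x && median_x < mean_x   # TRUE for this data set
```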

End of Module I

Module I: Measures of Central Tendency

Department of Statistics / Mathematics

2025-12-28
