1. Introduction

A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures indicate where most values in a distribution fall and are also called measures of “location.”

In this module, we will cover the three primary measures: the mean, the median, and the mode.

2. The Arithmetic Mean

The mean (or average) is the most common measure of central tendency. It is the sum of all observations divided by the total number of observations.

Mathematical Formula

For a sample of size $n$, the sample mean $\bar{x}$ is calculated as:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}\]

Where:
- $\sum$: Summation symbol
- $x_i$: Each individual value
- $n$: Total number of values

R Implementation

# Example dataset: Exam scores
scores <- c(85, 90, 78, 92, 71, 88, 95)

# Calculate mean
mean_score <- mean(scores)
print(paste("The mean score is:", round(mean_score, 2)))

## [1] "The mean score is: 85.57"
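As a quick check that `mean()` agrees with the formula, the same value can be computed by hand with `sum()` and `length()`. This is a minimal sketch that reuses the exam scores from the example; the variable name `manual_mean` is introduced here only for illustration.

```r
# Exam scores from the example above
scores <- c(85, 90, 78, 92, 71, 88, 95)

# Apply the formula directly: sum of the observations divided by n
manual_mean <- sum(scores) / length(scores)

# Compare with the built-in function
# (all.equal tolerates tiny floating-point differences)
print(all.equal(manual_mean, mean(scores)))
print(round(manual_mean, 2))
```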

Real-Life Example

Corporate Salaries: If a small startup has 5 employees earning $40k, $45k, $50k, $52k, and $100k, the mean salary is $57,400. Note how the high salary of one individual pulls the mean upward.
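The effect of the single high salary can be checked with a short sketch. It assumes the five salaries are entered in dollars; the names `salaries`, `mean_all`, and `mean_without_top` are illustrative and not part of the original example.

```r
# Salaries of the five employees, in dollars
salaries <- c(40000, 45000, 50000, 52000, 100000)

# Mean of all five salaries
mean_all <- mean(salaries)

# Mean of the four salaries that remain after dropping the largest one
mean_without_top <- mean(salaries[salaries < max(salaries)])

print(paste("Mean with all employees:", mean_all))
print(paste("Mean without the top earner:", mean_without_top))
```

Dropping one observation out of five lowers the mean by more than $10,000, which illustrates how strongly a single extreme value can affect it.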

3. The Median

The median is the middle value in a dataset when the values are arranged in ascending or descending order. It splits the data into two equal halves.

Mathematical Calculation

1. Order the data from smallest to largest.
2. If $n$ is odd: The median is the value at position $\frac{n+1}{2}$.
3. If $n$ is even: The median is the average of the two middle values at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.
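The steps above can be sketched as a small function. `manual_median` is a hypothetical helper written for illustration, assuming a non-empty numeric vector without missing values; in practice the built-in `median()` does the same job.

```r
# Hypothetical helper that follows the odd/even rule step by step
manual_median <- function(x) {
  x_sorted <- sort(x)        # order the data from smallest to largest
  n <- length(x_sorted)
  if (n %% 2 == 1) {
    # Odd n: take the value at position (n + 1) / 2
    x_sorted[(n + 1) / 2]
  } else {
    # Even n: average the values at positions n / 2 and n / 2 + 1
    (x_sorted[n / 2] + x_sorted[n / 2 + 1]) / 2
  }
}

odd_result  <- manual_median(c(7, 3, 9, 1, 5))   # sorted: 1 3 5 7 9
even_result <- manual_median(c(8, 2, 6, 4))      # sorted: 2 4 6 8

print(odd_result)
print(even_result)
```

Both calls should agree with `median()` applied to the same vectors.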

R Implementation

# Dataset
heights <- c(160, 165, 170, 175, 180, 185)

# Calculate median
med_height <- median(heights)
print(paste("The median height is:", med_height))

## [1] "The median height is: 172.5"

Real-Life Example

Real Estate: When reporting “Median Home Prices,” economists prefer the median over the mean because it isn’t skewed by a few multi-million dollar mansions. It represents what a “typical” buyer might pay.

4. The Mode

The mode is the value that appears most frequently in a dataset. Depending on how many values share the highest frequency, a dataset can be:
- Unimodal: One mode
- Bimodal: Two modes
- Multimodal: More than two modes

R Implementation

Base R does not have a built-in function for the mode of a numeric vector, so we often use a custom function or the table() function.

# Example dataset: Shoe sizes
shoe_sizes <- c(7, 8, 8, 9, 10, 10, 10, 11, 12)

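# Build a frequency table, then pick the most frequent value
# (which.max returns only the first maximum, so ties are not reported)
freq_table <- table(shoe_sizes)
# names() returns the value as a character string
mode_value <- names(freq_table)[which.max(freq_table)]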

print(paste("The mode shoe size is:", mode_value))

## [1] "The mode shoe size is: 10"
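Because `which.max()` reports only the first maximum, the approach above returns a single value even when the data are bimodal or multimodal. One possible custom function that returns every mode is sketched below; `all_modes` is a hypothetical name, and the function assumes a numeric input vector.

```r
# Hypothetical helper: return every value that attains the highest frequency
all_modes <- function(x) {
  freq_table <- table(x)
  counts <- as.vector(freq_table)
  # Keep each value whose count equals the maximum count
  mode_names <- names(freq_table)[counts == max(counts)]
  # names() yields character strings, so convert back to numeric
  as.numeric(mode_names)
}

shoe_modes    <- all_modes(c(7, 8, 8, 9, 10, 10, 10, 11, 12))  # one mode
bimodal_modes <- all_modes(c(1, 2, 2, 3, 3, 4))                 # two modes

print(shoe_modes)
print(bimodal_modes)
```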

Real-Life Example

Inventory Management: A shoe store manager uses the mode to decide which size to stock the most. If size 10 is the mode, they will order more of that size than any other.

5. Comparison: Mean vs. Median vs. Mode

Measure	Best Used For	Sensitivity to Outliers
Mean	Continuous data with a symmetrical distribution	High (Strongly affected)
Median	Skewed data or data with outliers	Low (Robust)
Mode	Categorical (Nominal) data	Low

Visualizing the Relationship

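For a unimodal distribution, the three measures typically fall in the following order. This is a rule of thumb and can fail, especially for discrete or multimodal data:
- Right Skew (Positive): Mean > Median > Mode
- Left Skew (Negative): Mean < Median < Mode
- Symmetric: Mean $\approx$ Median $\approx$ Mode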
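The right-skew ordering can be illustrated with a small made-up dataset. The values in `right_skewed` are invented for this sketch and do not come from the module; the mode is found with the same `table()` approach used earlier.

```r
# Made-up right-skewed data: most values are small, with a long upper tail
right_skewed <- c(1, 2, 2, 3, 4, 5, 9, 20)

skew_mean   <- mean(right_skewed)
skew_median <- median(right_skewed)

# Mode via a frequency table (the single most frequent value here)
freq_table <- table(right_skewed)
skew_mode  <- as.numeric(names(freq_table)[which.max(freq_table)])

print(c(mean = skew_mean, median = skew_median, mode = skew_mode))
print(skew_mean > skew_median && skew_median > skew_mode)
```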

6. Weighted Mean (Special Case)

Sometimes, certain values in a dataset contribute more to the final average than others. The weighted mean accounts for this by multiplying each value by its weight before averaging.

Formula

\[\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}\]

Where $w_i$ is the weight assigned to value $x_i$.

Example

A student’s final grade is based on:
- Homework (20%): 95
- Midterm (30%): 80
- Final Exam (50%): 85

grades <- c(95, 80, 85)
weights <- c(0.20, 0.30, 0.50)

weighted_avg <- weighted.mean(grades, weights)
print(paste("The weighted final grade is:", weighted_avg))

## [1] "The weighted final grade is: 85.5"
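The same result follows from applying the weighted mean formula by hand. The sketch below also shows that the weights do not need to sum to 1, because the formula divides by their total; `manual_weighted` and `weights_pct` are names introduced here for illustration.

```r
grades  <- c(95, 80, 85)
weights <- c(0.20, 0.30, 0.50)

# Formula: sum of (weight * value) divided by the sum of the weights
manual_weighted <- sum(weights * grades) / sum(weights)

# The same weights written as percentages; dividing by their sum
# rescales them, so the weighted mean is unchanged
weights_pct <- c(20, 30, 50)
pct_weighted <- weighted.mean(grades, weights_pct)

print(manual_weighted)
print(all.equal(manual_weighted, pct_weighted))
```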

Summary Exercises

1. Calculate the mean, median, and mode for the following dataset: c(10, 12, 12, 15, 20, 25, 100).
2. Which measure is the most appropriate for describing the “average” income in a country with high wealth inequality?
3. In a class of 30 students, 15 students scored 70, 10 students scored 80, and 5 students scored 95. Calculate the mean score.

Module I: Measures of Central Tendency

Statistics for Data Science

Instructor Name

2025-12-28
