1. Introduction

In statistics, a measure of central tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.

Such a measure is one kind of “summary statistic.” The three most common measures are:

1. The Mean
2. The Median
3. The Mode

2. The Arithmetic Mean

The mean (or average) is the most popular and well-known measure of central tendency. It is the sum of all values divided by the number of values.

Mathematical Formula

For a sample of size $n$, the sample mean $\bar{x}$ is calculated as:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

Where:

* $\sum$: the summation symbol
* $x_i$: each individual value in the data set
* $n$: the total number of observations
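To see the formula term by term, the short sketch below computes $\sum x_i$ and $n$ separately for a small made-up vector (the values 2, 4 and 9 are illustrative only) and checks the result against R's built-in `mean()`.

```r
# Illustrative data, made up for this sketch
x <- c(2, 4, 9)

n <- length(x)      # n: the number of observations
total <- sum(x)     # the summation of all x_i
x_bar <- total / n  # sample mean: (2 + 4 + 9) / 3 = 15 / 3

print(x_bar)                      # 5
print(all.equal(x_bar, mean(x)))  # TRUE: matches the built-in mean()
```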

Real-Life Example

Scenario: A teacher wants to find the average score of 5 students in a mini-quiz. The scores are: 85, 90, 70, 75, and 95.
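Substituting the five scores into the formula gives the value that the R code computes:

\[\bar{x} = \frac{85 + 90 + 70 + 75 + 95}{5} = \frac{415}{5} = 83\]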

R Calculation

# Define the data
scores <- c(85, 90, 70, 75, 95)

# Calculate mean
avg_score <- mean(scores)
print(paste("The average score is:", avg_score))

## [1] "The average score is: 83"

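Note: The mean is highly sensitive to outliers (extreme values). Because every value enters the sum, a single extreme value can pull the mean far away from where most of the data lie.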
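A minimal sketch of that sensitivity, assuming a hypothetical data-entry error in which the last quiz score of 95 is typed as 950:

```r
scores <- c(85, 90, 70, 75, 95)
scores_typo <- c(85, 90, 70, 75, 950)  # hypothetical typo: 95 entered as 950

mean(scores)       # 83
mean(scores_typo)  # 254: one bad value lifts the mean far above the four correct scores
```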

3. The Median

The median is the middle value in a data set when the values are arranged in ascending or descending order.

Mathematical Definition

To find the median:

1. Arrange the data in order.
2. If $n$ is odd, the median is the value at position $\frac{n+1}{2}$.
3. If $n$ is even, the median is the average of the values at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.
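The three steps above translate almost line for line into R. The function name `median_by_position` is hypothetical, written only to mirror the rule; in practice the built-in `median()` does the same job.

```r
# Hypothetical helper that follows the positional rule step by step
median_by_position <- function(x) {
  x <- sort(x)   # Step 1: arrange the data in ascending order
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # Step 2: odd n, take the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2   # Step 3: even n, average the two middle values
  }
}

median_by_position(c(85, 90, 70, 75, 95))  # 85  (n = 5, position 3)
median_by_position(c(4, 1, 3, 2))          # 2.5 (n = 4, average of positions 2 and 3)
```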

Real-Life Example

Scenario: Monthly household incomes in a small neighborhood: $2,000, $2,500, $3,000, $3,500, and $1,000,000 (an outlier).

The mean is pulled far upward by that single extreme income, but the median provides a better sense of what a “typical” neighbor earns.

R Calculation

incomes <- c(2000, 2500, 3000, 3500, 1000000)

# Calculate Median
med_income <- median(incomes)
avg_income <- mean(incomes)

print(paste("Median Income:", med_income))

## [1] "Median Income: 3000"

print(paste("Mean Income:", round(avg_income, 2)))

## [1] "Mean Income: 202200"
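To see why the median is called robust, the sketch below makes the outlier ten times larger (a hypothetical income of 10,000,000) and recomputes both measures. The median does not move, because it depends only on the middle position, while the mean grows roughly tenfold.

```r
incomes <- c(2000, 2500, 3000, 3500, 1000000)
incomes_bigger <- c(2000, 2500, 3000, 3500, 10000000)  # hypothetical larger outlier

median(incomes)         # 3000
median(incomes_bigger)  # 3000: unchanged
mean(incomes)           # 202200
mean(incomes_bigger)    # 2002200
```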

4. The Mode

The mode is the value that appears most frequently in a data set. A data set can have one mode (unimodal), two modes (bimodal), or multiple modes (multimodal).

Real-Life Example

Scenario: A shoe store tracks the sizes of sneakers sold in one hour: 8, 9, 9, 10, 11, 9, 8. The mode is 9 because it occurs most frequently.

R Calculation

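Note: R has no built-in function for the statistical mode (base R's `mode()` returns an object's storage mode, such as "numeric", not its most frequent value), so we use a frequency table or write a small custom function.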

# Sample data
shoe_sizes <- c(8, 9, 9, 10, 11, 9, 8)

# Calculate mode using a frequency table
mode_val <- names(sort(table(shoe_sizes), decreasing = TRUE))[1]
print(paste("The most popular shoe size is:", mode_val))

## [1] "The most popular shoe size is: 9"
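Because `sort()` followed by `[1]` keeps only one value, the frequency-table approach above silently drops ties and under-reports bimodal or multimodal data. A small sketch of a helper that returns every value tied for the highest count (the name `all_modes` is hypothetical):

```r
# Hypothetical helper: return every value that occurs the maximum number of times
all_modes <- function(x) {
  counts <- table(x)                    # frequency of each distinct value
  names(counts)[counts == max(counts)]  # keep all values tied at the top
}

all_modes(c(8, 9, 9, 10, 11, 9, 8))  # "9"      (unimodal)
all_modes(c(1, 1, 2, 2, 3))          # "1" "2"  (bimodal)
```

Like the original code, it returns the values as character strings (the names of the table); wrap the result in `as.numeric()` when the data are numeric.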

5. Comparing the Measures

When should you use which measure?

Measure	Best Used For…	Sensitivity to Outliers
Mean	Continuous data with a symmetrical distribution	High (Very sensitive)
Median	Skewed data or data with outliers (e.g., Salaries)	Low (Robust)
Mode	Categorical/Nominal data (e.g., Favorite color)	Low
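The last row of the table is easy to check in code: for nominal data such as favorite colors, only the mode is meaningful. In the sketch below (made-up survey answers), `mean()` cannot average text and returns `NA` with a warning, while the frequency-table mode still works.

```r
# Made-up survey answers: favorite colors of six people
colors <- c("red", "blue", "blue", "green", "blue", "red")

counts <- table(colors)
names(counts)[which.max(counts)]  # "blue"

mean(colors)  # NA, with a warning that the argument is not numeric or logical
```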

Visualization: The Impact of Outliers

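The code below simulates a right-skewed distribution (1,000 draws from a Beta(2, 8) distribution, rescaled to the range 0 to 100) and marks where the mean and the median fall. The long right tail pulls the mean above the median.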

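library(ggplot2)

# Create a right-skewed dataset: Beta(2, 8) draws rescaled to 0-100
set.seed(42)  # fix the random draws so the plot is reproducible
skewed <- data.frame(val = rbeta(1000, 2, 8) * 100)

# Calculate stats
mu  <- mean(skewed$val)
med <- median(skewed$val)

# One row per measure, so each vertical line is drawn exactly once
stats <- data.frame(measure = c("Mean", "Median"), value = c(mu, med))

# Plot (linewidth replaces the deprecated size argument for lines in ggplot2 >= 3.4.0)
ggplot(skewed, aes(x = val)) +
  geom_histogram(fill = "skyblue", color = "white", bins = 30) +
  geom_vline(data = stats, aes(xintercept = value, color = measure), linewidth = 1) +
  labs(title = "Mean vs Median in Right-Skewed Data",
       x = "Value", y = "Frequency") +
  scale_color_manual(name = "Measures", values = c(Mean = "red", Median = "blue")) +
  theme_minimal()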
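The histogram shows the gap visually. The same comparison can be made numerically without plotting; this sketch re-simulates a Beta(2, 8) sample with a fixed seed (the seed value 1 is arbitrary) and prints both measures side by side.

```r
set.seed(1)                        # arbitrary seed, for reproducibility
vals <- rbeta(1000, 2, 8) * 100    # right-skewed sample on a 0-100 scale

c(mean = mean(vals), median = median(vals))
mean(vals) > median(vals)          # should be TRUE: the right tail pulls the mean up
```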

6. Exercises

Dataset: 12, 15, 12, 18, 20, 100. Calculate the mean and median. Which one represents the “center” better?
R Task: Create a vector of 20 random numbers using rnorm(20) and calculate all three measures of central tendency. What does the frequency-table method return as the mode when every value is unique?

End of Module I

Key features of this lecture note:

LaTeX Integration: It uses $...$ for inline math and $$...$$ for centered formulas, which is standard for academic notes.
R Code Chunks: It provides actual executable code blocks to demonstrate how to calculate these measures using R.
Visualization: It includes a ggplot2 block to visualize the difference between mean and median in skewed data—a crucial concept for students.
Markdown Table: It uses a table to compare the three measures for quick revision.
Professional Formatting: Uses the cosmo theme and a floating table of contents for easy navigation.

Module I: Measures of Central Tendency

Hamda Abdisamad Eggeh

2025-12-28
