1. Introduction

A measure of central tendency is a single value that summarizes a data set by identifying its central position. In statistics, the three most common measures are the mean, the median, and the mode.

2. The Arithmetic Mean

The mean (or average) is the sum of all observations divided by the total number of observations.

Mathematical Formula

For a sample of size $n$, the sample mean $\bar{x}$ is: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}\]

Where:
* $\sum$ is the summation symbol.
* $x_i$ represents each individual value.
* $n$ is the total number of values in the sample.

Real-Life Example

Scenario: A tech startup tracks the number of hours five employees worked in a day: 8, 9, 7, 10, and 6 hours.

R Implementation

# Data vector
work_hours <- c(8, 9, 7, 10, 6)

# Calculating Mean
mean_val <- mean(work_hours)
print(paste("The mean work hours are:", mean_val))

## [1] "The mean work hours are: 8"
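
As a cross-check on the formula, the same result can be reproduced by hand. This is only an illustrative sketch; the names `n` and `manual_mean` are introduced here and are not part of the original example.

```r
# Same data as the scenario above
work_hours <- c(8, 9, 7, 10, 6)

# Apply the formula directly: sum of observations divided by n
n <- length(work_hours)
manual_mean <- sum(work_hours) / n

# Compare the hand-computed value with the built-in function
print(manual_mean)
print(isTRUE(all.equal(manual_mean, mean(work_hours))))
```

The sum of the five values is 40, so dividing by $n = 5$ gives 8, matching the output of mean().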

3. The Median

The median is the middle value of a data set once the values are arranged in ascending or descending order. Because it depends only on the middle value(s) of the sorted data, it is robust to outliers.

Mathematical Formula

Arrange data in order from smallest to largest.
If $n$ is odd, the median is the value at position: $\frac{n+1}{2}$
If $n$ is even, the median is the average of the values at positions: $\frac{n}{2}$ and $\frac{n}{2} + 1$
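
The position rule above can be sketched as a small function. `median_by_position` is a hypothetical helper written for illustration (it assumes a non-empty numeric vector with no missing values); in practice the built-in median() does the same job.

```r
# Hypothetical helper: median via the position rule (not a base R function)
median_by_position <- function(x) {
  x <- sort(x)                        # arrange from smallest to largest
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                    # odd n: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2     # even n: average of the two middle values
  }
}

odd_sample  <- c(7, 3, 9, 1, 5)       # made-up data; sorted: 1 3 5 7 9
even_sample <- c(4, 8, 2, 6)          # made-up data; sorted: 2 4 6 8

print(median_by_position(odd_sample))
print(median_by_position(even_sample))
```

For the odd sample the value at position 3 is 5; for the even sample the values at positions 2 and 3 are 4 and 6, whose average is also 5.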

Real-Life Example

Scenario: Monthly house rents in a neighborhood: $1200, $1250, $1300, $1400, and $5000 (an outlier).

R Implementation

rents <- c(1200, 1250, 1300, 1400, 5000)

# Mean vs Median comparison
print(paste("Mean Rent:", mean(rents)))

## [1] "Mean Rent: 2030"

print(paste("Median Rent:", median(rents)))

## [1] "Median Rent: 1300"

Observation: Notice how the mean is pulled upward by the $5000 rent, while the median stays representative of the “typical” house.

4. The Mode

The mode is the value that appears most frequently in a data set. A set can be unimodal (one mode), bimodal (two modes), or multimodal (more than two modes).

Real-Life Example

Scenario: A shoe store records the sizes of sneakers sold in an hour: 7, 8, 8, 9, 10, 8, 11.

R Implementation

Note: R has no built-in function for the statistical mode (the base function mode() returns an object's storage mode, not its most frequent value), so we use table() or a custom function.

shoe_sizes <- c(7, 8, 8, 9, 10, 8, 11)

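# Using table to find frequency
freq_table <- table(shoe_sizes)
# which.max() returns only the first maximum, so a tie would yield one mode;
# names() returns the value as a character string
mode_val <- names(freq_table)[which.max(freq_table)]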

print(freq_table)

## shoe_sizes
##  7  8  9 10 11 
##  1  3  1  1  1

print(paste("The Mode shoe size is:", mode_val))

## [1] "The Mode shoe size is: 8"
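
For bimodal or multimodal data, a custom function can return every value tied for the highest frequency. The sketch below is one possible approach, not a standard R function: `all_modes` and the `bimodal_sizes` data are made up for illustration.

```r
# Hypothetical helper: return every value tied for the highest frequency
all_modes <- function(x) {
  freq <- table(x)
  modes <- names(freq)[as.vector(freq) == max(freq)]
  # table() stores the values as character names; convert back for numeric input
  if (is.numeric(x)) as.numeric(modes) else modes
}

shoe_sizes <- c(7, 8, 8, 9, 10, 8, 11)
print(all_modes(shoe_sizes))           # single mode: 8

bimodal_sizes <- c(7, 8, 8, 9, 9, 10)  # made-up data with a tie
print(all_modes(bimodal_sizes))        # two modes: 8 and 9
```

Comparing each count against the maximum keeps every tied value, so multiple modes are all reported.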

5. Comparison: When to use what?

Measure	Best Used For…	Sensitivity to Outliers
Mean	Continuous data with a symmetric distribution (e.g., Height).	Highly Sensitive
Median	Skewed data or data with outliers (e.g., Income).	Resistant
Mode	Categorical/Nominal data (e.g., Favorite color).	Resistant
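
To illustrate the last row of the table, here is a minimal sketch of finding the mode of nominal data, where a mean or median cannot be computed. The survey responses are invented for the example.

```r
# Made-up survey responses (nominal data: no meaningful mean or median)
favorite_color <- c("blue", "red", "blue", "green", "blue", "red")

# Count each category and pick the most frequent one
color_counts <- table(favorite_color)
modal_color <- names(color_counts)[which.max(color_counts)]

print(color_counts)
print(modal_color)
```

Here "blue" appears three times, more than any other category, so it is the mode.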

6. Visualizing Central Tendency

Let’s visualize where these measures sit on a distribution using a generated dataset of exam scores.

# Generate random data
set.seed(123)
scores <- rgamma(100, shape = 2, scale = 10) # Right-skewed distribution

m_mean <- mean(scores)
m_median <- median(scores)

# Plotting
hist(scores, col="lightblue", main="Distribution of Exam Scores", xlab="Score")
abline(v = m_mean, col = "red", lwd = 2, lty = 1)
abline(v = m_median, col = "blue", lwd = 2, lty = 2)

legend("topright", legend=c("Mean", "Median"), 
       col=c("red", "blue"), lwd=2, lty=1:2)
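
The distance between the two vertical lines can also be reported numerically. The sketch below regenerates the same simulated scores; `center_gap` is a name introduced here. In a right-skewed sample the mean usually lies above the median, so a positive gap would be expected, though the code does not assume it.

```r
# Recreate the simulated scores used in the plot
set.seed(123)
scores <- rgamma(100, shape = 2, scale = 10)

# Difference between the two measures of center
center_gap <- mean(scores) - median(scores)

# Print the mean, median, and gap rounded to two decimals
print(round(c(mean = mean(scores), median = median(scores), gap = center_gap), 2))
```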

7. Summary Checklist

Use Mean for roughly symmetric distributions without extreme values (for example, normally distributed data).
Use Median when your data has extreme values (outliers).
Use Mode for non-numerical (categorical) data.
In R, mean() and median() are built-in, but table() is best for finding the mode.


Module I: Measures of Central Tendency

Statistical Methods for Data Analysis

Zakariye abdillahi

2025-12-28
