1. Introduction
2. The Arithmetic Mean
3. The Median
4. The Mode
5. Comparing Mean, Median, and Mode
- 5.1 Distribution Shape
6. R Implementation
7. Summary Exercise
- Key features of this RMarkdown file:

1. Introduction

In statistics, a measure of central tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.

Measures of central tendency are one kind of “summary statistic.” The three most common measures are:

1. Mean
2. Median
3. Mode

2. The Arithmetic Mean

The mean (or average) is the most popular and well-known measure of central tendency. It is calculated by summing all the values in a data set and dividing by the number of values.

2.1 Mathematical Formula

For a sample of size $n$, the sample mean ($\bar{x}$) is calculated as: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

For a population of size $N$, the population mean ($\mu$) is: \[\mu = \frac{\sum_{i=1}^{N} X_i}{N}\]

Where:
- $\sum$ = summation symbol
- $x_i$ (or $X_i$ for a population) = each individual value
- $n$ or $N$ = total number of observations

2.2 Example

Data: A student’s test scores are 85, 90, 78, 92, and 88.

Calculation: \[\bar{x} = \frac{85 + 90 + 78 + 92 + 88}{5} = \frac{433}{5} = 86.6\]
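As a quick check, the formula can be applied directly in base R with `sum()` and `length()` and compared with the built-in `mean()`. This is a minimal sketch; the variable names are illustrative only.

```r
# Test scores from the example above
scores <- c(85, 90, 78, 92, 88)

# Apply the formula: sum of all values divided by the number of values
x_bar <- sum(scores) / length(scores)

# Compare with R's built-in mean(); both print 86.6
cat("Formula:", x_bar, "\n")
cat("mean(): ", mean(scores), "\n")
```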

2.3 Real-Life Application

Economics: Calculating the average income per capita in a country.
Meteorology: Determining the average daily temperature over a month.

2.4 Pros and Cons

Pros: Uses every value in the dataset; easy to work with algebraically, and for roughly normal data it varies less from sample to sample than the median.
Cons: Highly sensitive to outliers (extreme values).
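To illustrate the sensitivity to outliers, the sketch below adds one hypothetical extreme score (a 20, invented purely for illustration) to the test scores from Section 2.2 and compares how far the mean moves with how far the median (Section 3) moves.

```r
# Test scores from Section 2.2
scores <- c(85, 90, 78, 92, 88)

# Hypothetical extra score: one unusually low result (an outlier)
scores_outlier <- c(scores, 20)

cat("Mean without outlier:  ", mean(scores), "\n")            # 86.6
cat("Mean with outlier:     ", mean(scores_outlier), "\n")    # 75.5
cat("Median without outlier:", median(scores), "\n")          # 88
cat("Median with outlier:   ", median(scores_outlier), "\n")  # 86.5
```

A single extreme value pulls the mean down by about 11 points, while the median moves by only 1.5.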

3. The Median

The median is the middle value in a data set that has been arranged in ascending or descending order. It splits the data into two equal halves.

3.1 Mathematical Calculation

Step 1: Sort the data: $x_{(1)} \leq x_{(2)} \leq \dots \leq x_{(n)}$
Step 2:
- If $n$ is odd, the median is the value at position: $\frac{n+1}{2}$
- If $n$ is even, the median is the average of the values at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.

3.2 Example

Case A (Odd): Scores: 10, 15, 20.
- Sorted: 10, 15, 20.
- $n = 3$, so the median is at position $\frac{3+1}{2} = 2$: Median = 15.

Case B (Even): Scores: 10, 15, 20, 100.
- Sorted: 10, 15, 20, 100.
- $n = 4$, so the median averages positions 2 and 3: $\frac{15 + 20}{2} = 17.5$.
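The two-step rule from Section 3.1 translates directly into a small R function. `median_by_hand()` is a hypothetical helper written only to mirror the steps; in practice base R's `median()` does the same job.

```r
# Hypothetical helper: median computed with the two-step rule from Section 3.1
median_by_hand <- function(x) {
  x <- sort(x)                       # Step 1: sort in ascending order
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                   # Step 2 (odd n): the middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2    # Step 2 (even n): average the two middle values
  }
}

median_by_hand(c(10, 15, 20))        # Case A: 15
median_by_hand(c(10, 15, 20, 100))   # Case B: 17.5
```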

3.3 Real-Life Application

Real Estate: Median house prices are reported instead of the mean because a few multi-million-dollar mansions would pull the mean upward, making a typical house seem more expensive than it really is.

4. The Mode

The mode is the value that appears most frequently in a data set.

4.1 Characteristics

A dataset can be unimodal (one mode), bimodal (two modes), or multimodal (three or more).
If no number repeats, the dataset has no mode.

4.2 Example

Data: 2, 4, 4, 6, 7, 8, 4, 10
- Mode: 4 (it appears three times).
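A frequency table makes the mode easy to read off. The sketch below uses base R's `table()` to count each value in the example data and keeps every value tied for the highest count, so a bimodal dataset would return two values. Returning an empty vector when nothing repeats is an assumption chosen here to encode the "no mode" rule from Section 4.1.

```r
# Example data from Section 4.2
x <- c(2, 4, 4, 6, 7, 8, 4, 10)

# Frequency table: how often each distinct value occurs
freq <- table(x)
print(freq)

# Keep every value tied for the highest count (handles bimodal data);
# if no value repeats, return an empty vector to signal "no mode"
modes <- if (max(freq) == 1) {
  numeric(0)
} else {
  as.numeric(names(freq)[freq == max(freq)])
}
modes  # 4
```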

4.3 Real-Life Application

Inventory Management: A shoe store manager needs to know the “modal” shoe size (the most commonly sold size) so that the size customers ask for most often is always well stocked.

5. Comparing Mean, Median, and Mode

| Feature | Mean | Median | Mode |
|---|---|---|---|
| Data type | Quantitative | Quantitative / Ordinal | All types |
| Sensitivity to outliers | High | Low | Low |
| Typical use case | Symmetric data | Skewed data | Categorical data |
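The table lists the mode as the only one of the three measures that applies to categorical data. As an illustration with invented survey responses (the `transport` vector below is hypothetical), the mode is still well defined for text labels, whereas `mean()` has nothing numeric to average and returns `NA` with a warning.

```r
# Hypothetical survey: how each of 7 employees commutes
transport <- c("bus", "car", "bike", "car", "walk", "car", "bus")

# The mode: the most frequent category
counts <- table(transport)
names(counts)[counts == max(counts)]   # "car"

# The mean is undefined for categories: R warns and returns NA
suppressWarnings(mean(transport))      # NA
```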

5.1 Distribution Shape

Symmetric, unimodal (e.g., Normal): Mean $\approx$ Median $\approx$ Mode.
Right Skewed (Positive): typically Mode < Median < Mean.
Left Skewed (Negative): typically Mean < Median < Mode.
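These orderings can be checked on small made-up samples. Both vectors below are hypothetical: `right_skew` has a long tail of large values, and `left_skew` is its mirror image (each value subtracted from 20). `stat_mode()` is a hypothetical helper that returns the first most frequent value, which is enough here because each vector has a single clear mode. The orderings are rules of thumb for unimodal skewed data rather than guarantees for every dataset.

```r
# Hypothetical helper: first most frequent value (assumes a single clear mode)
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

right_skew <- c(1, 2, 2, 2, 3, 3, 4, 5, 8, 15)  # long right tail
left_skew  <- 20 - right_skew                    # mirror image: long left tail

samples <- list("Right skewed" = right_skew, "Left skewed" = left_skew)
for (nm in names(samples)) {
  x <- samples[[nm]]
  cat(nm, "-> mode:", stat_mode(x), "median:", median(x), "mean:", mean(x), "\n")
}
# Right skewed -> mode: 2 median: 3 mean: 4.5
# Left skewed -> mode: 18 median: 17 mean: 15.5
```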

6. R Implementation

Below is how we calculate these measures using R.

# Create a sample dataset of employee salaries (in thousands)
salaries <- c(45, 50, 52, 55, 58, 60, 62, 150) # 150 is an outlier

# Calculate Mean
mean_val <- mean(salaries)

# Calculate Median
med_val <- median(salaries)

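# Calculate Mode. Base R has no statistical mode function: mode() returns
# an object's storage mode (e.g. "numeric"), not its most frequent value.
get_mode <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))  # frequency of each unique value
  if (max(counts) == 1) return(NA)     # no value repeats, so there is no mode
  uniqv[counts == max(counts)]         # every value tied for the top count
}
mode_val <- get_mode(salaries)

# Display Results
cat("Mean Salary:", mean_val, "\n")

## Mean Salary: 66.5

cat("Median Salary:", med_val, "\n")

## Median Salary: 56.5

cat("Mode Salary:", mode_val, "\n")

## Mode Salary: NA

Observation: Notice how the mean ($66.5k) is higher than the median ($56.5k) because of the $150k outlier. In this case, the median is a better representation of the “typical” salary. Because every salary appears exactly once, the dataset has no mode, and get_mode() returns NA.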

7. Summary Exercise

Imagine you are analyzing the number of goals scored by a football team in 10 matches: 0, 1, 1, 2, 2, 2, 3, 3, 4, 12

Calculate the Mean.
Calculate the Median.
Identify the Mode.
Which measure is most misleading due to the 12-goal outlier?
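After working the exercise by hand, you can check your answers with the same base R tools used in Section 6. The output is not shown here so that it does not give the answers away; `goals` is simply the data from the exercise.

```r
goals <- c(0, 1, 1, 2, 2, 2, 3, 3, 4, 12)

# Frequency of each number of goals; the mode is the most frequent value
freq <- table(goals)

cat("Mean:  ", mean(goals), "\n")
cat("Median:", median(goals), "\n")
cat("Mode:  ", names(freq)[freq == max(freq)], "\n")
```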

Key features of this RMarkdown file:

YAML Header: Sets the title, table of contents (TOC), and theme.
LaTeX Math: Uses $...$ for inline and $$...$$ for block equations.
Markdown Formatting: Uses headers, bold text, and tables for readability.
R Code Chunk: Includes a practical example of how to compute these values in R, including a custom function for the mode.
Real-Life Context: Explains why we use specific measures (e.g., house prices for the median).

Module I: Measures of Central Tendency

Foundations of Statistical Analysis

Abdikani Mohamed Aden

2026-01-02