1. Introduction

In statistics, a measure of central tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.

This single value is often referred to as the “typical” value of the dataset. The three most common measures are:

1. Mean
2. Median
3. Mode

2. The Arithmetic Mean

The mean (or average) is the most popular and well-known measure of central tendency. It is calculated by summing all the values in a dataset and dividing by the total number of values.

Mathematical Formula

For a sample of size $n$, the sample mean $\bar{x}$ is given by:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

Where:

* $\bar{x}$ = Sample mean
* $x_i$ = The $i^{th}$ value in the dataset
* $n$ = Number of observations

Real-Life Example

Scenario: A teacher wants to find the average score of 5 students in a math quiz. Scores: 70, 85, 80, 90, 75.

Calculation: \[\bar{x} = \frac{70 + 85 + 80 + 90 + 75}{5} = \frac{400}{5} = 80\]

R Implementation

scores <- c(70, 85, 80, 90, 75)
mean_score <- mean(scores)
print(paste("The Mean Score is:", mean_score))

## [1] "The Mean Score is: 80"
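As a cross-check, the formula can also be applied by hand with `sum()` and `length()`. The sketch below is illustrative only; the vector `scores_with_na` is a hypothetical variation on the quiz data, added to show how a missing value affects `mean()`.

```r
# Same quiz scores as in the example above
scores <- c(70, 85, 80, 90, 75)

# Apply the formula directly: sum of the values divided by their count
manual_mean <- sum(scores) / length(scores)

# TRUE if the manual result agrees with the built-in mean()
all.equal(manual_mean, mean(scores))

# Hypothetical case (not in the original example): one score is missing
scores_with_na <- c(70, 85, NA, 90, 75)
mean_with_na <- mean(scores_with_na)                    # NA: the missing value propagates
mean_na_removed <- mean(scores_with_na, na.rm = TRUE)   # 80: mean of the four remaining scores
```

Setting `na.rm = TRUE` drops missing values before averaging, so the divisor becomes the number of non-missing observations.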

3. The Median

The median is the middle value in a distribution when the values are arranged in ascending or descending order. It splits the data into two equal halves.

Mathematical Determination

1. Arrange data from smallest to largest.
2. If $n$ is odd: The median is the middle value at position $\frac{n+1}{2}$.
3. If $n$ is even: The median is the average of the two middle values at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.
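The rule above can be sketched as a small R function. This is a teaching sketch rather than a replacement for the built-in `median()`; the name `manual_median` and the two test vectors are made up for illustration, and the function assumes a non-empty numeric vector without missing values.

```r
# Hypothetical helper that follows the sort / odd / even rule step by step
manual_median <- function(x) {
  x <- sort(x)        # arrange from smallest to largest
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                   # odd n: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2    # even n: average of the two middle values
  }
}

odd_result  <- manual_median(c(7, 1, 5))     # sorted: 1 5 7   -> 5
even_result <- manual_median(c(4, 1, 3, 2))  # sorted: 1 2 3 4 -> 2.5
```

Both results should match what `median()` returns for the same vectors.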

Real-Life Example

Scenario: Monthly salaries of 5 employees: $2,000, $2,500, $3,000, $3,500, $10,000. (Notice that $10,000 is an outlier).

Calculation: The values are already sorted. Since $n=5$ (odd), the middle value is the 3rd one. Median = $3,000.

Note: The mean would be $4,200, which doesn’t represent the “typical” employee well because of the outlier. This is why the Median is preferred for skewed data.

R Implementation

salaries <- c(2000, 2500, 3000, 3500, 10000)
median_salary <- median(salaries)
print(paste("The Median Salary is:", median_salary))

## [1] "The Median Salary is: 3000"
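To see the effect of the outlier described in the note, the mean and median can be compared with and without the $10,000 salary. This is a minimal sketch; the variable `salaries_no_outlier` is introduced here only for the comparison.

```r
salaries <- c(2000, 2500, 3000, 3500, 10000)

mean_all   <- mean(salaries)     # 4200: pulled upward by the 10000 salary
median_all <- median(salaries)   # 3000: unaffected by how large the top salary is

# Drop the outlier and recompute both measures
salaries_no_outlier <- salaries[salaries < 10000]
mean_trimmed   <- mean(salaries_no_outlier)     # 2750
median_trimmed <- median(salaries_no_outlier)   # 2750
```

Removing a single value shifts the mean by 1,450 but the median by only 250, which illustrates why the median is described as robust.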

4. The Mode

The mode is the value that appears most frequently in a dataset. A dataset can have one mode (unimodal), two modes (bimodal), or several modes (multimodal).

Mathematical Formula

There is no specific algebraic formula for the mode; it is identified by frequency counting: \[\text{Mode} = \text{Value with highest frequency}\]

Real-Life Example

Scenario: A shoe store tracks the sizes sold in one hour: 7, 8, 8, 9, 10, 8, 11.

Calculation:

* 7: 1 time
* 8: 3 times
* 9: 1 time
* 10: 1 time
* 11: 1 time

Mode = 8.

R Implementation

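Note: R does not have a standard built-in function for the statistical mode (the built-in mode() reports an object's storage type instead), so we create a simple function. If several values tie for the highest frequency, this function returns only the first of them.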

get_mode <- function(v) {
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

shoe_sizes <- c(7, 8, 8, 9, 10, 8, 11)
mode_size <- get_mode(shoe_sizes)
print(paste("The Mode Shoe Size is:", mode_size))

## [1] "The Mode Shoe Size is: 8"
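Since a dataset can be bimodal or multimodal, a variant that returns every tied value may be useful. The sketch below is one possible approach, not part of the original notes; the name `get_all_modes` and the second test vector are hypothetical.

```r
# Hypothetical variant of get_mode() that returns all values tied for the top frequency
get_all_modes <- function(v) {
  uniqv  <- unique(v)                     # distinct values, in order of first appearance
  counts <- tabulate(match(v, uniqv))     # frequency of each distinct value
  uniqv[counts == max(counts)]            # keep every value with the highest frequency
}

shoe_sizes <- c(7, 8, 8, 9, 10, 8, 11)
unimodal_result <- get_all_modes(shoe_sizes)        # 8

bimodal_result <- get_all_modes(c(1, 1, 2, 2, 3))   # 1 2
```

When every value occurs equally often, this function returns all of them, which is usually read as "no meaningful mode".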

5. Comparison: When to use which?

Measure	Best Used For	Sensitivity to Outliers
Mean	Continuous data, symmetric distribution	High (Outliers pull the mean)
Median	Skewed data (e.g., income, house prices)	Low (Robust to outliers)
Mode	Categorical data (e.g., most popular car color)	Low
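For categorical data, where the mean and median are not defined, the mode can be read off a frequency table. The following is a small sketch using base R's `table()`; the car colors are made-up data for illustration.

```r
# Made-up categorical data: observed car colors
car_colors <- c("white", "black", "white", "silver", "white", "black")

# Count how often each category occurs
color_counts <- table(car_colors)

# The category (or categories) with the highest count
most_common <- names(color_counts)[color_counts == max(color_counts)]
most_common   # "white"
```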

Visualizing the Relationship

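In a symmetric, unimodal distribution such as the normal distribution, Mean = Median = Mode. In a right-skewed distribution (positive skew), typically Mean > Median > Mode. In a left-skewed distribution (negative skew), typically Mean < Median < Mode. These orderings are rules of thumb: they hold for many common distributions but can fail, particularly for discrete or multimodal data.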
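The ordering of the three measures under skew can be checked on a small example. The datasets below are made up for illustration, and `first_mode` is a hypothetical helper using the same logic as the get_mode() function from the mode section.

```r
# Hypothetical helper: returns the first most frequent value
first_mode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Made-up right-skewed data: most values are small, with a long tail to the right
right_skewed <- c(1, 2, 2, 2, 3, 3, 4, 5, 7, 10, 16)
right_stats <- c(mean   = mean(right_skewed),        # 5
                 median = median(right_skewed),      # 3
                 mode   = first_mode(right_skewed))  # 2

# Mirroring the data produces a left-skewed dataset
left_skewed <- 17 - right_skewed
left_stats <- c(mean   = mean(left_skewed),          # 12
                median = median(left_skewed),        # 14
                mode   = first_mode(left_skewed))    # 15
```

In the right-skewed data the mean exceeds the median, which exceeds the mode; mirroring the data reverses the order.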

6. Summary Exercises

1. Calculate the mean of the following dataset: 12, 15, 18, 22, 30.
2. Identify the median of: 3, 10, 2, 8, 15, 12. (Remember to sort first!)
3. Find the mode of the following colors: Red, Blue, Red, Green, Red, Blue.

End of Module I Notes


Module I: Measures of Central Tendency

Introduction to Statistical Data Analysis

Abdikarim Ali Abdi

2025-12-28
