1. Introduction

A measure of central tendency is a single value that summarizes a data set by identifying its central position. In statistics, the three most common measures of central tendency are the mean, the median, and the mode.

2. The Arithmetic Mean

The mean (or average) is the most popular and well-known measure of central tendency. It is the sum of all values divided by the number of values.

Mathematical Formula

For a sample of size $n$, the sample mean $\bar{x}$ is calculated as:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

Where:
- $\sum$: The symbol for summation.
- $x_i$: Each individual value in the data set.
- $n$: The total number of observations.

R Example

# Sample data: Exam scores of 5 students
scores <- c(85, 90, 88, 76, 92)

# Calculate Mean
mean_score <- mean(scores)
print(paste("The mean score is:", mean_score))

## [1] "The mean score is: 86.2"
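
As a cross-check, the formula can be applied by hand: `sum()` supplies the numerator $\sum x_i$ and `length()` supplies $n$. The following is a minimal sketch that reuses the exam scores from the example above; the variable name `manual_mean` is introduced here only for illustration.

```r
# Exam scores from the example above
scores <- c(85, 90, 88, 76, 92)

# Apply the formula directly: sum of all values divided by the number of values
manual_mean <- sum(scores) / length(scores)
print(manual_mean)

# Compare with the built-in function; all.equal() tolerates floating-point rounding
print(isTRUE(all.equal(manual_mean, mean(scores))))
```

Both routes should agree, which is a quick way to confirm that `mean()` implements exactly the formula shown earlier.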

Real-Life Example

Income Analysis: A company uses the mean to calculate the average salary of its employees to set budget expectations for the following year. However, the mean can be heavily influenced by “outliers” (e.g., a CEO earning 100x more than staff).

3. The Median

The median is the middle value of a data set that has been arranged in order of magnitude. It is less affected by outliers and skewed data than the mean.

Mathematical Formula

Arrange data from smallest to largest.
If $n$ is odd, the median is the middle value at position: \[\text{Median} = \left( \frac{n+1}{2} \right)^{\text{th}} \text{ term}\]
If $n$ is even, the median is the average of the two middle values: \[\text{Median} = \frac{\left( \frac{n}{2} \right)^{\text{th}} \text{ term} + \left( \frac{n}{2} + 1 \right)^{\text{th}} \text{ term}}{2}\]
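
The position rules above can be sketched as a small function. This is an illustrative sketch, not a replacement for R's built-in `median()`: the helper name `median_by_position` and both sample vectors are made up for this example, and the function assumes a non-empty numeric vector with no missing values.

```r
# Hypothetical helper: compute the median from the position formulas
median_by_position <- function(x) {
  sorted <- sort(x)   # step 1: arrange data from smallest to largest
  n <- length(sorted)
  if (n %% 2 == 1) {
    # odd n: take the ((n + 1) / 2)-th term
    sorted[(n + 1) / 2]
  } else {
    # even n: average the (n / 2)-th and (n / 2 + 1)-th terms
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  }
}

odd_data  <- c(7, 3, 9, 1, 5)       # made-up values, n = 5
even_data <- c(7, 3, 9, 1, 5, 11)   # made-up values, n = 6

print(median_by_position(odd_data))
print(median_by_position(even_data))

# Both results should match the built-in function
print(median_by_position(odd_data) == median(odd_data))
print(median_by_position(even_data) == median(even_data))
```

In practice `median()` is the better choice; writing the steps out mainly shows why the odd and even cases need different handling.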

R Example

# Data with an outlier
salaries <- c(30000, 32000, 31000, 35000, 500000) # One very high salary

# Mean vs Median
print(paste("Mean Salary:", mean(salaries)))

## [1] "Mean Salary: 125600"

print(paste("Median Salary:", median(salaries)))

## [1] "Median Salary: 32000"

Real-Life Example

Real Estate: When reporting house prices in a city, the median is preferred. If one mansion sells for $10 million and five small houses sell for $200,000 each, the mean (about $1.83 million) would suggest the average house is expensive, while the median ($200,000) reflects what a typical buyer can actually afford.
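
The house-price scenario can be checked numerically. The sketch below assumes the scenario means five houses at $200,000 each plus one $10 million mansion; the variable names are chosen for illustration only.

```r
# Prices from the scenario: one mansion and five small houses (assumed $200,000 each)
prices <- c(10000000, rep(200000, 5))

mean_price   <- mean(prices)     # pulled upward by the single mansion
median_price <- median(prices)   # the price of a typical sale

# format() with scientific = FALSE avoids output such as 2e+05
print(paste("Mean price:", format(round(mean_price), big.mark = ",", scientific = FALSE)))
print(paste("Median price:", format(median_price, big.mark = ",", scientific = FALSE)))
```

A single extreme sale is enough to move the mean far away from every price that was actually paid for a small house, while the median stays on one of them.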

4. The Mode

The mode is the value that appears most frequently in a data set. A data set can have one mode, more than one mode (bimodal/multimodal), or no mode at all.

R Example

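R has no built-in function for the statistical mode (the built-in mode() function reports an object's storage mode, such as "numeric" or "character"), so we often use a custom function or a frequency table.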

# Sample data: Shirt sizes sold
sizes <- c("S", "M", "L", "M", "S", "M", "XL", "M")

# Calculate Mode using a frequency table
mode_table <- table(sizes)
print(mode_table)

## sizes
##  L  M  S XL 
##  1  4  2  1

# Finding the name of the max frequency
mode_val <- names(mode_table)[which.max(mode_table)]
print(paste("The modal size is:", mode_val))

## [1] "The modal size is: M"
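
One caveat with `which.max()` is that it returns only the first maximum, so a bimodal data set would report a single mode. A minimal sketch of a tie-aware alternative follows; the helper name `all_modes` and the `tied` vector are hypothetical and introduced only for this illustration.

```r
# Hypothetical helper: return every value tied for the highest frequency
all_modes <- function(x) {
  freq <- table(x)
  names(freq)[freq == max(freq)]
}

sizes <- c("S", "M", "L", "M", "S", "M", "XL", "M")   # data from the example above
tied  <- c("S", "M", "L", "M", "S")                   # made-up data: "M" and "S" both appear twice

print(all_modes(sizes))
print(all_modes(tied))
```

Note that `names()` always returns character strings, so numeric input would need `as.numeric()` applied to the result. If every value occurs exactly once, this helper returns all of them, which corresponds to the "no mode" case and should be interpreted accordingly.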

Real-Life Example

Inventory Management: A shoe store manager looks at the mode of shoe sizes sold to decide which size to stock most heavily. Knowing the “average” shoe size (e.g., 8.4) is of little use because no shoe is sold in that size, but knowing that size 9 is the mode is actionable.

5. Choosing the Right Measure

Measure	Best Used For	Sensitivity to Outliers
Mean	Numerical data that is roughly symmetric with no extreme values (e.g., normally distributed).	High (Sensitive)
Median	Numerical data that is skewed or contains extreme values (outliers).	Low (Robust)
Mode	Categorical (nominal) data.	Low

Distribution Shapes

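Symmetric (single peak): Mean $\approx$ Median $\approx$ Mode.
Right Skewed (Positive): typically Mean > Median > Mode.
Left Skewed (Negative): typically Mean < Median < Mode.
These orderings are rules of thumb for distributions with a single peak; individual data sets can deviate from them.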
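
The orderings can be illustrated with two small samples. Everything in this sketch is an assumption made for demonstration: the data are made up so that the results are easy to verify by hand, and `stat_mode` and `summarize_center` are hypothetical helper names.

```r
# Made-up samples chosen so the ordering is easy to verify by hand
right_skewed <- c(1, 2, 2, 2, 3, 3, 4, 5, 9, 14)   # long tail to the right
left_skewed  <- 15 - right_skewed                   # mirror image: long tail to the left

# Statistical mode via a frequency table (first value with the highest count)
stat_mode <- function(x) {
  freq <- table(x)
  as.numeric(names(freq)[which.max(freq)])
}

# Collect all three measures in one named vector
summarize_center <- function(x) {
  c(mean = mean(x), median = median(x), mode = stat_mode(x))
}

print(summarize_center(right_skewed))
print(summarize_center(left_skewed))
```

For the right-skewed sample the mean is the largest of the three values, and for its mirror image the order reverses, which matches the pattern listed above.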

6. Practice Exercise

Create a vector in R named my_data with the following values: 10, 15, 15, 17, 18, 21, 90.
Calculate the mean and median.
Which measure best represents the “center” of this data? Why?

my_data <- c(10, 15, 15, 17, 18, 21, 90)
# Add your code here

Module I: Measures of Central Tendency

Introduction to Statistical Data Analysis

Naima Abdi Ahmed

2025-12-28
