1. Introduction

A Measure of Central Tendency is a single value that identifies the central position within a dataset. It “summarizes” the data by finding a typical value around which the data points cluster.

In this module, we will cover the three most common measures:

  1. The Arithmetic Mean
  2. The Median
  3. The Mode


2. The Arithmetic Mean

The Mean is the most common measure of central tendency, often referred to as the “average.”

2.1 Mathematical Formula

For a sample of \(n\) observations (\(x_1, x_2, \dots, x_n\)): \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

For a population of \(N\) observations: \[\mu = \frac{\sum_{i=1}^{N} X_i}{N}\]

2.2 R Example

# Dataset: Weekly hours spent studying by 8 students
study_hours <- c(12, 15, 14, 18, 20, 15, 10, 22)

# Calculating the mean using base R
mean_val <- mean(study_hours)
print(paste("The mean study hours are:", mean_val))
## [1] "The mean study hours are: 15.75"
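
As a quick check of the formula in 2.1, the same result can be obtained by summing the observations and dividing by their count. This is a minimal sketch that reuses the study_hours values from above; the names total, n, and manual_mean are illustrative only.

```r
# Same dataset as in the example above
study_hours <- c(12, 15, 14, 18, 20, 15, 10, 22)

total <- sum(study_hours)     # numerator: sum of all observations
n <- length(study_hours)      # denominator: number of observations
manual_mean <- total / n

print(manual_mean)
## [1] 15.75

# The manual result should agree with the built-in mean()
print(all.equal(manual_mean, mean(study_hours)))
## [1] TRUE
```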

2.3 Real-Life Example: Marketing

In Marketing, analysts use the mean to calculate “Average Revenue Per User” (ARPU). This helps businesses understand how much money they generate on average from each customer to decide their advertising budget.
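
To make the ARPU idea concrete, the sketch below computes it for a small set of users. The revenue figures and the name revenue_per_user are hypothetical, chosen only for illustration.

```r
# Hypothetical monthly revenue (in dollars) from 6 users; illustrative values only
revenue_per_user <- c(20, 35, 0, 50, 15, 30)

# ARPU is the arithmetic mean of revenue across all users,
# including users who generated no revenue
arpu <- mean(revenue_per_user)
print(paste("ARPU:", arpu))
## [1] "ARPU: 25"
```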


3. The Median

The Median is the middle value when a dataset is arranged in order. It is robust, meaning it is not heavily influenced by extreme outliers.

3.1 Mathematical Steps

  1. Sort the data from smallest to largest.
  2. If \(n\) is odd: The median is the value at position \(\frac{n+1}{2}\).
  3. If \(n\) is even: The median is the average of the two middle values at positions \(\frac{n}{2}\) and \(\frac{n}{2} + 1\).
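
The steps above can be sketched as a small function. manual_median() is a hypothetical helper written for this illustration, not part of base R, and it assumes a non-empty numeric vector without missing values.

```r
# manual_median() is a hypothetical helper that follows the three steps above
manual_median <- function(x) {
  sorted <- sort(x)                 # Step 1: sort from smallest to largest
  n <- length(sorted)
  if (n %% 2 == 1) {
    sorted[(n + 1) / 2]             # Step 2: odd n, take the single middle value
  } else {
    # Step 3: even n, average the two middle values
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  }
}

odd_data <- c(7, 3, 9, 1, 5)         # sorted: 1 3 5 7 9
even_data <- c(7, 3, 9, 1, 5, 11)    # sorted: 1 3 5 7 9 11

print(manual_median(odd_data))
## [1] 5
print(manual_median(even_data))
## [1] 6
```

Both results should match what the built-in median() returns for the same vectors.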

3.2 R Example

# Dataset: Salaries of 5 employees (in thousands)
# Note the extreme outlier (250) representing the CEO
salaries <- c(45, 50, 60, 55, 250) 

# Compare Mean vs Median
cat("Mean Salary:", mean(salaries), "\n")
## Mean Salary: 92
cat("Median Salary:", median(salaries), "\n")
## Median Salary: 55

3.3 Real-Life Example: Real Estate

In Real Estate, the “Median House Price” is preferred over the mean. If a neighborhood has five houses priced at $200k and one mansion priced at $5 million, the mean would suggest the “average” house costs $1 million, which is misleading because no house in the neighborhood sells for that amount. The median stays at $200k, accurately reflecting what a typical homebuyer pays.
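
The neighborhood described above can be checked directly in R. The sketch below uses the prices from the example, expressed in thousands of dollars; the name house_prices is illustrative.

```r
# Prices in thousands of dollars: five houses at $200k and one mansion at $5 million
house_prices <- c(200, 200, 200, 200, 200, 5000)

cat("Mean price:", mean(house_prices), "\n")
## Mean price: 1000
cat("Median price:", median(house_prices), "\n")
## Median price: 200
```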


4. The Mode

The Mode is the value that appears most frequently in a dataset.

4.1 Characteristics

  • A dataset can be Unimodal (one mode), Bimodal (two modes), or Multimodal.
  • Of the three measures, it is the only one that can be used for nominal Categorical data (like colors or brands), because it requires neither ordering nor arithmetic.

4.2 R Example

R does not have a built-in function for the statistical mode (the function mode() in R actually returns the storage mode, i.e. the internal type, of an object). We use the table() function instead.

# Dataset: Popular car colors sold
colors <- c("Red", "Blue", "Blue", "White", "Red", "Blue", "Silver")

# Create a frequency table and find the max
color_table <- table(colors)
modal_value <- names(color_table)[which.max(color_table)]

print(color_table)
## colors
##   Blue    Red Silver  White 
##      3      2      1      1
print(paste("The mode is:", modal_value))
## [1] "The mode is: Blue"
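
Note that which.max() returns only the first maximum, so for a bimodal dataset the approach above reports just one of the modes. One possible way to return every mode is sketched below; the data is a hypothetical variation of the car colors in which Red and Blue tie, and the name all_modes is illustrative.

```r
# Hypothetical variation of the data above in which Red and Blue tie
colors <- c("Red", "Blue", "Blue", "White", "Red", "Blue", "Silver", "Red")

color_table <- table(colors)

# Keep every category whose count equals the highest count
all_modes <- names(color_table)[color_table == max(color_table)]

print(all_modes)
## [1] "Blue" "Red"
```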

4.3 Real-Life Example: Inventory Management

A Retail Manager uses the mode to determine which shoe size to stock. Knowing the “average” shoe size is 8.4 is not helpful for ordering, but knowing that size 9 is the Mode (most frequently sold) tells the manager exactly what to buy.
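
The contrast between the two numbers can be reproduced with a small dataset. The sales below are hypothetical, chosen so that the mean is 8.4 and the mode is 9 as in the example; sizes_sold and modal_size are illustrative names.

```r
# Hypothetical sizes of five pairs sold, chosen so that the mean is 8.4
sizes_sold <- c(7, 8, 9, 9, 9)

# The mean is not a size that can actually be ordered
print(paste("Mean size:", mean(sizes_sold)))
## [1] "Mean size: 8.4"

# The mode is the size sold most often
size_table <- table(sizes_sold)
modal_size <- names(size_table)[which.max(size_table)]
print(paste("Most sold size:", modal_size))
## [1] "Most sold size: 9"
```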


5. Comparing the Measures

| Feature | Mean | Median | Mode |
|---|---|---|---|
| Best Used For | Continuous data (No outliers) | Skewed data / Outliers | Categorical data |
| Sensitive to Outliers? | Yes | No | No |
| Formula | Sum / Count | Middle Value | Most Frequent |

5.1 Visualization of Skewness

When a distribution is “skewed” (stretched to one side), these three measures separate. For a unimodal distribution, the typical pattern is:

  • Symmetric: Mean = Median = Mode
  • Right Skewed: Mean > Median > Mode
  • Left Skewed: Mean < Median < Mode
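
The ordering for right-skewed data can be illustrated with a small example. The values below are hypothetical and were chosen so that most observations are small and a few are large; x and mode_val are illustrative names.

```r
# Hypothetical right-skewed data: most values are small, a few are large
x <- c(1, 2, 2, 2, 3, 3, 4, 5, 9, 15)

# Find the mode from a frequency table, then convert its label back to a number
x_table <- table(x)
mode_val <- as.numeric(names(x_table)[which.max(x_table)])

cat("Mean:", mean(x), "\n")
## Mean: 4.6
cat("Median:", median(x), "\n")
## Median: 3
cat("Mode:", mode_val, "\n")
## Mode: 2
```

Here the long right tail pulls the mean above the median, while the mode stays at the most common small value.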


6. Conclusion

In summary, Module I introduces how to find the “center” of data. Choosing between Mean, Median, and Mode depends on the type of data you have and whether it is skewed or contains outliers.


End of Module I Notes