1. Introduction

A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures indicate where the center of a distribution lies, which is why they are also called measures of “location”.

The three most common measures are:

  1. Mean
  2. Median
  3. Mode


2. The Arithmetic Mean

The mean (or average) is the sum of all observations divided by the total number of observations.

Mathematical Formula

For a sample of size \(n\), the sample mean \(\bar{x}\) is calculated as: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

Where:

  • \(\sum\): Sigma notation (summation)
  • \(x_i\): Each individual value in the dataset
  • \(n\): The total number of values
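The formula translates directly into R: add up the values and divide by how many there are. This is a minimal sketch; `manual_mean()` is a hypothetical helper name used only for illustration, and it assumes a numeric vector with no missing values. In practice the built-in `mean()` does this for you.

```r
# Hypothetical helper: sample mean computed straight from the formula
manual_mean <- function(x) {
  n <- length(x)  # n: the total number of values
  sum(x) / n      # sum of all x_i, divided by n
}

quiz <- c(70, 85, 80, 90, 75)
manual_mean(quiz)  # 80
mean(quiz)         # the built-in function gives the same result
```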

Real-Life Example

Scenario: A teacher wants to find the average score of 5 students in a mini-quiz. Scores: 70, 85, 80, 90, 75.

Calculation: \[\bar{x} = \frac{70 + 85 + 80 + 90 + 75}{5} = \frac{400}{5} = 80\]

R Code Example

scores <- c(70, 85, 80, 90, 75)
mean_score <- mean(scores)
print(paste("The Mean Score is:", mean_score))
## [1] "The Mean Score is: 80"

3. The Median

The median is the middle value of a dataset when it has been arranged in ascending or descending order. It splits the data into two equal halves.

Calculation Method

  1. Sort the data from smallest to largest.
  2. If \(n\) is odd, the median is the middle value at position \(\frac{n+1}{2}\).
  3. If \(n\) is even, the median is the average of the two middle values at positions \(\frac{n}{2}\) and \(\frac{n}{2} + 1\).
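The three steps above can be sketched in R as follows. `manual_median()` is a hypothetical helper written only to mirror the calculation method; it assumes a non-empty numeric vector without missing values. The built-in `median()` is what you would normally use.

```r
# Hypothetical helper following the three steps of the calculation method
manual_median <- function(x) {
  x <- sort(x)                      # Step 1: sort from smallest to largest
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # Step 2: odd n, take the middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2   # Step 3: even n, average the two middle values
  }
}

manual_median(c(3, 1, 2))     # odd n: 2
manual_median(c(4, 1, 3, 2))  # even n: (2 + 3) / 2 = 2.5
```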

Real-Life Example

Scenario: Weekly salaries of 6 employees in a small startup. Salaries: $500, $600, $550, $3000, $700, $650.

Step 1: Sort: 500, 550, 600, 650, 700, 3000.

Step 2: Identify the middle: since \(n=6\) (even), we average the 3rd and 4th values. \[\text{Median} = \frac{600 + 650}{2} = 625\]

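Note: The median is more “robust” than the mean because it is not heavily affected by the outlier ($3000). For comparison, the mean of these salaries is \(\frac{6000}{6} = 1000\), which is higher than five of the six salaries.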

R Code Example

salaries <- c(500, 600, 550, 3000, 700, 650)
median_salary <- median(salaries)
print(paste("The Median Salary is:", median_salary))
## [1] "The Median Salary is: 625"
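To see the robustness described in the note above, one can drop the $3000 outlier and compare how far each measure moves. This is an illustrative sketch; the name `without_outlier` is introduced here only for the example.

```r
salaries <- c(500, 600, 550, 3000, 700, 650)
mean(salaries)    # 1000: pulled upward by the $3000 outlier
median(salaries)  # 625: stays close to the typical salaries

# Remove the outlier and recompute both measures
without_outlier <- salaries[salaries != 3000]
mean(without_outlier)    # 600: the mean drops by 400
median(without_outlier)  # 600: the median drops by only 25
```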

4. The Mode

The mode is the value that appears most frequently in a dataset.

Real-Life Example

Scenario: A shoe store records the sizes of sneakers sold in an hour. Sizes: 8, 9, 9, 10, 11, 9, 8, 12, 9.

Calculation: The number “9” appears 4 times, which is more than any other size. Mode = 9.

R Code Example

R has no built-in function for the statistical mode (the base function `mode()` returns an object's storage mode, such as "numeric"), but we can find it using a frequency table:

shoe_sizes <- c(8, 9, 9, 10, 11, 9, 8, 12, 9)
counts <- table(shoe_sizes)                       # frequency of each size
mode_val <- names(counts)[counts == max(counts)]  # every most-frequent value (handles ties)
print(paste("The Mode is:", paste(mode_val, collapse = ", ")))
## [1] "The Mode is: 9"
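Because `table()` returns its values as character names, the result above is the string "9" rather than the number 9. A minimal alternative sketch, using a hypothetical helper `stat_mode()`, keeps the input's type and also works for categorical data such as colors; like the table version, it returns all values tied for the highest count.

```r
# Hypothetical helper: all most-frequent values, keeping the input's type
stat_mode <- function(x) {
  ux <- unique(x)                   # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))  # how often each distinct value occurs
  ux[counts == max(counts)]         # every value tied for the highest count
}

stat_mode(c(8, 9, 9, 10, 11, 9, 8, 12, 9))           # 9 (a number, not "9")
stat_mode(c("red", "blue", "red", "green", "blue"))  # "red" "blue" (a tie)
```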

5. Comparing Mean, Median, and Mode

The choice of measure depends on the shape of the distribution and the scale of measurement.

| Measure | Best used for | Sensitivity to outliers |
|---|---|---|
| Mean | Symmetric numeric data (e.g., height, weight) | Highly sensitive |
| Median | Skewed data (e.g., income, house prices) | Robust (little affected) |
| Mode | Categorical data (e.g., favorite color) | Generally not affected |
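The "Sensitivity to outliers" column can be checked on a small made-up dataset: add one extreme value and compare each measure before and after. The data here are invented purely for illustration. Note that `which.max()` returns only the first most-frequent value, which is enough here because there are no ties.

```r
x <- c(10, 12, 12, 13, 14)  # made-up data
x_out <- c(x, 100)          # the same data plus one extreme value

mean(x); mean(x_out)      # 12.2 vs about 26.83: the mean jumps
median(x); median(x_out)  # 12 vs 12.5: the median barely moves

# Mode from a frequency table (first most-frequent value)
names(which.max(table(x)))      # "12"
names(which.max(table(x_out)))  # "12": unchanged
```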

Shape of Distribution

  • Symmetric (unimodal): Mean \(\approx\) Median \(\approx\) Mode.
  • Right skewed (positive): typically Mean > Median > Mode.
  • Left skewed (negative): typically Mean < Median < Mode.
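The ordering rules above can be illustrated with a small invented dataset that has a long right tail, together with its mirror image (each value subtracted from 20), which has a long left tail. `mode_of()` is a hypothetical helper that returns the first most-frequent value as a number.

```r
right_skewed <- c(1, 2, 2, 2, 3, 3, 4, 5, 8, 15)  # invented data, long tail to the right
left_skewed  <- 20 - right_skewed                 # mirror image, long tail to the left

# Hypothetical helper: first most-frequent value, converted back to a number
mode_of <- function(v) as.numeric(names(which.max(table(v))))

c(mean = mean(right_skewed), median = median(right_skewed), mode = mode_of(right_skewed))
# mean 4.5 > median 3 > mode 2
c(mean = mean(left_skewed), median = median(left_skewed), mode = mode_of(left_skewed))
# mean 15.5 < median 17 < mode 18
```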

6. Summary Exercise

Imagine a small company with 5 staff members earning $40k, $42k, $45k, $48k, and $150k (the CEO).

  1. Calculate the Mean: \(\frac{40+42+45+48+150}{5} = 65k\).
  2. Calculate the Median: The middle value is \(45k\).
  3. Question: Which number better represents the “typical” worker’s salary?
    • Answer: The Median (45k), because the CEO’s high salary inflates the mean.
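The exercise can be checked in R with the same functions used earlier. Salaries are entered in thousands of dollars; the variable name `staff_salaries` is chosen only for this sketch. The last line counts how many staff earn less than the mean, which makes the distortion caused by the CEO's salary concrete.

```r
staff_salaries <- c(40, 42, 45, 48, 150)  # in thousands of dollars; 150 is the CEO

mean(staff_salaries)    # 65
median(staff_salaries)  # 45

# Share of staff earning less than the mean
mean(staff_salaries < mean(staff_salaries))  # 0.8, i.e. 4 of the 5 staff
```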

