1. Introduction

A measure of central tendency is a single value that summarizes a dataset by identifying its central position. In statistics, such measures are one kind of “summary statistic.”

The three most common measures are:

  1. Mean
  2. Median
  3. Mode


2. The Arithmetic Mean

The mean is the sum of all values divided by the total number of values. It is the most common measure of central tendency.

Mathematical Formula

For a sample of \(n\) observations \(x_1, \dots, x_n\), the mean (\(\bar{x}\)) is: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

R Example

# Define the dataset
data <- c(5, 8, 12, 15, 10)

# Calculate mean
sample_mean <- mean(data)
print(paste("The mean is:", sample_mean))
## [1] "The mean is: 10"
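
As a quick check of the formula, the same result can be reproduced by hand with sum() and length(). This is a minimal sketch; the name manual_mean is introduced here for illustration only.

```r
# Same values as the example above
data <- c(5, 8, 12, 15, 10)

# Mean from the formula: sum of the values divided by n
n <- length(data)
manual_mean <- sum(data) / n

print(manual_mean)
## [1] 10

# Confirm that it agrees with the built-in mean()
print(isTRUE(all.equal(manual_mean, mean(data))))
## [1] TRUE
```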

Real-Life Application: Academic Grading: A teacher calculates the average score of a class exam to determine the overall performance level.


3. The Median

The median is the middle value in a dataset when the values are arranged in order.

Calculation Logic

  1. Arrange data in ascending order.
  2. If \(n\) is odd, the median is the middle number.
  3. If \(n\) is even, the median is the average of the two middle numbers.
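
The three steps above can be sketched as a small function. The helper manual_median is hypothetical, written only to mirror the steps, and it assumes a non-empty numeric vector with no missing values. In practice the built-in median() does the same job.

```r
# Hypothetical helper that follows the three steps literally
manual_median <- function(x) {
  sorted <- sort(x)                      # Step 1: ascending order
  n <- length(sorted)
  if (n %% 2 == 1) {
    sorted[(n + 1) / 2]                  # Step 2: odd n, take the middle value
  } else {
    mean(sorted[c(n / 2, n / 2 + 1)])    # Step 3: even n, average the two middle values
  }
}

manual_median(c(3, 10, 2, 8, 15))
## [1] 8
manual_median(c(3, 10, 2, 8, 15, 20))
## [1] 9
```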

R Example

# Dataset with an odd number of values
data_odd <- c(3, 10, 2, 8, 15)
print(paste("Median (Odd):", median(data_odd)))
## [1] "Median (Odd): 8"
# Dataset with an even number of values
data_even <- c(3, 10, 2, 8, 15, 20)
print(paste("Median (Even):", median(data_even)))
## [1] "Median (Even): 9"

Real-Life Application: Real Estate: Median home prices are used to avoid the distortion caused by a few extremely expensive mansions (outliers).


4. The Mode

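The mode is the value that appears most frequently. A dataset can have one mode, several modes (when values tie for the highest count), or no meaningful mode when every value occurs only once.

Note: R does not have a built-in function for the statistical mode; the base function mode() returns an object's storage type instead. We can create a custom function to find it.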

R Example

# Custom function to calculate mode
# (if several values tie, it returns the one that appears first in v)
get_mode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Dataset
data_mode <- c(2, 4, 4, 6, 8, 9)
print(paste("The mode is:", get_mode(data_mode)))
## [1] "The mode is: 4"
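
Because which.max() picks only the first maximum, get_mode() reports a single value even when several values tie. A possible variant that returns every tied value is sketched below; get_all_modes is a hypothetical name, and the function assumes a non-empty vector.

```r
# Hypothetical helper: return every value tied for the highest count
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))   # count of each unique value
  uniqv[counts == max(counts)]          # keep all values with the top count
}

get_all_modes(c(2, 4, 4, 6, 8, 9))
## [1] 4
get_all_modes(c(1, 1, 2, 2, 3))
## [1] 1 2
```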

Real-Life Application: Inventory: A shoe store tracks the “mode” of shoe sizes sold to know which size to restock most often.


5. Comparison of Measures

| Data Type                      | Best Measure | Why?                                     |
|--------------------------------|--------------|------------------------------------------|
| Nominal (Categories)           | Mode         | You cannot average “Colors” or “Names.”  |
| Ordinal (Ranked)               | Median       | Distance between ranks is not equal.     |
| Interval/Ratio (No Outliers)   | Mean         | Most precise; uses all data.             |
| Interval/Ratio (With Outliers) | Median       | Not affected by extreme values.          |
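
To make the first two rows concrete, here is a small sketch using made-up data; the values and variable names are illustrative assumptions. The ordinal example uses an odd number of observations so that the median falls exactly on one rank.

```r
# Nominal data: only the mode is meaningful
colors <- c("red", "blue", "red", "green", "red", "blue")
color_counts <- table(colors)
nominal_mode <- names(color_counts)[which.max(color_counts)]
print(nominal_mode)
## [1] "red"

# Ordinal data: take the median of the rank positions, then map back to a label
ratings <- factor(c("low", "high", "medium", "medium", "low"),
                  levels = c("low", "medium", "high"), ordered = TRUE)
rank_codes <- as.integer(ratings)        # low = 1, medium = 2, high = 3
ordinal_median <- levels(ratings)[median(rank_codes)]
print(ordinal_median)
## [1] "medium"
```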

6. Practice Exercise: The Impact of Outliers

A startup has 5 employees with salaries: $40k, $42k, $45k, $48k, and $250k (CEO).

# Salaries in thousands of dollars; the last value is the CEO
salaries <- c(40, 42, 45, 48, 250)

# Calculate both
mean_sal <- mean(salaries)
med_sal <- median(salaries)

cat("Mean Salary: $", mean_sal, "k\n")
## Mean Salary: $ 85 k
cat("Median Salary: $", med_sal, "k\n")
## Median Salary: $ 45 k
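
One way to see how much a single value drives each measure is to recompute both without the CEO. This is a sketch only; the name no_ceo and the cutoff of 250 are chosen here for illustration.

```r
salaries <- c(40, 42, 45, 48, 250)

# Drop the CEO's salary and recompute both measures
no_ceo <- salaries[salaries < 250]

mean(no_ceo)
## [1] 43.75
median(no_ceo)
## [1] 43.5

# How far each measure moved when the outlier was included
mean(salaries) - mean(no_ceo)
## [1] 41.25
median(salaries) - median(no_ceo)
## [1] 1.5
```

The mean shifts by more than $41k, while the median shifts by only $1.5k.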

Conclusion: The mean ($85k) is misleading because the CEO’s salary pulls it upward; four of the five employees earn less than $50k. The median ($45k) better represents the “typical” employee’s earnings.