Measures of Central Tendency

What is Central Tendency?

Central tendency describes the center or typical value of a dataset.

The three most common measures are the mean, the median, and the mode.

Each measure answers the question: “Where does the data tend to cluster?”

They are used across many professional fields, including economics, medicine, education, sports, and entertainment.

Mean: the arithmetic average, i.e. the sum of all values divided by how many there are.

The population mean for \(N\) values \(x_1, x_2, \ldots, x_N\):

\[\mu = \frac{1}{N} \sum_{i=1}^{N} x_i\]

The sample mean for \(n\) values:

\[\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}\]

The mean takes all values into account.
It is sensitive to outliers: one extreme value can pull the mean far from the center.
It is best used when the data is mostly symmetric and has no extreme outliers.
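The sensitivity to outliers can be illustrated with a small sketch. The numbers below are hypothetical values chosen for easy arithmetic, not data from this document:

```r
# Hypothetical data: five typical values, then the same data plus one extreme value.
typical      <- c(48, 50, 52, 49, 51)
with_outlier <- c(typical, 500)

mean(typical)         # 50
median(typical)       # 50

mean(with_outlier)    # 125  (one value moved the mean by 75)
median(with_outlier)  # 50.5 (the median barely changed)
```

A single extreme value moves the mean well outside the range of the other five observations, while the median shifts by only half a unit.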

Median — the middle value of a sorted dataset.

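Here \(x_{(k)}\) denotes the \(k\)-th smallest value, i.e. the value in position \(k\) after sorting.

For an odd number of values (\(n\) odd): \[\text{Median} = x_{\left(\frac{n+1}{2}\right)}\]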

For an even number of values (\(n\) even): \[\text{Median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}\]
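The two cases can be written out directly in R. This is a minimal sketch for illustration; `manual_median` is a hypothetical helper name, and in practice base R's `median()` does the same job:

```r
# manual_median is a hypothetical helper, not part of base R.
manual_median <- function(v) {
  s <- sort(v)   # sorted values: s[k] is the k-th smallest
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                 # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2  # even n: average of the two middle values
  }
}

manual_median(c(9, 1, 5))     # odd n:  sorted 1, 5, 9    -> 5
manual_median(c(9, 1, 5, 3))  # even n: sorted 1, 3, 5, 9 -> (3 + 5) / 2 = 4
```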

Mode — the value(s) that appear most often.

Consider the dataset: \(\{3, 7, 7, 2, 9, 4, 7, 1, 5\}\)

Mean: \[\bar{x} = \frac{3+7+7+2+9+4+7+1+5}{9} = \frac{45}{9} = 5\]

Median — sort the data: \(\{1, 2, 3, 4, \mathbf{5}, 7, 7, 7, 9\}\)

With \(n = 9\) (odd), the median is the 5th value: \(\text{Median} = 5\)

Mode — 7 appears 3 times (more than any other value): \(\text{Mode} = 7\)

x <- c(3, 7, 7, 2, 9, 4, 7, 1, 5)

mean(x)    # Mean

## [1] 5

median(x)  # Median

## [1] 5

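# Base R's mode() reports an object's storage type, not the statistical mode, so we define our own.
# On ties, this version returns only the mode that appears first in v.
get_mode <- function(v) {
  unique(v)[which.max(tabulate(match(v, unique(v))))]
}
get_mode(x)  # Mode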

## [1] 7
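Because a dataset can have more than one mode, a variant that returns every tied value may be useful. The following is a sketch under that assumption; `get_modes` is a hypothetical name introduced here for illustration:

```r
# get_modes is a hypothetical helper: it returns all values tied for the highest count.
get_modes <- function(v) {
  u <- unique(v)
  counts <- tabulate(match(v, u))  # frequency of each distinct value, in order of first appearance
  u[counts == max(counts)]         # keep every value that reaches the maximum frequency
}

get_modes(c(3, 7, 7, 2, 9, 4, 7, 1, 5))  # one mode: 7
get_modes(c(1, 1, 2, 2, 3))              # two modes: 1 and 2
```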

Salary data is typically right-skewed: a small number of very high earners stretches the upper tail and pulls the mean above the median. We simulate salaries across a company:
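A minimal sketch of such a simulation follows. The log-normal distribution, the seed, the sample size, and the parameters are all illustrative assumptions, not values from the original analysis:

```r
set.seed(42)  # arbitrary seed, used only so the run is reproducible

# Assumption: salaries are log-normal with a theoretical median of 60,000.
salaries <- rlnorm(1000, meanlog = log(60000), sdlog = 0.5)

mean(salaries)    # pulled upward by the few very high salaries
median(salaries)  # stays close to the typical employee's salary

# Histogram with the two measures marked.
hist(salaries, breaks = 40, main = "Simulated salaries", xlab = "Salary")
abline(v = mean(salaries), lty = 2)    # dashed line: mean
abline(v = median(salaries), lty = 1)  # solid line: median
```

With a right-skewed sample like this one, the dashed mean line should sit to the right of the solid median line.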

Figure: mean, median, and approximate mode of housing prices across four neighborhoods.

Situation	Best Measure	Reason
Symmetric data, no outliers	Mean	Uses all data values
Skewed data or outliers present	Median	Robust to extremes
Categorical (nominal) data	Mode	Only measure defined for unordered categories
Reporting income / home prices	Median	Outliers distort the mean
Most common item in a store	Mode	Frequency-based question
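For the categorical rows of the table, the mode can be read off a frequency table. The sales log below is hypothetical and serves only as an illustration:

```r
# Hypothetical sales log for a small store.
items <- c("milk", "bread", "milk", "eggs", "milk", "bread")

counts <- table(items)  # frequency of each category
counts

# Category (or categories, on ties) with the highest count.
top_item <- names(counts)[counts == max(counts)]
top_item  # "milk"
```

A mean or median of `items` would be meaningless here, since the categories have no numeric value or natural order.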

Key Takeaways: