What is Central Tendency?

Central tendency describes the center or typical value of a dataset.

The three most common measures are:

  • Mean — the arithmetic average of all values
  • Median — the middle value when data is sorted
  • Mode — the most frequently occurring value

Each measure answers the question: “Where does the data tend to cluster?”

They are used across many fields, including economics, medicine, education, sports, and entertainment.

The Mean

The population mean for \(N\) values \(x_1, x_2, \ldots, x_N\):

\[\mu = \frac{1}{N} \sum_{i=1}^{N} x_i\]

The sample mean for \(n\) values:

\[\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}\]

  • Takes all values into account
  • Sensitive to outliers — one extreme value can pull the mean far from the center
  • Best used when data is mostly symmetric and has no extreme outliers
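
As a quick check on the sample mean formula, the sketch below computes the sum divided by \(n\) by hand and compares it with R's built-in mean(). The vector vals is made up for illustration and is not data from these slides.

```r
# Hypothetical values, chosen only for illustration
vals <- c(4, 8, 15, 16, 23, 42)

n <- length(vals)              # number of observations
xbar_manual  <- sum(vals) / n  # sample mean straight from the formula
xbar_builtin <- mean(vals)     # R's built-in mean

xbar_manual    # 18
xbar_builtin   # 18
```

Both lines give 18, since the values sum to 108 and \(n = 6\).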

The Median and Mode

Median — the middle value of a sorted dataset.

For an odd number of values (\(n\) odd), where \(x_{(k)}\) denotes the \(k\)-th smallest value: \[\text{Median} = x_{\left(\frac{n+1}{2}\right)}\]

For an even number of values (\(n\) even): \[\text{Median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}\]

  • Robust to outliers: extreme values change it little or not at all
  • Best used for skewed distributions or data with outliers
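
The two median formulas can be sketched in R as follows. The helper name median_by_formula and both example vectors are assumptions made for illustration; in practice the built-in median() does the same job.

```r
# Median from the order-statistic formulas (illustrative helper, not a built-in)
median_by_formula <- function(v) {
  s <- sort(v)                      # s[k] is the k-th smallest value
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                  # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2   # even n: average of the two middle values
  }
}

odd_vals  <- c(9, 1, 5, 3, 7)       # hypothetical data, n = 5
even_vals <- c(9, 1, 5, 3, 7, 11)   # hypothetical data, n = 6

median_by_formula(odd_vals)    # 5
median_by_formula(even_vals)   # 6
```

Sorting comes first in both branches; the formulas index into the sorted values, not the original order.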

Mode — the value(s) that appear most often.

  • Can be used for categorical data (e.g., most common eye color)
  • A dataset can have no mode, one mode, or multiple modes
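
To make the categorical and multi-mode cases concrete, here is a minimal sketch that returns every value tied for the highest count. The helper name all_modes and the example data are assumptions for illustration.

```r
# Return every value tied for the highest count (illustrative helper)
all_modes <- function(v) {
  counts <- table(v)                      # frequency of each distinct value
  names(counts)[counts == max(counts)]    # labels of the most frequent values
}

eye_color <- c("brown", "blue", "brown", "green", "blue", "brown")  # hypothetical
all_modes(eye_color)           # "brown"

all_modes(c(1, 1, 2, 2, 3))    # "1" "2" (two modes, returned as character labels)
```

Because table() labels are character strings, numeric modes come back as text. When every value occurs exactly once, the helper returns all of them, which by the usual convention means the dataset has no mode.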

Step-by-Step Example

Consider the dataset: \(\{3, 7, 7, 2, 9, 4, 7, 1, 5\}\)

Mean: \[\bar{x} = \frac{3+7+7+2+9+4+7+1+5}{9} = \frac{45}{9} = 5\]

Median — sort the data: \(\{1, 2, 3, 4, \mathbf{5}, 7, 7, 7, 9\}\)

With \(n = 9\) (odd), the median is the 5th value: \(\text{Median} = 5\)

Mode — 7 appears 3 times (more than any other value): \(\text{Mode} = 7\)

R Code to Compute Central Tendency

x <- c(3, 7, 7, 2, 9, 4, 7, 1, 5)

mean(x)    # Mean
## [1] 5
median(x)  # Median
## [1] 5
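# R's built-in mode() returns an object's storage mode, not the statistical mode,
# so we define a helper (on ties it returns the first most frequent value):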
get_mode <- function(v) {
  unique(v)[which.max(tabulate(match(v, unique(v))))]
}
get_mode(x)  # Mode
## [1] 7

Effect of Outliers on Each Measure
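
One way to see the effect is to append a single extreme value to the dataset from the worked example and recompute all three measures. The outlier 100 and the helper name mode_of are assumptions chosen for illustration.

```r
x     <- c(3, 7, 7, 2, 9, 4, 7, 1, 5)   # dataset from the worked example
x_out <- c(x, 100)                       # same data plus one assumed outlier

# First most frequent value (illustrative helper)
mode_of <- function(v) {
  u <- unique(v)
  u[which.max(tabulate(match(v, u)))]
}

c(mean = mean(x),     median = median(x),     mode = mode_of(x))
# mean 5, median 5, mode 7
c(mean = mean(x_out), median = median(x_out), mode = mode_of(x_out))
# mean 14.5, median 6, mode 7
```

The single outlier nearly triples the mean (5 to 14.5), nudges the median by one unit (5 to 6), and leaves the mode at 7.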

Mean, Median & Skewness
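
A small deterministic sketch of how skew separates the measures. Both vectors are hypothetical, and the left-skewed one is simply the mirror image of the right-skewed one.

```r
right_skewed <- c(1, 1, 1, 2, 2, 3, 4, 6, 10, 20)  # hypothetical, long right tail
left_skewed  <- 21 - right_skewed                   # mirror image, long left tail

# First most frequent value (illustrative helper)
mode_of <- function(v) {
  u <- unique(v)
  u[which.max(tabulate(match(v, u)))]
}

c(mode = mode_of(right_skewed), median = median(right_skewed), mean = mean(right_skewed))
# mode 1, median 2.5, mean 5
c(mode = mode_of(left_skewed), median = median(left_skewed), mean = mean(left_skewed))
# mode 20, median 18.5, mean 16
```

In the right-skewed vector the long tail drags the mean above the median; mirroring the data reverses the order.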

Real-World Example: Professional Salaries

Salary data is typically right-skewed because a small number of high earners stretch the upper tail. We simulate salaries across a company:
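
The original simulation code is not shown here, so the following is only a sketch of one way to do it. The log-normal distribution, the seed, the sample size, and the parameters (median near 60,000, sdlog = 0.5) are all assumptions, not values from these slides.

```r
set.seed(42)   # assumed seed, for reproducibility only

# Assumed model: log-normal salaries with a median of about 60,000
salaries <- rlnorm(1000, meanlog = log(60000), sdlog = 0.5)

sal_mean   <- mean(salaries)
sal_median <- median(salaries)

sal_mean                    # pulled upward by the high earners
sal_median                  # closer to a typical salary
mean(salaries < sal_mean)   # share of employees earning less than the mean
```

Under these assumptions the mean lands above the median, and well over half of the simulated employees earn less than the mean, which is why the median is usually reported for salaries.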

3D View: Central Tendency by Neighborhood (Plotly)

Mean, Median, and approximate Mode of housing prices across four neighborhoods:

When to Use Which Measure?

Situation                         Best Measure   Reason
--------------------------------  -------------  --------------------------------
Symmetric data, no outliers       Mean           Uses all data values
Skewed data or outliers present   Median         Robust to extremes
Categorical data                  Mode           Only option for non-numeric data
Reporting income / home prices    Median         Outliers distort the mean
Most common item in a store       Mode           Frequency-based question

Key Takeaways:

  • The mean is powerful but sensitive to outliers
  • The median is resistant to skew and extreme values
  • The mode works for any data type, including categorical
  • In a symmetric, unimodal distribution: Mean \(\approx\) Median \(\approx\) Mode
  • In a right-skewed distribution, typically: Mode \(<\) Median \(<\) Mean