1. Introduction

In statistics, a measure of central tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.

Think of it as a “typical” value for the data. The three most common measures are:

  1. Mean
  2. Median
  3. Mode


2. The Arithmetic Mean

The mean is the most common measure of central tendency. It is the “average” we calculate by summing all observations and dividing by the total count.

Mathematical Formula

For a sample of size \(n\), the mean (\(\bar{x}\)) is: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

For a population of size \(N\), the mean (\(\mu\)) is: \[\mu = \frac{\sum_{i=1}^{N} x_i}{N}\]

Real-Life Example

Context: A teacher wants to find the average score of 5 students in a mini-quiz. Scores: 85, 90, 70, 75, 95.

Calculation: \[\bar{x} = \frac{85 + 90 + 70 + 75 + 95}{5} = \frac{415}{5} = 83\]

Calculation in R

scores <- c(85, 90, 70, 75, 95)
mean_score <- mean(scores)
print(paste("The mean score is:", mean_score))
## [1] "The mean score is: 83"
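As a quick check on the formula, the same value can be obtained by summing the scores and dividing by their count. This is only a minimal sketch; `manual_mean` is a name introduced here for illustration.

```r
scores <- c(85, 90, 70, 75, 95)

# Sum of the observations divided by the number of observations,
# mirroring the sample-mean formula
manual_mean <- sum(scores) / length(scores)

# Compare with the built-in mean()
all.equal(manual_mean, mean(scores))
```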

3. The Median

The median is the middle value in a data set when the values are arranged in ascending or descending order. It splits the data into two equal halves.

Mathematical Procedure

  1. Arrange the data in order.
  2. If \(n\) is odd, the median is the value at position: \(\frac{n+1}{2}\).
  3. If \(n\) is even, the median is the average of the two middle values at positions: \(\frac{n}{2}\) and \(\frac{n}{2} + 1\).
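The three steps above can be sketched as a small function. `manual_median` is a hypothetical helper written only to mirror the procedure, and it assumes a non-empty numeric vector; in practice the built-in `median()` is the better choice.

```r
# Hypothetical helper that follows the procedure step by step
manual_median <- function(x) {
  x <- sort(x)                        # Step 1: arrange in ascending order
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                    # Step 2: odd n, take the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2     # Step 3: even n, average the two middle values
  }
}

manual_median(c(7, 3, 9))       # odd number of values
manual_median(c(7, 3, 9, 1))    # even number of values
```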

Real-Life Example

Context: Household incomes in a small neighborhood. Incomes (in thousands): $45, $50, $52, $55, $250.

Note how the $250k income is an outlier.

  * Ordered Data: 45, 50, 52, 55, 250
  * Median: 52 (the middle value)

Unlike the mean (which is 90.4, higher than four of the five incomes), the median is barely affected by the outlier ($250k), making it a better measure of the center for skewed data.

Calculation in R

incomes <- c(45, 50, 52, 55, 250)
median_income <- median(incomes)
print(paste("The median income is:", median_income))
## [1] "The median income is: 52"
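To see how much the outlier matters, one option is to compute both measures with and without the $250k income. This is an illustrative sketch; `incomes_no_outlier` is a name introduced here.

```r
incomes <- c(45, 50, 52, 55, 250)
incomes_no_outlier <- incomes[incomes != 250]   # drop the 250 value

mean(incomes)                # 90.4
mean(incomes_no_outlier)     # 50.5
median(incomes)              # 52
median(incomes_no_outlier)   # 51
```

Adding the outlier moves the mean by almost 40 (thousand), while the median shifts by only 1.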

4. The Mode

The mode is the value that appears most frequently in a data set. A set can have one mode (unimodal), two modes (bimodal), or many modes (multimodal).

Real-Life Example

Context: A shoe store tracks the sizes sold in one hour to determine what to restock. Sizes sold: 7, 8, 8, 9, 10, 10, 10, 11.

Mode: 10 (It appears 3 times).

Calculation in R

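Note: Base R does not have a built-in function for the statistical mode (the built-in mode() function reports an object's storage type instead), so we create a simple function. If several values tie for the highest frequency, the function below returns only the one that appears first in the data.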

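get_mode <- function(v) {
   uniqv <- unique(v)                     # distinct values, in order of first appearance
   counts <- tabulate(match(v, uniqv))    # how often each distinct value occurs
   uniqv[which.max(counts)]               # first value with the highest count
}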

shoe_sizes <- c(7, 8, 8, 9, 10, 10, 10, 11)
mode_size <- get_mode(shoe_sizes)
print(paste("The mode shoe size is:", mode_size))
## [1] "The mode shoe size is: 10"
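For bimodal or multimodal data, a variant that returns every value tied for the highest frequency may be more useful. The sketch below is one possible approach; `get_all_modes` is a hypothetical helper, and its input is a variation of the shoe-size data with one size-10 sale removed, used only for illustration.

```r
# Hypothetical helper: returns all values tied for the highest frequency
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))   # frequency of each distinct value
  uniqv[counts == max(counts)]          # keep every value that reaches the maximum
}

# Sizes 8 and 10 each appear twice, so both are returned
get_all_modes(c(7, 8, 8, 9, 10, 10, 11))
```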

5. Summary: Which Measure to Use?

| Measure | Best Used For… | Sensitivity to Outliers |
|---------|----------------|-------------------------|
| Mean | Symmetric data, Normal distributions | High (Highly affected) |
| Median | Skewed data (e.g., Income, House Prices) | Low (Robust) |
| Mode | Categorical data (e.g., Favorite color) | Low |
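For categorical data, the mode can be read off a frequency table. The sketch below uses made-up survey responses purely for illustration; `favorite_colors` and `color_counts` are names introduced here.

```r
# Made-up responses, for illustration only
favorite_colors <- c("blue", "red", "blue", "green", "blue", "red")

color_counts <- table(favorite_colors)          # frequency of each category
names(color_counts)[which.max(color_counts)]    # most frequent category
```

Neither the mean nor the median is defined for unordered labels like these, which is why the mode is the usual choice.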

Visualizing Central Tendency in R

library(ggplot2)

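# Create a right-skewed dataset: 100 roughly normal values plus three large outliers
set.seed(42)  # arbitrary seed so the plot is reproducible
data <- data.frame(val = c(rnorm(100, 50, 10), 150, 160, 170))

ggplot(data, aes(x = val)) +
  geom_histogram(fill = "skyblue", color = "white", bins = 30) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_vline(aes(xintercept = mean(val), color = "Mean"), linewidth = 1) +
  geom_vline(aes(xintercept = median(val), color = "Median"), linewidth = 1) +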
  scale_color_manual(name = "Statistics", values = c(Mean = "red", Median = "blue")) +
  labs(title = "Mean vs. Median in Skewed Data",
       x = "Value", y = "Frequency") +
  theme_minimal()


6. Exercises

  1. Dataset: 12, 15, 12, 18, 20, 100. Calculate the mean and median. Which is more representative of the “center”?
  2. Conceptual: If a dataset is perfectly symmetrical, what is the relationship between the mean, median, and mode?
