1. Introduction

In statistics, a measure of central tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.

A measure of central tendency is one kind of “summary statistic.” The three most common measures are:

  1. Mean
  2. Median
  3. Mode


2. The Arithmetic Mean

The mean (or average) is the most popular and well-known measure of central tendency. It is calculated by summing all the values in a data set and dividing by the number of values.

2.1 Mathematical Formula

For a sample of size \(n\), the sample mean (\(\bar{x}\)) is calculated as: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

For a population of size \(N\), the population mean (\(\mu\)) is: \[\mu = \frac{\sum_{i=1}^{N} X_i}{N}\]

Where:

  • \(\sum\) = summation symbol
  • \(x_i\) (or \(X_i\) for a population) = each individual value
  • \(n\) or \(N\) = total number of observations

2.2 Example

Data: A student’s test scores are 85, 90, 78, 92, and 88.

Calculation: \[\bar{x} = \frac{85 + 90 + 78 + 92 + 88}{5} = \frac{433}{5} = 86.6\]
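As a quick check, the sample-mean formula can be applied literally in R and compared with the built-in `mean()`. This is a minimal sketch; the variable names `scores`, `n`, and `x_bar` are illustrative choices, not part of the original text.

```r
# Test scores from the example above
scores <- c(85, 90, 78, 92, 88)

# Apply the formula directly: sum of all values divided by the number of values
n <- length(scores)
x_bar <- sum(scores) / n
x_bar          # 86.6

# Base R's mean() gives the same result
mean(scores)   # 86.6
```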

2.3 Real-Life Application

  • Economics: Calculating the average income per capita in a country.
  • Meteorology: Determining the average daily temperature over a month.

2.4 Pros and Cons

  • Pros: Uses every value in the dataset; easy to work with algebraically, and for approximately normal data it varies less from sample to sample than the median.
  • Cons: Highly sensitive to outliers (extreme values).
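The sensitivity to outliers can be seen by appending one extreme value to the test scores from Section 2.2. The extra score of 0 is a hypothetical value chosen only for illustration; the median is printed alongside for contrast.

```r
# Test scores from Section 2.2
scores <- c(85, 90, 78, 92, 88)

# Hypothetical: one extreme score (0) is added to the same data
scores_out <- c(scores, 0)

mean(scores)        # 86.6
mean(scores_out)    # about 72.17: one value drags the mean down by more than 14 points

# The median, for contrast
median(scores)      # 88
median(scores_out)  # 86.5: the middle of the data barely moves
```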

3. The Median

The median is the middle value in a data set that has been arranged in ascending or descending order. It splits the data into two equal halves.

3.1 Mathematical Calculation

  1. Step 1: Sort the data: \(x_{(1)} \leq x_{(2)} \leq \dots \leq x_{(n)}\)
  2. Step 2:
    • If \(n\) is odd, the median is the value at position \(\frac{n+1}{2}\) in the sorted list.
    • If \(n\) is even, the median is the average of the values at positions \(\frac{n}{2}\) and \(\frac{n}{2} + 1\).
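The two steps translate almost line for line into R. `median_by_position()` is a hypothetical helper written for this illustration (base R's `median()` already does the same job); it assumes a numeric vector with no missing values.

```r
# Hypothetical helper that follows Step 1 and Step 2 literally
median_by_position <- function(x) {
  x_sorted <- sort(x)                 # Step 1: x_(1) <= x_(2) <= ... <= x_(n)
  n <- length(x_sorted)
  if (n %% 2 == 1) {
    x_sorted[(n + 1) / 2]             # odd n: the single middle value
  } else {
    # even n: average the values at positions n/2 and n/2 + 1
    (x_sorted[n / 2] + x_sorted[n / 2 + 1]) / 2
  }
}

median_by_position(c(20, 10, 15))       # 15
median_by_position(c(100, 10, 20, 15))  # 17.5
```

Both calls reproduce the worked examples in Section 3.2, and the result should agree with `median()` on any numeric vector without `NA` values.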

3.2 Example

Case A (Odd): Scores: 10, 15, 20.

  • Sorted: 10, 15, 20.
  • Median = 15 (the value at position \(\frac{3+1}{2} = 2\)).

Case B (Even): Scores: 10, 15, 20, 100.

  • Sorted: 10, 15, 20, 100.
  • Calculation: \(\frac{15 + 20}{2} = 17.5\).

3.3 Real-Life Application

  • Real Estate: Median house prices are often reported instead of the mean, because a few multi-million-dollar mansions would pull the mean upward and make a typical house look more expensive than it is.

4. The Mode

The mode is the value that appears most frequently in a data set.

4.1 Characteristics

  • A dataset can be unimodal (one mode), bimodal (two modes), or multimodal (three or more).
  • If no number repeats, the dataset has no mode.

4.2 Example

Data: 2, 4, 4, 6, 7, 8, 4, 10

  • Mode: 4 (it appears three times).
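In R, `table()` counts how often each distinct value occurs, which makes the characteristics in Section 4.1 easy to check. `modes_of()` is a hypothetical helper written for this sketch: it returns every value tied for the highest count, or an empty vector when nothing repeats.

```r
# Hypothetical helper: all most-frequent values, or an empty vector if no value repeats
modes_of <- function(x) {
  counts <- table(x)                      # frequency of each distinct value
  if (max(counts) == 1) return(x[0])      # "no mode": zero-length vector of the same type
  top <- names(counts)[counts == max(counts)]
  # names() returns strings; convert back for numeric input
  # (fine for integers and short decimals)
  if (is.numeric(x)) as.numeric(top) else top
}

modes_of(c(2, 4, 4, 6, 7, 8, 4, 10))  # 4    (unimodal)
modes_of(c(1, 1, 2, 3, 3))            # 1 3  (bimodal)
modes_of(c(5, 6, 7))                  # numeric(0): no mode
modes_of(c("S", "M", "M", "L"))       # "M"  (works for categorical data too)
```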

4.3 Real-Life Application

  • Inventory Management: A shoe store manager needs to know the “modal” shoe size (the most common size) so that the sizes customers buy most often are always well stocked.

5. Comparing Mean, Median, and Mode

| Feature | Mean | Median | Mode |
|---|---|---|---|
| Data type | Quantitative | Quantitative / Ordinal | All types |
| Sensitivity to outliers | High | Low | Low |
| Typical use case | Symmetric data | Skewed data | Categorical data |

5.1 Distribution Shape

  • Symmetric and unimodal (e.g., Normal): Mean \(\approx\) Median \(\approx\) Mode.
  • Right Skewed (Positive): typically Mode < Median < Mean.
  • Left Skewed (Negative): typically Mean < Median < Mode.
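A small made-up sample illustrates the right-skewed ordering, and reflecting it (subtracting each value from 10) gives a left-skewed sample with the order reversed. The data are hypothetical, and the ordering is a common tendency rather than a guarantee for every skewed dataset.

```r
# Hypothetical right-skewed sample: mostly small values with a long tail up to 9
x <- c(1, 2, 2, 2, 3, 3, 4, 5, 9)

# Mode via table(); which.max() keeps only the first value if several tie
counts_x <- table(x)
mode_x <- as.numeric(names(counts_x)[which.max(counts_x)])  # 2
median_x <- median(x)                                        # 3
mean_x <- mean(x)                                            # 31/9, about 3.44
mode_x < median_x && median_x < mean_x                       # TRUE: Mode < Median < Mean

# Reflect the sample (10 - x) to get a left-skewed version
y <- 10 - x
counts_y <- table(y)
mode_y <- as.numeric(names(counts_y)[which.max(counts_y)])  # 8
median_y <- median(y)                                        # 7
mean_y <- mean(y)                                            # 59/9, about 6.56
mean_y < median_y && median_y < mode_y                       # TRUE: Mean < Median < Mode
```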

6. R Implementation

Below is how we calculate these measures using R.

# Create a sample dataset of employee salaries (in thousands)
salaries <- c(45, 50, 52, 55, 58, 60, 62, 150) # 150 is an outlier

# Calculate Mean
mean_val <- mean(salaries)

# Calculate Median
med_val <- median(salaries)

# Calculate Mode. Base R has no statistical mode function:
# mode() returns an object's storage mode (e.g. "numeric"), not its most frequent value.
get_mode <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))  # occurrences of each unique value
  if (max(counts) == 1) return(NA)     # no value repeats, so there is no mode
  uniqv[counts == max(counts)]         # every value tied for the highest count
}
mode_val <- get_mode(salaries)

# Display Results
cat("Mean Salary:", mean_val, "\n")
## Mean Salary: 66.5
cat("Median Salary:", med_val, "\n")
## Median Salary: 56.5
cat("Mode Salary:", mode_val, "\n")
## Mode Salary: NA

Observation: Notice how the mean ($66.5k) is higher than the median ($56.5k) because of the $150k outlier. In this case, the median is a better representation of the “typical” salary. Because no salary appears more than once, this dataset has no mode.


7. Summary Exercise

Imagine you are analyzing the number of goals scored by a football team in 10 matches: 0, 1, 1, 2, 2, 2, 3, 3, 4, 12

  1. Calculate the Mean.
  2. Calculate the Median.
  3. Identify the Mode.
  4. Which measure is most misleading due to the 12-goal outlier?
