1. Introduction

A measure of central tendency is a single value that describes a data set by identifying its central position. In statistics, the three most common measures of central tendency are the mean, the median, and the mode.

2. The Arithmetic Mean

The mean (or average) is the most popular and well-known measure of central tendency. It is the sum of all values divided by the number of values.

Mathematical Formula

For a sample of size \(n\), the sample mean \(\bar{x}\) is calculated as:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

Where:

  • \(\sum\): The symbol for summation.
  • \(x_i\): Each individual value in the data set.
  • \(n\): The total number of observations.

R Example

# Sample data: Exam scores of 5 students
scores <- c(85, 90, 88, 76, 92)

# Calculate Mean
mean_score <- mean(scores)
print(paste("The mean score is:", mean_score))
## [1] "The mean score is: 86.2"
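
To connect the formula to the code, the mean can also be computed by hand as the sum of the values divided by their count. The following is a minimal sketch; the names total, n, and manual_mean are introduced here for illustration only.

```r
# Exam scores from the example above
scores <- c(85, 90, 88, 76, 92)

# Numerator of the formula: the sum of all x_i
total <- sum(scores)
# Denominator of the formula: the number of observations n
n <- length(scores)

# manual_mean is a hypothetical name used only in this sketch
manual_mean <- total / n
print(manual_mean)

# Check agreement with the built-in mean(), allowing for floating-point tolerance
print(isTRUE(all.equal(manual_mean, mean(scores))))
```

Here the sum is 431 and n is 5, so the hand computation gives 431 / 5 = 86.2, matching mean(scores).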

Real-Life Example

Income Analysis: A company uses the mean to calculate the average salary of its employees to set budget expectations for the following year. However, the mean can be heavily influenced by “outliers” (e.g., a CEO earning 100x more than staff).


3. The Median

The median is the middle value of a data set that has been arranged in order of magnitude. It is less affected by outliers and skewed data than the mean.

Mathematical Formula

  1. Arrange data from smallest to largest.
  2. If \(n\) is odd, the median is the middle value at position: \[\text{Median} = \left( \frac{n+1}{2} \right)^{\text{th}} \text{ term}\]
  3. If \(n\) is even, the median is the average of the two middle values: \[\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ term} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ term}}{2}\]
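
The three steps above can be sketched as a small function. This is an illustration under stated assumptions, not a replacement for R's built-in median(): median_by_position is a hypothetical helper written for this example, and the two test vectors are made up.

```r
# median_by_position is a hypothetical helper written for this sketch;
# in practice, R's built-in median() does the same job.
median_by_position <- function(x) {
  x <- sort(x)                      # Step 1: arrange from smallest to largest
  n <- length(x)
  if (n == 0) return(NA_real_)      # no data, no median
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # Step 2: odd n, take the middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2   # Step 3: even n, average the two middle values
  }
}

odd_data  <- c(7, 3, 9, 1, 5)       # n = 5, sorted: 1 3 5 7 9
even_data <- c(7, 3, 9, 1, 5, 11)   # n = 6, sorted: 1 3 5 7 9 11

print(median_by_position(odd_data))   # the 3rd sorted value
print(median_by_position(even_data))  # the average of the 3rd and 4th sorted values
```

For odd_data the middle value is 5; for even_data the two middle values are 5 and 7, so the median is 6. Both results agree with median().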

R Example

# Data with an outlier
salaries <- c(30000, 32000, 31000, 35000, 500000) # One very high salary

# Mean vs Median
print(paste("Mean Salary:", mean(salaries)))
## [1] "Mean Salary: 125600"
print(paste("Median Salary:", median(salaries)))
## [1] "Median Salary: 32000"

Real-Life Example

Real Estate: When reporting house prices in a city, the median is preferred. If one mansion sells for $10 million and five small houses sell for $200,000 each, the mean (about $1.83 million) would suggest the average house is expensive, while the median ($200,000) reflects what a typical house actually sells for.
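
The house-price scenario can be checked with a few lines of R. This sketch assumes each of the five small houses sells for exactly $200,000; the variable names are chosen here for illustration.

```r
# Illustrative prices only: one mansion and five small houses,
# assuming each small house sells for $200,000
prices <- c(10000000, rep(200000, 5))

mean_price   <- mean(prices)    # pulled upward by the single mansion
median_price <- median(prices)  # the middle of the sorted prices

# format() with scientific = FALSE avoids output such as "2e+05"
print(paste("Mean price:", format(round(mean_price), big.mark = ",", scientific = FALSE)))
print(paste("Median price:", format(median_price, big.mark = ",", scientific = FALSE)))
```

The mean comes out at roughly $1.83 million, a price that none of the six houses is close to, while the median stays at $200,000.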


4. The Mode

The mode is the value that appears most frequently in a data set. A data set can have one mode, more than one mode (bimodal/multimodal), or no mode at all.

R Example

Base R does not have a built-in function for the statistical mode (its mode() function reports an object's storage type instead), so we often use a custom function or a frequency table.

# Sample data: Shirt sizes sold
sizes <- c("S", "M", "L", "M", "S", "M", "XL", "M")

# Calculate Mode using a frequency table
mode_table <- table(sizes)
print(mode_table)
## sizes
##  L  M  S XL 
##  1  4  2  1
# Find the name of the most frequent value (which.max() returns only the first one if there is a tie)
mode_val <- names(mode_table)[which.max(mode_table)]
print(paste("The modal size is:", mode_val))
## [1] "The modal size is: M"
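
Because a data set can have several modes or none, a helper that returns every value tied for the highest count can be useful. The following is a minimal sketch: find_modes is a hypothetical function, not part of base R, and it treats "all values equally frequent" as "no mode", which is one common convention. It returns the modes as character strings, so numeric data would need as.numeric() on the result.

```r
# find_modes is a hypothetical helper, not part of base R
find_modes <- function(x) {
  freq <- table(x)                   # count each distinct value
  if (length(freq) > 1 && all(freq == freq[1])) {
    return(character(0))             # every value equally common: no mode
  }
  names(freq)[freq == max(freq)]     # keep all values tied for the top count
}

sizes <- c("S", "M", "L", "M", "S", "M", "XL", "M")
print(find_modes(sizes))                       # one mode
print(find_modes(c("S", "M", "S", "M", "L")))  # two modes (bimodal)
print(find_modes(c("S", "M", "L")))            # no mode: empty result
```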

Real-Life Example

Inventory Management: A shoe store manager looks at the mode of shoe sizes sold to decide which size to stock most heavily. Knowing the “average” shoe size (e.g., 8.4) is of little use because no such size is sold, but knowing that size 9 is the mode is actionable.


5. Choosing the Right Measure

Measure | Best Used For…                                  | Sensitivity to Outliers
--------|-------------------------------------------------|------------------------
Mean    | Numerical data with a normal distribution.      | High (Sensitive)
Median  | Numerical data with extreme values (outliers).  | Low (Robust)
Mode    | Categorical (Nominal) data.                     | Low

Distribution Shapes

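  • Symmetric (with a single peak): Mean \(\approx\) Median \(\approx\) Mode.
  • Right Skewed (Positive): typically Mean > Median > Mode.
  • Left Skewed (Negative): typically Mean < Median < Mode.

These orderings are rules of thumb for distributions with a single peak; individual data sets can deviate from them.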
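
The skewed orderings can be seen on small hand-made data sets. This is only an illustration: the values below are invented for the example, and simple_mode and summarise_center are hypothetical helpers, not base R functions.

```r
# Small hand-made data sets (illustrative, not from a real source)
right_skewed <- c(1, 2, 2, 2, 3, 3, 4, 5, 9, 15)  # long tail toward large values
left_skewed  <- 16 - right_skewed                  # mirror image: tail toward small values

# simple_mode is a hypothetical helper; it returns the first most frequent value
simple_mode <- function(x) {
  freq <- table(x)
  as.numeric(names(freq)[which.max(freq)])
}

# summarise_center is a hypothetical helper that collects all three measures
summarise_center <- function(x) {
  c(mean = mean(x), median = median(x), mode = simple_mode(x))
}

print(summarise_center(right_skewed))  # mean > median > mode
print(summarise_center(left_skewed))   # mean < median < mode
```

For right_skewed the mean is 4.6, the median is 3, and the mode is 2; mirroring the data reverses the order, giving 11.4, 13, and 14.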

6. Practice Exercise

  1. Create a vector in R named my_data with the following values: 10, 15, 15, 17, 18, 21, 90.
  2. Calculate the mean and median.
  3. Which measure best represents the “center” of this data? Why?
my_data <- c(10, 15, 15, 17, 18, 21, 90)
# Add your code here
