A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures describe where the center of a distribution lies, which is why they are also known as measures of “location”.
The three most common measures are:
1. Mean
2. Median
3. Mode
The mean (or average) is the sum of all observations divided by the total number of observations.
For a sample of size \(n\), the sample mean \(\bar{x}\) is calculated as: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
Where:
- \(\sum\): sigma notation (summation)
- \(x_i\): each individual value in the dataset
- \(n\): the total number of values
Scenario: A teacher wants to find the average score of 5 students in a mini-quiz. Scores: 70, 85, 80, 90, 75.
Calculation: \[\bar{x} = \frac{70 + 85 + 80 + 90 + 75}{5} = \frac{400}{5} = 80\]
```r
scores <- c(70, 85, 80, 90, 75)
mean_score <- mean(scores)
print(paste("The Mean Score is:", mean_score))
## [1] "The Mean Score is: 80"
```
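As a cross-check on the formula, the same value can be computed directly as the sum of the observations divided by their count. This is a minimal sketch assuming a numeric vector with no missing values (`NA`); the variable names `total`, `n`, and `manual_mean` are illustrative.

```r
scores <- c(70, 85, 80, 90, 75)

# Numerator of the formula: the sum of all observations
total <- sum(scores)

# Denominator: the number of observations, n
n <- length(scores)

# Sample mean = sum / n
manual_mean <- total / n
print(manual_mean)
## [1] 80

# Confirm that the hand calculation agrees with the built-in mean()
print(all.equal(manual_mean, mean(scores)))
## [1] TRUE
```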
The median is the middle value of a dataset when it has been arranged in ascending or descending order. It splits the data into two equal halves.
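Stated generally, if the sorted values are written as \(x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}\), the position of the middle depends on whether \(n\) is odd or even:

\[
\text{Median} =
\begin{cases}
x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[6pt]
\dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even}
\end{cases}
\]

For example, with \(n = 6\) values the median is the average of \(x_{(3)}\) and \(x_{(4)}\).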
Scenario: Weekly salaries of 6 employees in a small startup. Salaries: $500, $600, $550, $3000, $700, $650.
1. Sort: 500, 550, 600, 650, 700, 3000.
2. Identify the middle: since \(n = 6\) is even, average the 3rd and 4th values: \[\text{Median} = \frac{600 + 650}{2} = 625\]
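Note: The median is more “robust” than the mean because it is not heavily affected by the outlier ($3000). For comparison, the mean of these six salaries is \(6000 / 6 = 1000\), which is higher than five of the six salaries.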
```r
salaries <- c(500, 600, 550, 3000, 700, 650)
median_salary <- median(salaries)
print(paste("The Median Salary is:", median_salary))
## [1] "The Median Salary is: 625"
```
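The sort-then-find-the-middle procedure can be sketched as a small R function. `manual_median` is a hypothetical helper written for illustration, not part of base R; it assumes a numeric vector with no missing values, and in practice the built-in `median()` is the one to use.

```r
# Hypothetical helper (not base R): median by sorting and taking the middle value(s)
manual_median <- function(x) {
  x <- sort(x)        # Step 1: arrange in ascending order
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                    # odd n: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2     # even n: average of the two middle values
  }
}

salaries <- c(500, 600, 550, 3000, 700, 650)
print(manual_median(salaries))     # even n = 6
## [1] 625
print(manual_median(c(3, 1, 2)))   # odd n = 3
## [1] 2
```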
The mode is the value that appears most frequently in a dataset.
Scenario: A shoe store records the sizes of sneakers sold in an hour. Sizes: 8, 9, 9, 10, 11, 9, 8, 12, 9.
Calculation: The number “9” appears 4 times, which is more than any other size. Mode = 9.
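R does not have a built-in function for the statistical mode (the base `mode()` function returns an object's storage mode, not its most frequent value), but we can find it using a frequency table: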
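```r
shoe_sizes <- c(8, 9, 9, 10, 11, 9, 8, 12, 9)
# Frequency table: how many times each size occurs
counts <- table(shoe_sizes)
# Keep every value whose count equals the maximum (this also handles ties)
mode_val <- names(counts)[counts == max(counts)]
# Collapse to one string in case there is more than one mode
print(paste("The Mode is:", paste(mode_val, collapse = ", ")))
## [1] "The Mode is: 9"
```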
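The same table-based approach can be wrapped in a reusable function. `stat_mode` is a hypothetical name (base R has no such function); it returns every value tied for the highest count, and because the names of a `table()` are character strings, the result is a character vector (wrap it in `as.numeric()` for numeric data). The color data in the second call is made up purely to show that the approach also works on categorical values.

```r
# Hypothetical helper (not base R): returns every most-frequent value,
# so ties produce more than one mode
stat_mode <- function(x) {
  counts <- table(x)                     # frequency of each distinct value
  names(counts)[counts == max(counts)]   # value(s) with the highest count
}

shoe_sizes <- c(8, 9, 9, 10, 11, 9, 8, 12, 9)
print(stat_mode(shoe_sizes))
## [1] "9"

# Illustrative categorical data (made up): "blue" and "red" tie with 2 each
print(stat_mode(c("red", "blue", "red", "green", "blue")))
## [1] "blue" "red"
```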
The choice of measure depends on the shape of the distribution and the scale of measurement.
| Measure | Best used for | Sensitivity to outliers |
|---|---|---|
| Mean | Symmetric numeric data (e.g., height, weight) | Highly sensitive |
| Median | Skewed numeric data (e.g., income, house prices) | Robust (little affected) |
| Mode | Categorical data (e.g., favorite color) | Not affected |
Imagine a small company with 5 staff members earning $40k, $42k, $45k, $48k, and $150k (the CEO).
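A quick check in R, assuming the figures are annual salaries in thousands of dollars, shows how far the single CEO salary pulls the mean away from the median:

```r
# Assumed units: annual salary in thousands of dollars; the last value is the CEO
staff_salaries <- c(40, 42, 45, 48, 150)

mean(staff_salaries)    # (40 + 42 + 45 + 48 + 150) / 5, pulled up by the CEO
## [1] 65
median(staff_salaries)  # middle of the sorted values 40, 42, 45, 48, 150
## [1] 45
```

The mean of 65 (thousand) is higher than four of the five salaries, while the median of 45 stays close to what a typical staff member earns.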
```
File > New File > R Markdown....