A measure of central tendency is a single value that describes a data set by identifying its central position. In statistics, the three most common measures of central tendency are the mean, median, and mode.
The mean (or average) is the most popular and well-known measure of central tendency. It is the sum of all values divided by the number of values.
For a sample of size \(n\), the sample mean \(\bar{x}\) is calculated as:
\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
Where:
- \(\sum\): The symbol for summation.
- \(x_i\): Each individual value in the data set.
- \(n\): The total number of observations.
# Sample data: Exam scores of 5 students
scores <- c(85, 90, 88, 76, 92)
# Calculate Mean
mean_score <- mean(scores)
print(paste("The mean score is:", mean_score))
## [1] "The mean score is: 86.2"
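To connect the formula to the built-in function, the short sketch below applies the definition directly (sum of the values divided by their count) and compares it with `mean()`. The variable name `manual_mean` is introduced here only for illustration.

```r
# Exam scores from the example above (redefined so this block runs on its own)
scores <- c(85, 90, 88, 76, 92)

# Apply the formula directly: sum of all values divided by the number of values
manual_mean <- sum(scores) / length(scores)

print(manual_mean)                                   # 431 / 5 = 86.2
print(isTRUE(all.equal(manual_mean, mean(scores))))  # TRUE: both approaches agree
```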
Income Analysis: A company uses the mean to calculate the average salary of its employees to set budget expectations for the following year. However, the mean can be heavily influenced by “outliers” (e.g., a CEO earning 100x more than staff).
The median is the middle value of a data set that has been arranged in order of magnitude. When the number of values is even, the median is the mean of the two middle values. It is less affected by outliers and skewed data than the mean.
# Data with an outlier
salaries <- c(30000, 32000, 31000, 35000, 500000) # One very high salary
# Mean vs Median
print(paste("Mean Salary:", mean(salaries)))
## [1] "Mean Salary: 125600"
print(paste("Median Salary:", median(salaries)))
## [1] "Median Salary: 32000"
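As a rough illustration of how the median is found, the sketch below sorts the values and picks the middle one, averaging the two middle values when the count is even. The helper `manual_median` is a hypothetical name used only for this example; in practice the built-in `median()` is the better choice.

```r
# Hypothetical helper (not part of base R): median by sorting
manual_median <- function(x) {
  sorted <- sort(x)
  n <- length(sorted)
  if (n %% 2 == 1) {
    # Odd count: take the single middle value
    sorted[(n + 1) / 2]
  } else {
    # Even count: average the two middle values
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  }
}

salaries <- c(30000, 32000, 31000, 35000, 500000)
print(manual_median(salaries))            # odd count: 32000
print(manual_median(c(salaries, 33000)))  # even count: (32000 + 33000) / 2 = 32500
```

Note that the outlier (500000) only occupies the last position after sorting, so it has no effect on which value ends up in the middle.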
Real Estate: When reporting house prices in a city, the median is preferred. If one mansion sells for $10 million and five small houses sell for $200,000 each, the mean (about $1.83 million) would suggest that the typical house is expensive, while the median ($200,000) reflects what a typical buyer can actually afford.
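The house-price scenario can be checked with a few lines of R. This is a minimal sketch that assumes each of the five small houses sells for exactly $200,000; the vector name `prices` is chosen here for illustration.

```r
# Hypothetical sale prices: one mansion and five small houses
prices <- c(10000000, rep(200000, 5))

mean_price <- mean(prices)      # 11,000,000 / 6, roughly 1.83 million
median_price <- median(prices)  # 200,000

# scientific = FALSE keeps the output in plain digits
print(format(round(mean_price), big.mark = ",", scientific = FALSE))
print(format(median_price, big.mark = ",", scientific = FALSE))
```

A single sale pulls the mean to more than nine times the price of five of the six houses, while the median stays at the price most buyers actually paid.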
The mode is the value that appears most frequently in a data set. A data set can have one mode, more than one mode (bimodal/multimodal), or no mode at all.
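R does not have a built-in function for the statistical mode (the base function `mode()` returns an object's storage mode, such as "numeric" or "character"), so we often use a custom function or a frequency table.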
# Sample data: Shirt sizes sold
sizes <- c("S", "M", "L", "M", "S", "M", "XL", "M")
# Calculate Mode using a frequency table
mode_table <- table(sizes)
print(mode_table)
## sizes
##  L  M  S XL
##  1  4  2  1
# Finding the name of the max frequency
mode_val <- names(mode_table)[which.max(mode_table)]
print(paste("The modal size is:", mode_val))
## [1] "The modal size is: M"
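One caveat with the approach above: `which.max()` returns only the first maximum, so a tie between two sizes would go unnoticed. The sketch below shows one possible way to return every value that shares the highest count. The helper name `all_modes` is an assumption made for this example, not a base R function.

```r
# Hypothetical helper (not part of base R): every value tied for the highest count
all_modes <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}

print(all_modes(c("S", "M", "L", "M", "S", "M", "XL", "M")))  # one mode: "M"
print(all_modes(c("S", "S", "M", "M", "L")))                    # two modes: "M" "S"
```

Because `names()` is used, the result is always a character vector. If every value occurs exactly once, the function returns all of them, which is usually read as "no mode".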
Inventory Management: A shoe store manager looks at the mode of shoe sizes sold to decide which size to stock most heavily. Knowing the “average” shoe size (e.g., 8.4) is useless, but knowing that size 9 is the mode is actionable.
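The contrast between the two figures can be reproduced with a small, made-up data set. The sales values below are hypothetical and were chosen only so that the mean and mode match the numbers in the example.

```r
# Hypothetical shoe sizes sold, chosen only to match the figures in the text
shoe_sizes <- c(7, 8, 9, 9, 9)

mean_size <- mean(shoe_sizes)  # 42 / 5 = 8.4, a size no customer bought

# Mode via a frequency table, converted back to a number
size_counts <- table(shoe_sizes)
modal_size <- as.numeric(names(size_counts)[which.max(size_counts)])

print(mean_size)   # 8.4
print(modal_size)  # 9
```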
| Measure | Best Used For… | Sensitivity to Outliers |
|---|---|---|
| Mean | Numerical data that is roughly symmetric (e.g., normally distributed) with no extreme outliers. | High (Sensitive) |
| Median | Skewed numerical data or data with extreme values (outliers). | Low (Robust) |
| Mode | Categorical (Nominal) data. | Low |
Create a vector my_data with the following values: 10, 15, 15, 17, 18, 21, 90. Then calculate its mean, median, and mode (using the table method).
my_data <- c(10, 15, 15, 17, 18, 21, 90)
# Add your code here