---
title: "Module I: Measures of Central Tendency"
author: "Saed Mohamed Ahmed"
date: "2025-12-28"
output:
  html_document:
    toc: true
    toc_depth: 2
    theme: united
  pdf_document:
    toc: true
---
In statistics, a Measure of Central Tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.
It is often referred to as a “summary statistic.” The three most common measures are the Mean, Median, and Mode.
The mean (or average) is the sum of all observations divided by the total number of observations.
For a Sample Mean (\(\bar{x}\)): \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
For a Population Mean (\(\mu\)): \[\mu = \frac{\sum_{i=1}^{N} X_i}{N}\]
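Where:

* \(\sum\) = Summation sign
* \(x_i\) = The value of each individual observation in the sample
* \(n\) = Number of observations in the sample
* \(X_i\) = The value of each individual observation in the population
* \(N\) = Number of observations in the population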
# Dataset: Exam scores of 10 students
scores <- c(85, 90, 78, 92, 71, 88, 76, 84, 89, 95)
# Calculating Mean
mean_score <- mean(scores)
print(paste("The mean score is:", mean_score))
## [1] "The mean score is: 84.8"
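The sample-mean formula can be checked directly against `mean()`. The following is a minimal sketch that reuses the exam scores above; the name `manual_mean` is introduced here only for illustration.

```r
# Exam scores from the example above
scores <- c(85, 90, 78, 92, 71, 88, 76, 84, 89, 95)

# Apply the formula directly: sum of the observations divided by n
n <- length(scores)
manual_mean <- sum(scores) / n

# Should agree with the built-in mean(), up to floating-point tolerance
all.equal(manual_mean, mean(scores))
```

Here the sum is 848 and \(n = 10\), which gives the 84.8 reported above.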
Economic Indicators: Calculating the average household income in a city to determine the general standard of living.
Pros/Cons:

* Pro: Uses every value in the dataset.
* Con: Highly sensitive to outliers (extreme values).
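To see the sensitivity to outliers in practice, the sketch below adds one extreme value to the exam scores. The extra score of 0 (a student who missed the exam) is a hypothetical value, not part of the original dataset.

```r
# Exam scores from the earlier example
scores <- c(85, 90, 78, 92, 71, 88, 76, 84, 89, 95)

# Hypothetical: one more student is recorded with a score of 0
scores_with_outlier <- c(scores, 0)

mean(scores)               # 84.8
mean(scores_with_outlier)  # 848 / 11, roughly 77.1

# How far a single observation moved the mean
mean(scores) - mean(scores_with_outlier)
```

A single extreme observation out of eleven lowers the mean by more than seven points, even though the other ten scores are unchanged.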
The median is the middle value of a dataset when it has been arranged in ascending or descending order. It splits the data into two equal halves.
# Dataset with an outlier
salaries <- c(30000, 32000, 34000, 31000, 250000) # One very high salary
# Calculating Median vs Mean
med_sal <- median(salaries)
avg_sal <- mean(salaries)
print(paste("Median Salary:", med_sal))
## [1] "Median Salary: 32000"
print(paste("Mean Salary:", avg_sal))
## [1] "Mean Salary: 75400"
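The definition of the median can also be traced step by step: sort the values, then take the middle one. When the number of observations is even there is no single middle value, and the usual convention (the one `median()` follows) is to average the two middle values. The helper `manual_median` below is a hypothetical name used only for this sketch.

```r
# Hypothetical helper that follows the definition step by step
manual_median <- function(x) {
  sorted <- sort(x)
  n <- length(sorted)
  if (n %% 2 == 1) {
    # Odd count: the single middle value
    sorted[(n + 1) / 2]
  } else {
    # Even count: average of the two middle values
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  }
}

salaries <- c(30000, 32000, 34000, 31000, 250000)
manual_median(salaries)       # 32000, the third of five sorted values
manual_median(salaries[1:4])  # 31500, the average of 31000 and 32000
```

Note that the 250000 outlier only occupies the last sorted position, which is why it has no effect on the result.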
Real Estate: House prices are usually reported as “Median Price” because a single multi-million-dollar mansion (an outlier) would inflate the mean and give a false impression of what a “typical” house costs.
The mode is the value that appears most frequently in a dataset.
R does not have a built-in function for the statistical mode (the built-in `mode()` reports an object's storage mode, not its most frequent value), so we use a custom function:
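get_mode <- function(v) {
  # Distinct values, in order of first appearance
  uniqv <- unique(v)
  # Count each distinct value and return the most frequent one
  # (if several values tie, the first one encountered is returned)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}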
# Dataset: Shirt sizes sold in a day
sizes <- c("S", "M", "L", "M", "M", "S", "XL", "M", "L")
mode_size <- get_mode(sizes)
print(paste("The modal shirt size is:", mode_size))
## [1] "The modal shirt size is: M"
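A dataset can have more than one mode, and `get_mode()` returns only the first of any tied values. As a minimal sketch, the hypothetical variant `get_all_modes()` below uses `table()` to report every value tied for the highest count; the `tied` vector is made-up data for illustration.

```r
# Hypothetical variant that reports every value tied for the highest count
get_all_modes <- function(v) {
  counts <- table(v)
  names(counts)[counts == max(counts)]
}

sizes <- c("S", "M", "L", "M", "M", "S", "XL", "M", "L")
get_all_modes(sizes)  # "M"

# Made-up data in which "M" and "S" each appear twice
tied <- c("S", "M", "M", "S", "L")
get_all_modes(tied)   # "M" "S"
```

Because the result is taken from the names of the table, it is always a character vector, so numeric input would need converting back with `as.numeric()`.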
Inventory Management: A shoe store manager needs the Mode to know which shoe size is sold most often to ensure it is always in stock.
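The relationship between the mean, median, and mode depends on the skewness of the distribution. In a symmetric, unimodal distribution the three roughly coincide. In a right-skewed (positively skewed) distribution the mean is typically pulled above the median, which in turn typically sits above the mode; in a left-skewed distribution the order is typically reversed.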
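As a small illustration of how skew separates the three measures, the sketch below uses a made-up right-skewed sample (the values in `x` are hypothetical and chosen only to make the arithmetic easy to follow).

```r
# Hypothetical right-skewed sample: most values are small, a few are large
x <- c(2, 2, 2, 3, 3, 4, 5, 6, 9, 14)

mean_x   <- mean(x)                                 # 5
median_x <- median(x)                               # 3.5
mode_x   <- as.numeric(names(which.max(table(x))))  # 2

# In this sample the long right tail pulls the mean furthest from the peak
mean_x > median_x && median_x > mode_x              # TRUE
```

This ordering is a rule of thumb rather than a guarantee; it holds for this sample but can fail for some skewed distributions.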
| Measure | Definition | Best Used For | Sensitive to Outliers? |
|---|---|---|---|
| Mean | Numerical average | Symmetric/Continuous data | Yes |
| Median | Middle value | Skewed data (Income, Housing) | No |
| Mode | Most frequent | Categorical data (Colors, Sizes) | No |
age with the following values: 22, 25, 22, 30, 24, 100.

File -> New File -> R Markdown.