---
title: "Module I: Measures of Central Tendency"
author: "Statistics Department"
date: "2025-12-28"
output:
  html_document:
    toc: true
    toc_depth: 2
    theme: united
---
# 1. Introduction
A **measure of central tendency** is a single value that describes a data set by identifying its central position. Such measures are sometimes called measures of central location, and they belong to the broader family of summary statistics.
In this module, we focus on the three most common measures:
1. **The Mean** (Arithmetic Average)
2. **The Median** (Middle Value)
3. **The Mode** (Most Frequent Value)
---
# 2. The Arithmetic Mean ($\bar{x}$)
The mean is the sum of all observations divided by the total number of observations. It is the most common measure of central tendency.
### Mathematical Formula
For a sample of size $n$:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Where:
- $\bar{x}$ = Sample Mean
- $\sum$ = Summation symbol
- $x_i$ = Individual values
- $n$ = Total number of values
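
The formula can be translated into R term by term. The sketch below is illustrative only: `sample_mean` is a hypothetical helper name and the input values are made up for the demonstration. In practice the built-in `mean()` does the same job.

```r
# Sample mean written out from the formula: sum of the x_i divided by n.
# `sample_mean` is a hypothetical helper, shown only to mirror the formula.
sample_mean <- function(x) {
  sum(x) / length(x)
}

x <- c(2, 4, 6, 8)                   # made-up values for illustration
sample_mean(x)                       # 5
all.equal(sample_mean(x), mean(x))   # TRUE: agrees with the built-in mean()
```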
### Real-Life Example: Daily Sales
A small coffee shop records its daily sales (in USD) for 5 days: $120, 150, 130, 400, 140$.
**Calculation:**
$$\bar{x} = \frac{120 + 150 + 130 + 400 + 140}{5} = \frac{940}{5} = 188$$
**Interpretation:** The average daily revenue is \$188. Note how the "outlier" (\$400) pulled the mean upward.
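
To see how much the single \$400 day moves the mean, one option is to recompute it with that day left out. This is a small sketch on the same five sales figures; the variable names are illustrative.

```r
# Same five daily sales figures as in the example above
sales <- c(120, 150, 130, 400, 140)

mean_all <- mean(sales)                             # 188
mean_without_outlier <- mean(sales[sales != 400])   # 135, the four ordinary days

# How far the one unusual day pulls the mean upward
mean_all - mean_without_outlier                     # 53
```

One observation out of five shifts the mean by \$53, which is why the mean is described as sensitive to outliers.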
### R Implementation
```r
sales <- c(120, 150, 130, 400, 140)
mean_sales <- mean(sales)
print(paste("The mean sales is:", mean_sales))
## [1] "The mean sales is: 188"
```
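# 3. The Median
The median is the middle value of a data set that has been arranged in ascending or descending order. Because it depends on the position of the values rather than their magnitude, it is robust to outliers.
### Real-Life Example: House Prices
Imagine 5 houses on a street priced at \$200k, \$250k, \$210k, \$230k, and \$1,200k. Sorted in ascending order, the prices are \$200k, \$210k, \$230k, \$250k, \$1,200k, so the median is the third value, \$230k.
**Note:** The mean would be \$418,000, which doesn't represent the "typical" house on this street. The median is a better representation here.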
### R Implementation
```r
# House prices in thousands of USD
house_prices <- c(200, 250, 210, 230, 1200)
median_price <- median(house_prices)
print(paste("The median house price is:", median_price))
## [1] "The median house price is: 230"
```
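
The house-price example has an odd number of values, so a single middle value exists. With an even number of observations there is no single middle, and the usual convention (which R's `median()` also follows) is to average the two central values. The sketch below spells out both cases. `manual_median` is a hypothetical helper, and the four-house vector is simply the example with the \$1,200k house dropped.

```r
# Median by hand: sort, then take the middle value (odd n)
# or the average of the two middle values (even n).
# `manual_median` is a hypothetical helper for illustration.
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2
  }
}

manual_median(c(200, 250, 210, 230, 1200))   # 230 (odd n: third sorted value)
manual_median(c(200, 250, 210, 230))         # 220 (even n: average of 210 and 230)
```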
# 4. The Mode
The mode is the value that appears most frequently in a data set. A set can be unimodal (one mode), bimodal (two modes), or multimodal (more than two modes).
### Real-Life Example: Shoe Sizes
A shoe store tracks the sizes of sneakers sold in an hour: $8, 9, 9, 10, 11, 9, 8$. Size 9 appears three times, more often than any other size, so the mode is 9.
**Interpretation:** The store manager should stock more of size 9 because it is the size in most frequent demand.
### R Implementation
R has no built-in function for the statistical mode (the base `mode()` function returns an object's storage mode, such as `"numeric"`), so we create a simple one:
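```r
get_mode <- function(v) {
  uniqv <- unique(v)
  # Count occurrences of each unique value and return the most frequent one.
  # If several values tie, only the first one encountered is returned.
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
shoe_sizes <- c(8, 9, 9, 10, 11, 9, 8)
print(paste("The mode of shoe sizes is:", get_mode(shoe_sizes)))
## [1] "The mode of shoe sizes is: 9"
```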
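
Since a data set can be bimodal or multimodal, a helper that reports only one value can hide ties. One possible variant, sketched here under the hypothetical name `get_all_modes`, returns every value that reaches the highest frequency. It relies on base R's `table()`, and the second input vector is made up to demonstrate a tie.

```r
# Return every value whose frequency equals the maximum frequency.
# `get_all_modes` is a hypothetical helper, not a base R function.
get_all_modes <- function(v) {
  counts <- table(v)
  modes <- names(counts)[counts == max(counts)]
  # table() keeps the values as character names; convert back for numeric input
  if (is.numeric(v)) as.numeric(modes) else modes
}

get_all_modes(c(8, 9, 9, 10, 11, 9, 8))   # 9   (unimodal)
get_all_modes(c(1, 1, 2, 2, 3))           # 1 2 (bimodal: both appear twice)
```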
# 5. Summary: Choosing a Measure
| Measure | Best for… | Sensitive to Outliers? |
|---|---|---|
| Mean | Symmetric data, Normal distributions | Yes (Very) |
| Median | Skewed data, Income, Prices | No |
| Mode | Categorical data (Colors, Names) | No |
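
The last row of the table can be illustrated with a frequency count, since categories cannot be meaningfully averaged or ordered. This is a minimal sketch using base R's `table()`; the `shirt_colors` vector is made-up data for the demonstration.

```r
# Made-up categorical data for illustration
shirt_colors <- c("red", "blue", "red", "green", "red", "blue")

# Frequency of each category
freq <- table(shirt_colors)

# The mode is the category with the highest count
names(freq)[which.max(freq)]   # "red" (3 of the 6 observations)
```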
7, 8, 5, 9, 7, 10, 2.