---
title: "Module 1: Measures of Central Tendency"
author: "Statistics Department"
date: "2025-12-28"
output: html_document
---
# 1. Introduction
A **measure of central tendency** is a single value that identifies the center of a data set. It provides a summary of where the "middle" of the distribution lies.
The three primary measures are:
1. **Mean** (The average)
2. **Median** (The middle value)
3. **Mode** (The most frequent value)
---
# 2. The Arithmetic Mean
The mean is calculated by summing all observations and dividing by the total number of observations.
### Mathematical Formula
For a sample of size $n$ with observations $x_1, \dots, x_n$:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
### Real-Life Example: Daily Temperatures
A meteorologist records the high temperatures (in Celsius) for a week: **22, 24, 21, 23, 25, 28, 22**.
**Calculation:**
$$\bar{x} = \frac{22+24+21+23+25+28+22}{7} = \frac{165}{7} \approx 23.57\,^{\circ}\text{C}$$
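
The arithmetic above can also be traced step by step in R. This is a minimal sketch that applies the formula directly; the variable names `total`, `n`, and `x_bar` are illustrative choices, not part of the original lesson.

``` r
# Daily high temperatures (Celsius) from the example above
temps <- c(22, 24, 21, 23, 25, 28, 22)

total <- sum(temps)     # numerator: sum of all observations
n <- length(temps)      # denominator: number of observations
x_bar <- total / n      # sample mean, computed directly from the formula

total
## [1] 165
n
## [1] 7
round(x_bar, 2)
## [1] 23.57
all.equal(x_bar, mean(temps))  # the built-in mean() gives the same value
## [1] TRUE
```
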
### R Code Example
``` r
temps <- c(22, 24, 21, 23, 25, 28, 22)
mean(temps)
## [1] 23.57143
```

# 3. The Median
The median is the value in the middle position when the data are sorted in ascending order. With an even number of observations, it is the average of the two middle values.

### Real-Life Example: Household Incomes
Imagine 5 households with incomes (in thousands of dollars): **35, 40, 42, 45, 150**. Because the \$150k income is far higher than the others (an outlier), it pulls the mean up to \$62.4k, more than four of the five households earn. The median is \$42k, which better represents the "typical" household in this group.

### R Code Example

``` r
incomes <- c(35, 40, 42, 45, 150)
median(incomes)
## [1] 42
```

# 4. The Mode
The mode is the value that appears most frequently in the dataset.

### Real-Life Example: Getting to Campus
A survey asks 10 students how they get to campus: **Bus, Car, Bus, Bike, Bus, Walk, Car, Bus, Walk, Bike**.
The mode is "Bus" because it appears 4 times, more than any other method.

### R Code Example

``` r
# Base R's mode() returns an object's storage type, not the statistical mode,
# so we use a frequency table instead
transport <- c("Bus", "Car", "Bus", "Bike", "Bus", "Walk", "Car", "Bus", "Walk", "Bike")
table(transport)
## transport
## Bike  Bus  Car Walk 
##    2    4    2    2 
```

# 5. Comparison of Measures

| Measure | Best for… | Affected by Outliers? |
|---|---|---|
| Mean | Symmetric data (e.g., normal distribution) | Yes (Sensitive) |
| Median | Skewed data (Income, House prices) | No (Robust) |
| Mode | Categorical data (Colors, Brands) | No (Robust) |

# 6. Practice Exercise
Below is a dataset representing the number of hours students slept before an exam: **5, 6, 6, 7, 7, 7, 8, 9, 10, 4**.
Calculate the mean and median using the code block below:

``` r
sleep_hours <- c(5, 6, 6, 7, 7, 7, 8, 9, 10, 4)
cat("Mean Sleep:", mean(sleep_hours), "\n")
## Mean Sleep: 6.9
cat("Median Sleep:", median(sleep_hours), "\n")
## Median Sleep: 7
```
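
The frequency-table approach to the mode can be wrapped in a small helper. The following is a minimal sketch, assuming a hypothetical function name `stat_mode` (it is not part of base R). It returns every value tied for the highest count, so it also handles data with more than one mode, and it is applied here to both the transport survey and the sleep data.

``` r
# Hypothetical helper (not part of base R): returns the most frequent value(s)
stat_mode <- function(x) {
  freq <- table(x)                  # count occurrences of each distinct value
  names(freq)[freq == max(freq)]    # keep every value tied for the top count
}

transport <- c("Bus", "Car", "Bus", "Bike", "Bus", "Walk", "Car", "Bus", "Walk", "Bike")
stat_mode(transport)
## [1] "Bus"

sleep_hours <- c(5, 6, 6, 7, 7, 7, 8, 9, 10, 4)
as.numeric(stat_mode(sleep_hours))  # table() stores values as names, so convert back
## [1] 7
```

Because `table()` keeps the distinct values as character names, numeric input comes back as strings; the `as.numeric()` call is one way to restore the original type.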