With the knitr setup block and engine-specific configuration removed, the document becomes a cleaner R Markdown file. This version focuses on the content and standard R code blocks, which can be run in RStudio and read in any standard Markdown viewer.


---
title: "Module 1: Measures of Central Tendency"
author: "Statistics Department"
date: "2025-12-28"
output: html_document
---

# 1. Introduction
A **measure of central tendency** is a single value that identifies the center of a data set. It provides a summary of where the "middle" of the distribution lies.

The three primary measures are:

1. **Mean** (the average)
2. **Median** (the middle value)
3. **Mode** (the most frequent value)

---

# 2. The Arithmetic Mean
The mean is calculated by summing all observations and dividing by the total number of observations.

### Mathematical Formula
For a sample of size $n$ with observations $x_1, x_2, \dots, x_n$:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

### Real-Life Example: Daily Temperatures
A meteorologist records the high temperatures (in Celsius) for a week: **22, 24, 21, 23, 25, 28, 22**.

**Calculation:**
$$\bar{x} = \frac{22+24+21+23+25+28+22}{7} = \frac{165}{7} \approx 23.57^\circ\text{C}$$
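
The same arithmetic can be reproduced step by step in R. This is a minimal sketch that follows the formula (sum, then divide by $n$) rather than calling `mean()` directly; the variable names `x`, `n`, `total`, and `x_bar` are illustrative choices, not part of the original module.

``` r
# Week of high temperatures from the example above (variable names are illustrative)
x <- c(22, 24, 21, 23, 25, 28, 22)

n <- length(x)      # number of observations
total <- sum(x)     # numerator of the formula
x_bar <- total / n  # sample mean

total
## [1] 165
n
## [1] 7
round(x_bar, 2)
## [1] 23.57

# The manual result agrees with the built-in function
isTRUE(all.equal(x_bar, mean(x)))
## [1] TRUE
```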

### R Code Example

``` r
temps <- c(22, 24, 21, 23, 25, 28, 22)
mean(temps)
## [1] 23.57143
```

---

# 3. The Median
The median is the value in the middle position when the data are sorted in order.

### Calculation Rule
1. Arrange the data from smallest to largest.
2. If $n$ is odd, the median is the middle number.
3. If $n$ is even, the median is the average of the two middle numbers.

### Real-Life Example: Household Income
Imagine 5 households with incomes of **\$35k, \$40k, \$42k, \$45k, and \$150k**. Because the \$150k income is much higher than the others (an outlier), it pulls the mean up to \$62.4k, which is more than four of the five households earn. The median is \$42k, a more accurate representation of the "typical" household in this group.

### R Code Example

``` r
incomes <- c(35, 40, 42, 45, 150)
median(incomes)
## [1] 42
```

---

# 4. The Mode
The mode is the value that appears most frequently in the dataset.

### Characteristics
- It can be used for both numeric and categorical (non-numeric) data.
- A dataset can be bimodal (two modes) or have no mode at all (when every value occurs equally often).

### Real-Life Example: Transportation
A survey asks 10 students how they get to campus: **Bus, Car, Bus, Bike, Bus, Walk, Car, Bus, Walk, Bike**.

The mode is "Bus" because it appears 4 times, more than any other method.

### R Code Example

``` r
# Base R's mode() returns an object's storage type, not the statistical mode,
# so we use a frequency table instead
transport <- c("Bus", "Car", "Bus", "Bike", "Bus", "Walk", "Car", "Bus", "Walk", "Bike")
table(transport)
## transport
## Bike  Bus  Car Walk 
##    2    4    2    2
```

---

# 5. Summary Table

| Measure | Best for | Affected by outliers? |
|---------|----------|-----------------------|
| Mean | Symmetric data (normal distribution) | Yes (sensitive) |
| Median | Skewed data (income, house prices) | No (robust) |
| Mode | Categorical data (colors, brands) | No (robust) |

---

# 6. Practice Exercise
Below is a dataset representing the number of hours students slept before an exam: **5, 6, 6, 7, 7, 7, 8, 9, 10, 4**.

Calculate the mean and median using the code block below:

``` r
sleep_hours <- c(5, 6, 6, 7, 7, 7, 8, 9, 10, 4)

cat("Mean Sleep:", mean(sleep_hours), "\n")
## Mean Sleep: 6.9
cat("Median Sleep:", median(sleep_hours))
## Median Sleep: 7
```
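
The examples above leave two cases without code: extracting the mode itself rather than reading it off a frequency table, and finding the median when $n$ is even. The sketch below covers both using data already given in this module. `stat_mode` is a hypothetical helper written for illustration, not a base R function, and this is one possible approach rather than a standard implementation.

``` r
# Hypothetical helper (not part of base R): returns every value tied for the
# highest frequency, so a bimodal dataset yields two values
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}

transport <- c("Bus", "Car", "Bus", "Bike", "Bus", "Walk", "Car", "Bus", "Walk", "Bike")
stat_mode(transport)
## [1] "Bus"

# names() always returns character, so convert back for numeric data
sleep_hours <- c(5, 6, 6, 7, 7, 7, 8, 9, 10, 4)
as.numeric(stat_mode(sleep_hours))
## [1] 7

# Even n (here n = 10): average the two middle values of the sorted data
sorted <- sort(sleep_hours)
n <- length(sorted)
middle_pair <- sorted[c(n / 2, n / 2 + 1)]
mean(middle_pair)
## [1] 7
```

Because the helper returns all tied values, a dataset in which every value occurs equally often returns every distinct value, which corresponds to the "no mode" case.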