---
title: "Module I: Measures of Central Tendency"
author: "Statistics Department"
date: "2025-12-28"
output:
  html_document:
    toc: true
    toc_depth: 2
    theme: united
---



# 1. Introduction
A **measure of central tendency** is a single value that describes a data set by identifying its central position. Such measures are also called measures of central location, and they are one kind of summary statistic.

In this module, we focus on the three most common measures:

1. **The Mean** (Arithmetic Average)
2. **The Median** (Middle Value)
3. **The Mode** (Most Frequent Value)

---

# 2. The Arithmetic Mean ($\bar{x}$)

The mean is the sum of all observations divided by the total number of observations. It is the most common measure of central tendency.

### Mathematical Formula
For a sample of size $n$:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Where:

- $\bar{x}$ = Sample Mean
- $\sum$ = Summation symbol
- $x_i$ = Individual values
- $n$ = Total number of values

### Real-Life Example: Daily Sales
A small coffee shop records its daily sales (in USD) for 5 days: $120, 150, 130, 400, 140$.

**Calculation:**
$$\bar{x} = \frac{120 + 150 + 130 + 400 + 140}{5} = \frac{940}{5} = 188$$
**Interpretation:** The average daily revenue is \$188. Note how the outlier (\$400) pulls the mean upward: the other four days all fall between \$120 and \$150, so the mean is higher than every one of them.
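
As a quick side calculation (not part of the original example), recomputing the mean with the \$400 day left out shows how much that single value matters:

$$\bar{x}_{\text{no outlier}} = \frac{120 + 150 + 130 + 140}{4} = \frac{540}{4} = 135$$

One unusual day raises the mean by \$53.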

### R Implementation

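```r
# Daily sales (USD) for the five recorded days
sales <- c(120, 150, 130, 400, 140)

# mean() sums the values and divides by their count
mean_sales <- mean(sales)
print(paste("The mean sales is:", mean_sales))
## [1] "The mean sales is: 188"
```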

# 3. The Median ($\tilde{x}$)

The median is the middle value in a data set that has been arranged in ascending or descending order. It is robust to outliers.

### Mathematical Formula

1. Arrange the data from smallest to largest.
2. If $n$ is odd: $$\text{Median} = \text{Value at position } \left(\frac{n+1}{2}\right)$$
3. If $n$ is even: $$\text{Median} = \frac{\text{Value at position } \left(\frac{n}{2}\right) + \text{Value at position } \left(\frac{n}{2} + 1\right)}{2}$$
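
The three steps above can be written out by hand in R. The sketch below is only an illustration: `manual_median()` and the two example vectors are hypothetical names introduced here, and in practice the built-in `median()` does the same job.

```r
# Hypothetical helper: applies the odd/even position rule by hand.
manual_median <- function(x) {
  x_sorted <- sort(x)          # Step 1: arrange from smallest to largest
  n <- length(x_sorted)
  if (n %% 2 == 1) {
    # Odd n: the single middle value, at position (n + 1) / 2
    x_sorted[(n + 1) / 2]
  } else {
    # Even n: average of the values at positions n / 2 and n / 2 + 1
    (x_sorted[n / 2] + x_sorted[n / 2 + 1]) / 2
  }
}

odd_example  <- c(7, 3, 9, 1, 5)       # sorted: 1 3 5 7 9
even_example <- c(7, 3, 9, 1, 5, 11)   # sorted: 1 3 5 7 9 11

manual_median(odd_example)
## [1] 5
manual_median(even_example)
## [1] 6
```

For the even case the two middle values are 5 and 7, so the median (6) is not itself one of the observations.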

### Real-Life Example: Housing Prices

Imagine 5 houses on a street priced at \$200k, \$250k, \$210k, \$230k, and \$1,200k.

- Sorted (in thousands of USD): 200, 210, 230, 250, 1,200.
- Median: \$230,000 (the third of the five sorted values).

**Note:** The mean would be \$418,000, which is higher than four of the five prices and so doesn't represent the "typical" house on this street. The median is a better representation here.

### R Implementation

```r
# House prices in thousands of USD
house_prices <- c(200, 250, 210, 230, 1200)
median_price <- median(house_prices)
print(paste("The median house price is:", median_price))
## [1] "The median house price is: 230"
```
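
To make the robustness claim concrete, the following sketch compares both measures with and without the most expensive house. The `without_outlier` vector and the 1000 cutoff are illustrative choices made here, not part of the original example.

```r
house_prices <- c(200, 250, 210, 230, 1200)   # prices in thousands of USD

# Drop the most expensive house (the outlier) and compare both measures
without_outlier <- house_prices[house_prices < 1000]

mean(house_prices)
## [1] 418
mean(without_outlier)
## [1] 222.5
median(house_prices)
## [1] 230
median(without_outlier)
## [1] 220
```

Dropping one house shifts the mean by \$195.5k but the median by only \$10k, which is what "robust to outliers" means in practice.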

# 4. The Mode

The mode is the value that appears most frequently in a data set. A set can be unimodal (one mode), bimodal (two modes), or multimodal.

### Real-Life Example: Inventory Management

A shoe store tracks the sizes of sneakers sold in an hour: $8, 9, 9, 10, 11, 9, 8$.

- Frequency:
    - Size 8: 2 times
    - Size 9: 3 times
    - Size 10: 1 time
    - Size 11: 1 time
- Mode: Size 9.

**Interpretation:** The store manager should stock more of size 9 because it is the most frequently purchased size.

### R Implementation

R does not have a built-in function for the statistical mode (the base `mode()` function returns an object's storage type, not its most frequent value), so we create a simple one:

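```r
# Returns the most frequent value in v; if several values tie,
# only the first one encountered is returned.
get_mode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

shoe_sizes <- c(8, 9, 9, 10, 11, 9, 8)
print(paste("The mode of shoe sizes is:", get_mode(shoe_sizes)))
## [1] "The mode of shoe sizes is: 9"
```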
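
Because `which.max()` stops at the first maximum, `get_mode()` reports a single value even when the data are bimodal or multimodal. One possible alternative, sketched here with base R's `table()`, returns every value tied for the highest frequency. The helper `get_modes()` and the `bimodal_sizes` vector are hypothetical names introduced for this illustration.

```r
# Hypothetical helper: returns every value tied for the highest frequency.
get_modes <- function(v) {
  counts <- table(v)                             # frequency of each distinct value
  modes <- names(counts)[counts == max(counts)]  # keep all values with the top count
  # table() stores the values as character names; convert back for numeric input
  if (is.numeric(v)) as.numeric(modes) else modes
}

shoe_sizes <- c(8, 9, 9, 10, 11, 9, 8)
get_modes(shoe_sizes)
## [1] 9

bimodal_sizes <- c(8, 8, 9, 9, 10)   # sizes 8 and 9 both appear twice
get_modes(bimodal_sizes)
## [1] 8 9
```

Calling `table(shoe_sizes)` on its own prints the same frequency count that was tallied by hand in the example.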

# 5. Summary: Which measure to use?

| Measure | Best for… | Sensitive to Outliers? |
|---------|-----------|------------------------|
| Mean | Symmetric data, Normal distributions | Yes (very) |
| Median | Skewed data, Income, Prices | No |
| Mode | Categorical data (Colors, Names) | No |

### Visualizing Distribution and Central Tendency

[Figure: `ggplot2` plot showing how skewness separates the mean and the median.]
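
The code for this section's plot is not shown in the source. The following is a minimal sketch of how such a plot could be produced, assuming `ggplot2` 3.4.0 or later is installed. The simulated exponential data, the seed, and the object names `skewed` and `p` are illustrative assumptions, not the original figure's inputs.

```r
library(ggplot2)  # assumes ggplot2 >= 3.4.0, where lines take `linewidth` rather than `size`

set.seed(42)  # arbitrary seed so the simulated data are reproducible
skewed <- data.frame(value = rexp(500, rate = 1 / 10))  # hypothetical right-skewed sample

p <- ggplot(skewed, aes(x = value)) +
  geom_histogram(bins = 30, fill = "grey80", color = "white") +
  # Solid red line: the mean, pulled toward the long right tail
  geom_vline(xintercept = mean(skewed$value), color = "red", linewidth = 1) +
  # Dashed blue line: the median, which stays near the bulk of the data
  geom_vline(xintercept = median(skewed$value), color = "blue",
             linewidth = 1, linetype = "dashed") +
  labs(title = "Right-skewed data: mean (red) vs. median (blue, dashed)",
       x = "Value", y = "Count")

print(p)
```

In a right-skewed sample like this one, the mean line sits to the right of the median line, matching the "Sensitive to Outliers?" column of the summary table.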


# 6. Exercises

  1. Calculate the mean, median, and mode for the following dataset representing the number of hours students slept: 7, 8, 5, 9, 7, 10, 2.
  2. Why is the median often used instead of the mean when reporting “Median Household Income”?
  3. If a dataset is perfectly symmetrical and unimodal, what is the relationship between the Mean, Median, and Mode?
