1. Introduction

A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures indicate where most values in a distribution fall, which is why they are also called measures of “location”.

The three most common measures are the mean, the median, and the mode.

2. The Arithmetic Mean

The mean (or average) is the sum of all values divided by the total number of values.

Mathematical Formula

For a sample of size $n$, the sample mean $\bar{x}$ is calculated as:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}\]

Where:
- $\sum$: Symbol for summation.
- $x_i$: Each individual value in the dataset.
- $n$: The number of observations.

Real-Life Example

Imagine a small tech startup with five employees earning the following annual salaries (in thousands of dollars): 50, 60, 65, 70, and 75.
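As a quick hand check, the formula above can be applied directly to these five salaries (values in thousands of dollars):

\[\bar{x} = \frac{50 + 60 + 65 + 70 + 75}{5} = \frac{320}{5} = 64\]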

R Implementation

salaries <- c(50, 60, 65, 70, 75)
mean_salary <- mean(salaries)
print(paste("The mean salary is:", mean_salary))

## [1] "The mean salary is: 64"

3. The Median

The median is the middle value in a dataset when the values are arranged in ascending or descending order.

Mathematical Formula

Arrange the data from smallest to largest, and let $x_{(k)}$ denote the $k$-th smallest value.
If $n$ is odd, the median is the middle value: \[\text{Median} = x_{(\frac{n+1}{2})}\]
If $n$ is even, the median is the average of the two middle values: \[\text{Median} = \frac{x_{(n/2)} + x_{(n/2 + 1)}}{2}\]
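The odd/even rule above can be written out by hand to see what it does. The following is a minimal sketch, not a replacement for R's built-in median(); the name manual_median is a hypothetical helper introduced only for illustration, and it assumes a numeric vector with no missing values.

```r
# Hypothetical helper: median computed directly from the odd/even rule.
# Assumes x is numeric and contains no NA values.
manual_median <- function(x) {
  s <- sort(x)                       # step 1: arrange from smallest to largest
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # odd n: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2    # even n: average of the two middle values
  }
}

manual_median(c(3, 1, 2))      # odd n: returns 2
manual_median(c(4, 1, 3, 2))   # even n: returns 2.5
```

In practice the built-in median() should be preferred, since it also handles missing values through its na.rm argument.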

Real-Life Example

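Consider the same salaries: 50, 60, 65, 70, 75. With $n = 5$, the median is the third value, 65. If we add a CEO earning 300, the set becomes 50, 60, 65, 70, 75, 300. With $n = 6$, the median is the average of the third and fourth values, $(65 + 70)/2 = 67.5$.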

R Implementation

salaries_with_ceo <- c(50, 60, 65, 70, 75, 300)

# Mean vs Median
mean_val <- mean(salaries_with_ceo)
median_val <- median(salaries_with_ceo)

cat("Mean with CEO:", mean_val, "\n")

## Mean with CEO: 103.3333

cat("Median with CEO:", median_val)

## Median with CEO: 67.5

Note: The median is “robust” to outliers: adding the CEO moves it only from 65 to 67.5, whereas the mean is pulled from 64 up to about 103.3 by that single high salary.

4. The Mode

The mode is the value that appears most frequently in a dataset. A distribution can be unimodal (one mode), bimodal (two modes), or multimodal.

Real-Life Example

A shoe store tracks the sizes sold in one hour: 7, 8, 8, 9, 10, 10, 10, 11. The mode is 10 because it appears three times.

R Implementation

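Base R has no built-in function for the statistical mode (its mode() function reports an object's storage type instead), so we create a custom function:

get_mode <- function(v) {
   uniqv <- unique(v)                    # distinct values, in order of first appearance
   counts <- tabulate(match(v, uniqv))   # how often each distinct value occurs
   uniqv[which.max(counts)]              # most frequent value; ties resolve to the first one
}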

shoe_sizes <- c(7, 8, 8, 9, 10, 10, 10, 11)
print(paste("The mode of shoe sizes is:", get_mode(shoe_sizes)))

## [1] "The mode of shoe sizes is: 10"
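Because a distribution can be bimodal or multimodal, it may help to have a variant that returns every value tied for the highest count rather than a single one. The sketch below is one possible approach; get_modes is a hypothetical name, and the tied example vector is made up for illustration.

```r
# Hypothetical helper: return ALL values that share the highest frequency.
get_modes <- function(v) {
  uniqv <- unique(v)                    # distinct values, in order of first appearance
  counts <- tabulate(match(v, uniqv))   # frequency of each distinct value
  uniqv[counts == max(counts)]          # keep every value tied for the maximum
}

get_modes(c(7, 8, 8, 9, 10, 10, 10, 11))   # unimodal: returns 10
get_modes(c(1, 1, 2, 2, 3))                # bimodal (made-up data): returns 1 2
```

Note that when every value occurs exactly once, this sketch returns all of them, which is usually read as "no meaningful mode".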

5. Comparison: When to use what?

Measure	Best for…	Sensitivity to Outliers
Mean	Symmetric data, Normal distributions	High (Very Sensitive)
Median	Skewed data (e.g., Income, Home prices)	Low (Robust)
Mode	Categorical data (e.g., Favorite color)	Low
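The last row of the table can be illustrated with a small categorical example, where a mean or median is not defined but a mode still is. This is only a sketch: the favorite_colors vector is invented data, and table() is used here as one convenient way to count categories.

```r
# Invented survey responses (hypothetical data, for illustration only)
favorite_colors <- c("blue", "red", "blue", "green", "blue", "red")

color_counts <- table(favorite_colors)                     # frequency of each category
modal_color <- names(color_counts)[which.max(color_counts)]  # most frequent category

print(color_counts)
print(paste("The modal color is:", modal_color))
```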

Visualization of Skewness
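One way to see how the mean and median separate in a skewed distribution is to draw a histogram and mark both values. The sketch below uses simulated data, not real incomes: the log-normal parameters, the seed, and the variable name incomes are all assumptions chosen only to produce a right-skewed shape.

```r
set.seed(42)   # arbitrary seed, used only so the figure is reproducible

# Simulated right-skewed data (hypothetical; a log-normal sample)
incomes <- rlnorm(1000, meanlog = 4, sdlog = 0.6)

hist(incomes, breaks = 40, col = "grey85", border = "white",
     main = "Right-skewed distribution", xlab = "Simulated income")

# The mean is pulled toward the long right tail; the median stays near the bulk
abline(v = mean(incomes), col = "red", lwd = 2)
abline(v = median(incomes), col = "blue", lwd = 2, lty = 2)
legend("topright", legend = c("Mean", "Median"),
       col = c("red", "blue"), lwd = 2, lty = c(1, 2))
```

In a right-skewed sample like this one, the mean line should fall to the right of the median line.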

6. Summary Exercise

The following vector represents the daily number of customers at a local cafe over 10 days: 34, 45, 40, 38, 50, 45, 55, 120, 42, 41.

Calculate the Mean.
Calculate the Median.
Identify the Outlier.
Which measure provides a better “typical” day for the cafe owner?
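After working the exercise by hand, the answers can be checked with a short script along these lines. The outlier check uses the common 1.5 × IQR rule, which is an assumption added here and not a method introduced earlier in this module; other outlier definitions are possible.

```r
customers <- c(34, 45, 40, 38, 50, 45, 55, 120, 42, 41)

exercise_mean <- mean(customers)
exercise_median <- median(customers)

# Assumed outlier rule: flag values more than 1.5 * IQR beyond the quartiles
q <- quantile(customers, c(0.25, 0.75))
fence_low <- q[[1]] - 1.5 * IQR(customers)
fence_high <- q[[2]] + 1.5 * IQR(customers)
outliers <- customers[customers < fence_low | customers > fence_high]

cat("Mean:", exercise_mean, "\n")
cat("Median:", exercise_median, "\n")
cat("Flagged as outliers:", outliers, "\n")
```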


Module I: Measures of Central Tendency

Statistical Methods for Data Analysis

Your Name/Department

2025-12-28
