1. Introduction

In statistics, a measure of central tendency is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.

It is often referred to as the “typical” value of the dataset. The three most common measures are:

  1. Mean
  2. Median
  3. Mode


2. The Arithmetic Mean

The mean (or average) is the most popular and well-known measure of central tendency. It is calculated by summing all the values in a dataset and dividing by the total number of values.

Mathematical Formula

For a sample of size \(n\), the sample mean \(\bar{x}\) is given by:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

Where:

  * \(\bar{x}\) = Sample mean
  * \(x_i\) = The \(i^{\text{th}}\) value in the dataset
  * \(n\) = Number of observations

Real-Life Example

Scenario: A teacher wants to find the average score of 5 students in a math quiz. Scores: 70, 85, 80, 90, 75.

Calculation: \[\bar{x} = \frac{70 + 85 + 80 + 90 + 75}{5} = \frac{400}{5} = 80\]

R Implementation

scores <- c(70, 85, 80, 90, 75)
mean_score <- mean(scores)
print(paste("The Mean Score is:", mean_score))
## [1] "The Mean Score is: 80"

3. The Median

The median is the middle value in a distribution when the values are arranged in ascending or descending order. It splits the data into two equal halves: half of the observations lie at or below it, and half lie at or above it.

Mathematical Determination

  1. Arrange data from smallest to largest.
  2. If \(n\) is odd: The median is the middle value at position \(\frac{n+1}{2}\).
  3. If \(n\) is even: The median is the average of the two middle values at positions \(\frac{n}{2}\) and \(\frac{n}{2} + 1\).
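The three steps above can be sketched directly in R. This is a minimal illustration, not a replacement for the built-in median(); the helper name manual_median and both small datasets are made up for this example.

```r
# Hypothetical helper that follows the three steps literally.
manual_median <- function(x) {
  sorted <- sort(x)   # Step 1: arrange from smallest to largest
  n <- length(sorted)
  if (n %% 2 == 1) {
    # Step 2: odd n, take the value at position (n + 1) / 2
    sorted[(n + 1) / 2]
  } else {
    # Step 3: even n, average the values at positions n / 2 and n / 2 + 1
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2
  }
}

odd_values  <- c(9, 3, 7, 1, 5)   # sorted: 1 3 5 7 9
even_values <- c(8, 2, 6, 4)      # sorted: 2 4 6 8

manual_median(odd_values)    # 5, the 3rd of 5 sorted values
manual_median(even_values)   # 5, the average of 4 and 6

# Both results agree with R's built-in median()
manual_median(odd_values) == median(odd_values)
manual_median(even_values) == median(even_values)
```

In practice median() is the better choice; the hand-written version is only meant to make the odd and even cases visible.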

Real-Life Example

Scenario: Monthly salaries of 5 employees: $2,000, $2,500, $3,000, $3,500, $10,000. (Notice that $10,000 is an outlier).

Calculation: The values are already sorted. Since \(n=5\) (odd), the middle value is the 3rd one. Median = $3,000.

Note: The mean would be $4,200, which doesn’t represent the “typical” employee well because of the outlier. This is why the Median is preferred for skewed data.

R Implementation

salaries <- c(2000, 2500, 3000, 3500, 10000)
median_salary <- median(salaries)
print(paste("The Median Salary is:", median_salary))
## [1] "The Median Salary is: 3000"
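To make the note about the outlier concrete, the sketch below recomputes both measures with and without the $10,000 salary. The variable name salaries_no_outlier is introduced here for illustration only.

```r
salaries <- c(2000, 2500, 3000, 3500, 10000)

# Drop the $10,000 outlier for comparison
salaries_no_outlier <- salaries[salaries < 10000]

mean(salaries)                # 4200
median(salaries)              # 3000

mean(salaries_no_outlier)     # 2750
median(salaries_no_outlier)   # 2750, the average of 2500 and 3000
```

Removing a single value moves the mean by $1,450 but the median by only $250, which is what "robust to outliers" means in this example.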

4. The Mode

The mode is the value that appears most frequently in a dataset. A dataset can have one mode (unimodal), two modes (bimodal), or several modes (multimodal).

Mathematical Formula

There is no specific algebraic formula for the mode; it is identified by frequency counting: \[\text{Mode} = \text{value with the highest frequency}\]

Real-Life Example

Scenario: A shoe store tracks the sizes sold in one hour: 7, 8, 8, 9, 10, 8, 11.

Calculation:

  * 7: 1 time
  * 8: 3 times
  * 9: 1 time
  * 10: 1 time
  * 11: 1 time

Mode = 8.

R Implementation

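Note: R does not have a standard built-in function for the statistical mode (the built-in mode() reports an object's storage type instead), so we create a simple function. If several values tie for the highest frequency, this function returns only the first of them.

get_mode <- function(v) {
   uniqv <- unique(v)                    # distinct values, in order of first appearance
   counts <- tabulate(match(v, uniqv))   # how often each distinct value occurs
   uniqv[which.max(counts)]              # most frequent value (first one if tied)
}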

shoe_sizes <- c(7, 8, 8, 9, 10, 8, 11)
mode_size <- get_mode(shoe_sizes)
print(paste("The Mode Shoe Size is:", mode_size))
## [1] "The Mode Shoe Size is: 8"
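Because a dataset can be bimodal or multimodal, it can help to return every value that reaches the highest frequency rather than just one. The following is a minimal sketch under that assumption; the helper name get_all_modes and the second and third datasets are made up for illustration.

```r
# Hypothetical helper: return every value tied for the highest frequency.
# table() stores the values as names, so the result is a character vector.
get_all_modes <- function(v) {
  counts <- table(v)                        # frequency of each distinct value
  top <- as.vector(counts) == max(counts)   # TRUE for values with the top count
  names(counts)[top]
}

get_all_modes(c(7, 8, 8, 9, 10, 8, 11))   # "8"
get_all_modes(c(1, 2, 2, 3, 3, 4))        # "2" "3" (bimodal)

# The same helper works for categorical data
get_all_modes(c("Silver", "Black", "Silver", "White"))   # "Silver"
```

For numeric input, wrap the result in as.numeric() if numbers rather than strings are needed.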

5. Comparison: When to use which?

| Measure | Best Used For | Sensitivity to Outliers |
|---------|---------------|-------------------------|
| Mean | Continuous data, symmetric distribution | High (outliers pull the mean) |
| Median | Skewed data (e.g., income, house prices) | Low (robust to outliers) |
| Mode | Categorical data (e.g., most popular car color) | Low |

Visualizing the Relationship

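In a normal distribution (or any symmetric, unimodal distribution), Mean = Median = Mode. For skewed unimodal distributions, the following orderings typically hold, although they are rules of thumb rather than guarantees:

  * Right-skewed distribution (positive skew): Mean > Median > Mode.
  * Left-skewed distribution (negative skew): Mean < Median < Mode.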
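The relationship between skew and the three measures can be checked on a small example. The sketch below uses made-up data chosen to have a long right tail, then mirrors it to get a left-skewed version; the helper simple_mode is hypothetical and assumes a single most frequent value.

```r
# Hypothetical helper: most frequent value of a numeric vector (first one if tied)
simple_mode <- function(v) as.numeric(names(which.max(table(v))))

# Made-up right-skewed data: most values are small, a few are large
right_skewed <- c(1, 2, 2, 2, 3, 3, 4, 5, 9, 14)

# Mirroring the values gives a left-skewed dataset
left_skewed <- -right_skewed

c(mean = mean(right_skewed), median = median(right_skewed), mode = simple_mode(right_skewed))
# mean 4.5, median 3, mode 2: Mean > Median > Mode

c(mean = mean(left_skewed), median = median(left_skewed), mode = simple_mode(left_skewed))
# mean -4.5, median -3, mode -2: Mean < Median < Mode
```

This is one dataset that follows the pattern, not a proof; real data can deviate from these orderings.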


6. Summary Exercises

  1. Calculate the mean of the following dataset: 12, 15, 18, 22, 30.
  2. Identify the median of: 3, 10, 2, 8, 15, 12. (Remember to sort first!)
  3. Find the mode of the following colors: Red, Blue, Red, Green, Red, Blue.

End of Module I Notes