1. Introduction

In statistics, a measure of central tendency is a summary statistic that describes a whole set of data with a single value representing the middle or center of its distribution.

It is often referred to as the “average” of the data. The three most common measures are:

  1. Mean
  2. Median
  3. Mode


2. The Arithmetic Mean

The Mean is the sum of all values divided by the total number of values. It is the most common measure of central tendency.

Mathematical Formula

For a sample of size \(n\), the sample mean \(\bar{x}\) is calculated as: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]

Where:

  • \(\sum\) = Summation symbol
  • \(x_i\) = Each individual value in the dataset
  • \(n\) = Total number of observations
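
As a minimal sketch, the formula can be translated directly into R by dividing the sum by the count. The helper name `manual_mean` and the sample values are illustrative assumptions, not part of the original material; the result should agree with R's built-in `mean()`.

```r
# Hypothetical helper: sample mean computed straight from the formula.
manual_mean <- function(x) {
  n <- length(x)   # n: total number of observations
  sum(x) / n       # sum of all x_i divided by n
}

x <- c(2, 4, 6, 8)   # illustrative values only
manual_mean(x)
## [1] 5
```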

Real-Life Example

Scenario: A teacher wants to find the average score of 5 students in a math quiz. Scores: 85, 90, 78, 92, 88.

Manual Calculation: \[\bar{x} = \frac{85 + 90 + 78 + 92 + 88}{5} = \frac{433}{5} = 86.6\]

Implementation in R

scores <- c(85, 90, 78, 92, 88)
mean_score <- mean(scores)
print(paste("The mean score is:", mean_score))
## [1] "The mean score is: 86.6"

3. The Median

The Median is the middle value in a dataset when the values are arranged in ascending or descending order. It splits the data into two equal halves.

Mathematical Formula

  1. Sort the data from smallest to largest.
  2. If \(n\) is odd, the median is the value at position: \[\text{Median} = \left(\frac{n+1}{2}\right)^{th} \text{term}\]
  3. If \(n\) is even, the median is the average of the two middle terms: \[\text{Median} = \frac{(\frac{n}{2})^{th} \text{term} + (\frac{n}{2} + 1)^{th} \text{term}}{2}\]
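
The three steps above can be sketched as a small R function. The name `manual_median` is a hypothetical helper introduced for illustration; in practice the built-in `median()` does the same job. The two test vectors are the income and practice datasets used elsewhere in this document, so both the odd and the even case are covered.

```r
# Hypothetical helper: median computed by following the three steps above.
manual_median <- function(x) {
  s <- sort(x)        # step 1: sort from smallest to largest
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                    # step 2: odd n, take the middle term
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2     # step 3: even n, average the two middle terms
  }
}

manual_median(c(45, 50, 52, 55, 600))        # odd n
## [1] 52
manual_median(c(12, 15, 12, 18, 20, 100))    # even n
## [1] 16.5
```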

Real-Life Example

Scenario: Summarizing household incomes in a neighborhood where one very high earner (an outlier) would distort the mean. Incomes (in thousands): 45, 50, 52, 55, 600.

Manual Calculation: The data is already sorted. \(n=5\) (odd). Median = \((\frac{5+1}{2})^{th} = 3^{rd}\) term. Median = 52. (Note: The Mean would be 160.4, which is not representative of the typical neighbor!)

Implementation in R

incomes <- c(45, 50, 52, 55, 600)
median_income <- median(incomes)
print(paste("The median income is:", median_income))
## [1] "The median income is: 52"
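
To see why the mean is a poor summary for this data, one can count how many households fall below it. This is a small sketch using the same `incomes` vector; the variable name `below_mean` is an illustrative assumption.

```r
incomes <- c(45, 50, 52, 55, 600)

mean(incomes)
## [1] 160.4

# Number of households whose income is below the mean
below_mean <- sum(incomes < mean(incomes))
below_mean
## [1] 4
```

Four of the five households earn less than the mean, while the median (52) sits in the middle of them.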

4. The Mode

The Mode is the value that appears most frequently in a dataset. A dataset can have one mode (unimodal), two modes (bimodal), or many (multimodal).

Mathematical Formula

There is no specific algebraic formula for the mode; it is identified by frequency counting: \[\text{Mode} = \text{Value with } \max(\text{frequency})\]

Real-Life Example

Scenario: A shoe store manager wants to know which shoe size to stock most. Sizes sold: 7, 8, 8, 9, 10, 10, 10, 11, 12.

Manual Calculation:

  • 7 occurs 1 time
  • 8 occurs 2 times
  • 9 occurs 1 time
  • 10 occurs 3 times
  • 11 occurs 1 time
  • 12 occurs 1 time

Mode = 10.
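
The same tally can be reproduced with R's `table()` function, which counts how often each distinct value occurs. This is a sketch for checking the manual count; the variable name `freq` is an illustrative choice.

```r
shoe_sizes <- c(7, 8, 8, 9, 10, 10, 10, 11, 12)

# Count how often each size occurs
freq <- table(shoe_sizes)

# Largest count, and the size(s) that reach it
max(freq)
## [1] 3
names(freq)[freq == max(freq)]
## [1] "10"
```

Note that `names()` returns the size as a character string; wrap it in `as.numeric()` if a number is needed.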

Implementation in R

Note: R does not have a built-in function for the statistical mode. The built-in mode() function returns an object's storage mode (for example, "numeric"), not its most frequent value, so we use a custom function.

# Return the most frequent value in v
get_mode <- function(v) {
  uniqv <- unique(v)                            # distinct values
  uniqv[which.max(tabulate(match(v, uniqv)))]   # value with the highest count
}

shoe_sizes <- c(7, 8, 8, 9, 10, 10, 10, 11, 12)
result <- get_mode(shoe_sizes)
print(paste("The mode of shoe sizes is:", result))
## [1] "The mode of shoe sizes is: 10"
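
Because `which.max()` returns only the first maximum, `get_mode()` reports a single value even when several values tie for the highest frequency. A possible variant, sketched here under the hypothetical name `get_all_modes`, returns every value that reaches the top count. The second input vector is made-up data for illustration.

```r
# Hypothetical variant of get_mode(): return all values tied for the top count.
get_all_modes <- function(v) {
  uniqv <- unique(v)
  counts <- tabulate(match(v, uniqv))   # frequency of each distinct value
  uniqv[counts == max(counts)]          # keep every value with the maximum count
}

get_all_modes(c(7, 8, 8, 9, 10, 10, 10, 11, 12))   # unimodal
## [1] 10
get_all_modes(c(1, 1, 2, 3, 3))                    # bimodal, illustrative data
## [1] 1 3
```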

5. Summary: Which Measure to Use?

| Measure | When to use? | Sensitivity to Outliers |
|---------|--------------|-------------------------|
| Mean | Symmetric data, no extreme outliers. | High (Very sensitive) |
| Median | Skewed data or data with outliers (e.g., Salary). | Low (Robust) |
| Mode | Categorical/Nominal data (e.g., Color, Gender). | Low |
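
For nominal data, the mean and median are not defined, but the mode still is. A brief sketch with made-up color labels (the vector `color_obs` is illustrative, not data from this document):

```r
# Illustrative categorical data; mean() and median() are not meaningful here
color_obs <- c("red", "blue", "red", "green", "red", "blue")

freq <- table(color_obs)            # count each category
names(freq)[which.max(freq)]        # most frequent category
## [1] "red"
```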

Relationship with Skewness

  • Symmetric Distribution (unimodal): Mean \(\approx\) Median \(\approx\) Mode.
  • Right Skewed (Positive): typically Mean \(>\) Median \(>\) Mode.
  • Left Skewed (Negative): typically Mean \(<\) Median \(<\) Mode.
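
These orderings can be checked on simulated data. The sketch below assumes an exponential sample as a stand-in for a right-skewed distribution and mirrors it to obtain a left-skewed one; the seed and sample size are arbitrary choices. Only the mean and median are compared, since estimating the mode of continuous data requires a density estimate.

```r
set.seed(42)                    # arbitrary seed, for reproducibility
right_skewed <- rexp(1000)      # exponential draws: long tail to the right
left_skewed  <- -right_skewed   # mirror image: long tail to the left

mean(right_skewed) > median(right_skewed)   # expected TRUE for right skew
mean(left_skewed)  < median(left_skewed)    # expected TRUE for left skew
```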

6. Practice Exercises

  1. Create a vector in R containing the following numbers: 12, 15, 12, 18, 20, 100.
  2. Calculate the mean and median.
  3. Observe how the value 100 (outlier) pulls the mean away from the center compared to the median.

practice_data <- c(12, 15, 12, 18, 20, 100)
print(paste("Mean:", mean(practice_data)))
## [1] "Mean: 29.5"
print(paste("Median:", median(practice_data)))
## [1] "Median: 16.5"
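
One way to quantify the pull of the outlier, sketched here as an optional extension of the exercise, is to recompute both measures with the value 100 removed. The name `without_outlier` is an illustrative assumption.

```r
practice_data <- c(12, 15, 12, 18, 20, 100)
without_outlier <- practice_data[practice_data != 100]   # drop the value 100

mean(without_outlier)
## [1] 15.4
median(without_outlier)
## [1] 15
```

The mean moves from 29.5 to 15.4, while the median moves only from 16.5 to 15.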
