2025-03-17

Introduction to Central Tendency

Measures of central tendency help us find the center point of a data distribution.

The three major measures of central tendency are as follows:

  • Mean: This is the arithmetic average of all the values.
  • Median: This is the middle value when the data is sorted in ascending or descending order (the average of the two middle values when the number of values is even).
  • Mode: This is the most frequently occurring value in the dataset.

These measures help us understand what a typical value in our dataset looks like.

The Maths of Central Tendency

For a dataset \(X = \{x_1, x_2, \ldots, x_n\}\), where \(x_{(k)}\) denotes the \(k\)-th smallest value:

Mean (or Average)

\[\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}\]

Median

\[\text{Median} = \begin{cases} x_{(\frac{n+1}{2})}, & \text{if } n \text{ is odd} \\ \frac{x_{(\frac{n}{2})} + x_{(\frac{n}{2}+1)}}{2}, & \text{if } n \text{ is even} \end{cases}\]
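
The piecewise definition above can be sketched in R as follows. This is a minimal illustration, not a replacement for R's built-in median(); the name medianCalculation and the sample vectors are assumptions introduced here.

```r
# Hypothetical helper: median computed directly from the piecewise definition
medianCalculation <- function(x) {
  sorted <- sort(x)       # order statistics x_(1) <= ... <= x_(n); sort() drops NA values
  n <- length(sorted)
  if (n %% 2 == 1) {
    sorted[(n + 1) / 2]   # odd n: the single middle value
  } else {
    (sorted[n / 2] + sorted[n / 2 + 1]) / 2   # even n: average of the two middle values
  }
}

medianCalculation(c(7, 3, 9, 1, 5))  # odd n: sorted is 1 3 5 7 9, so the result is 5
medianCalculation(c(7, 3, 9, 1))     # even n: sorted is 1 3 7 9, so the result is (3 + 7) / 2 = 5
```

For vectors without missing values, the result should agree with median().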

Mode

\[\text{Mode} = \text{value that appears most often in } X\]
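
Base R has no built-in function for the statistical mode (its mode() function reports an object's storage type instead), so a small helper is usually written by hand. The sketch below is one possible version built on table(); the name modeCalculation is an assumption introduced here, and it returns every value tied for the highest count.

```r
# Hypothetical helper: returns all values that share the highest frequency
modeCalculation <- function(x) {
  counts <- table(x)                             # frequency of each distinct value
  modes <- names(counts)[counts == max(counts)]  # keep every value tied for the top count
  # table() stores the values as character names, so convert back for numeric input
  if (is.numeric(x)) as.numeric(modes) else modes
}

modeCalculation(c(2, 3, 3, 5, 7, 7, 9))  # two modes: 3 and 7
modeCalculation(c("a", "b", "b"))        # one mode: "b"
```

Returning all tied values, rather than only the first, keeps bimodal data visible.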

When do we use what measure?

Visualizing Central Tendency: Normal Distribution

Mean and Median in the Skewed Data

Common Data Distribution Shapes

Creating a Simple Function To Calculate Mean

# Function to calculate the mean
meanCalculation = function(x) {
  sum(x) / length(x)
}

# Compare our function with R's built-in mean() on sample data
dataSet = c(12, 15, 18, 22, 30, 31, 35, 40)
cat(" Mean using our function:", meanCalculation(dataSet) ,"\n" 
    ,"Mean using function in R:", mean(dataSet))
##  Mean using our function: 25.375 
##  Mean using function in R: 25.375

Course Grade Analysis

# dplyr provides %>%, group_by() and summarize()
library(dplyr)

# Creating sample grades
set.seed(456)
nameOfCourses = c("Computers", "Data Science", "English", "Maths", "Science")
grades = data.frame(
  Courses = rep(nameOfCourses, each = 20),
  Grades = round(runif(100, min = 60, max = 100))
)
# Calculate the mean and median grade for each course
statsForCourse = grades %>%
  group_by(Courses) %>%
  summarize(
    Mean = mean(Grades),
    Median = median(Grades)
  )
# Showing the results
statsForCourse
## # A tibble: 5 × 3
##   Courses       Mean Median
##   <chr>        <dbl>  <dbl>
## 1 Computers     79.4   76.5
## 2 Data Science  86.8   88.5
## 3 English       78.5   77  
## 4 Maths         77.0   76.5
## 5 Science       86     86

Simple Course Comparison Chart

Conclusion

Key takeaways about measures of central tendency:

  1. Mean
    • Best suited to symmetric, roughly normal distributions
    • Sensitive to outliers
    • Very commonly used in statistics
  2. Median
    • Best suited to skewed distributions
    • Robust to outliers
    • Commonly used for statistics such as income and housing prices
  3. Mode
    • Mostly used for categorical data
    • Used to find the most common values
    • Can be used to find multiple peaks in a distribution
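
To make the outlier point concrete, the sketch below compares the mean and median before and after adding one extreme value. The income figures are hypothetical numbers made up for illustration, not data from this analysis.

```r
# Hypothetical incomes (in thousands), invented for this illustration
incomes <- c(30, 32, 35, 38, 40)
incomesWithOutlier <- c(incomes, 500)   # add a single extreme value

mean(incomes)                # 35
median(incomes)              # 35
mean(incomesWithOutlier)     # 112.5: pulled far upward by the one outlier
median(incomesWithOutlier)   # 36.5: barely changes
```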

The choice of measure depends on the type of data and the shape of its distribution.