Descriptive statistics is one of the primary branches of data analytics. Its purpose is to summarize and describe the characteristics of a dataset, answering the core question: “What happened?”
By using descriptive statistics, we turn raw observations into meaningful insights without drawing conclusions beyond the data at hand.
Descriptive statistics are classified into five essential pillars:
To perform accurate descriptive analysis, we will follow this logical order:
Measures of central tendency are the backbone of descriptive statistics. Each one provides a single value that represents the “typical” or “middle” point of a dataset.
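As a minimal sketch of what a "typical" value means in practice, the snippet below compares two common measures on a small made-up vector. The data and the variable names (`orders`, `mean_orders`, `median_orders`) are assumptions for illustration only and are not part of the dataset used later.

```r
# Hypothetical sample: six ordinary values and one unusually large value
orders <- c(4, 5, 5, 6, 7, 8, 40)

# Mean: sum of the values divided by how many there are
mean_orders <- mean(orders)

# Median: the middle value once the data are sorted
median_orders <- median(orders)

# Show both measures side by side
print(c(mean = mean_orders, median = median_orders))
```

In this sketch the single large value pulls the mean well above the median, which is one reason it can help to look at more than one measure before deciding which value is "typical".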
The following seven elements are the foundation of central tendency and basic data summary:
In statistics, the “average” is more than a single calculation. Several types of mean exist, and the right one depends on the data type. Three of the most common are:
Arithmetic Mean: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
Geometric Mean: \[\bar{x}_{\text{geom}} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}\]
Harmonic Mean: \[\bar{x}_{\text{harm}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}\]
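The three formulas above translate almost directly into base R. The following is a rough sketch on a hypothetical vector `v`; the variable names are assumptions for illustration, and the geometric and harmonic means are only meaningful here because every value is positive.

```r
# Hypothetical positive values (geometric and harmonic means assume x > 0)
v <- c(2, 4, 8)
n <- length(v)

# Arithmetic mean: sum of the values divided by n
arithmetic_mean <- sum(v) / n

# Geometric mean: n-th root of the product of the values
geometric_mean <- prod(v)^(1 / n)

# Harmonic mean: n divided by the sum of the reciprocals
harmonic_mean <- n / sum(1 / v)

print(c(arithmetic = arithmetic_mean,
        geometric  = geometric_mean,
        harmonic   = harmonic_mean))
```

For this vector the results are roughly 4.67, 4, and 3.43. For long vectors the product inside the geometric mean can overflow, so `exp(mean(log(v)))` is a commonly used equivalent.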
Below is a demonstration of how to generate descriptive statistics and visualizations using a sample dataset.
```r
# Define the dataset
x <- c(11, 12, 11, 14, 11, 13, 14, 16, 17, 11, 11)

# Summary statistics (Min, 1st Qu., Median, Mean, 3rd Qu., Max)
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   11.00   11.00   12.00   12.82   14.00   17.00

# Visualizing the distribution
par(mfrow = c(2, 2)) # Arrange plots in a 2x2 grid

# 1. Boxplot to see outliers and quartiles
boxplot(x, main = "Boxplot of x", col = "orange", horizontal = TRUE)

# 2. Histogram to see frequency
hist(x, main = "Histogram of x", col = "skyblue", border = "white")

# 3. Density plot to see the shape
plot(density(x), main = "Density Plot", lwd = 2, col = "red")

# 4. Basic plot of data points
plot(x, main = "Data Points", pch = 19, col = "darkgreen")
```