2026-04-13

What Are Descriptive Statistics?

Descriptive statistics summarize and describe the main features of a dataset.

  • Center: mean, median
  • Spread: variance, standard deviation, range
  • Shape: skewness

\[\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\]

\[s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\]
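The formulas above can be checked by hand against R's built-in functions. The following is a minimal sketch, assuming petal length from the built-in iris data as the example variable; the names x, n, x_bar, s2 and s are chosen here to mirror the symbols in the formulas and are not part of the original slides.

```r
# Petal length (cm) from the built-in iris data
x <- iris$Petal.Length
n <- length(x)

# Sample mean: sum of the values divided by n
x_bar <- sum(x) / n

# Sample variance: squared deviations from the mean, divided by n - 1
s2 <- sum((x - x_bar)^2) / (n - 1)

# Sample standard deviation: square root of the sample variance
s <- sqrt(s2)

# The hand-computed values should agree with R's built-ins
all.equal(x_bar, mean(x))
all.equal(s2, var(x))
all.equal(s, sd(x))
```

Both var() and sd() divide by n - 1, so they match the sample formulas shown here rather than the population versions that divide by n.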

The Iris Dataset

The iris dataset contains four measurements (sepal length, sepal width, petal length, and petal width, all in cm) from 150 flowers: 50 from each of 3 species.

data(iris)
summary(iris[, 1:4])
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
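As a quick check on the structure behind this summary, the sketch below counts the flowers per species and recomputes the five-number summary for one column. The variable names counts and q are illustrative only.

```r
# Number of flowers in each species
counts <- table(iris$Species)
counts

# Minimum, quartiles, and maximum of petal length (cm);
# these correspond to the Min., 1st Qu., Median, 3rd Qu., and Max.
# rows of the Petal.Length column in summary()
q <- quantile(iris$Petal.Length)
q
```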

Distribution of Petal Length
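
The plot for this slide is not reproduced in the text. One way such a histogram might be drawn is sketched below, assuming base R graphics; the number of breaks and the axis label are assumptions, not settings taken from the original figure.

```r
# Histogram of petal length across all 150 flowers
h <- hist(
  iris$Petal.Length,
  breaks = 20,  # suggested number of bins (an assumption)
  main   = "Distribution of Petal Length",
  xlab   = "Petal length (cm)"
)
```

Because setosa petals span 1 to 1.9 cm while the other two species start at 3 cm, the histogram should show a gap between a short-petal group and the rest.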

Comparing Spread Across Species

Summary Statistics by Species

library(dplyr)  # provides %>%, group_by(), and summarise()

iris %>%
  group_by(Species) %>%
  summarise(
    Mean   = round(mean(Petal.Length), 1),
    Median = round(median(Petal.Length), 1),
    SD     = round(sd(Petal.Length), 1),
    Min    = min(Petal.Length),
    Max    = max(Petal.Length)
  )
## # A tibble: 3 × 6
##   Species     Mean Median    SD   Min   Max
##   <fct>      <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1 setosa       1.5    1.5   0.2   1     1.9
## 2 versicolor   4.3    4.3   0.5   3     5.1
## 3 virginica    5.6    5.6   0.6   4.5   6.9

Boxplot of Iris Measurements
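
The figure for this slide is not reproduced in the text. A minimal sketch of a comparable plot, assuming base R graphics and one box per numeric column, might look like this; the y-axis label is an assumption.

```r
# One box per measurement column (all values in cm)
b <- boxplot(
  iris[, 1:4],
  main = "Boxplot of Iris Measurements",
  ylab = "Measurement (cm)"
)

# The third row of b$stats holds the median of each column
b$stats[3, ]
```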

Why Spread Matters

A small SD means values cluster tightly around the mean, while a large SD means they are more spread out:

\[s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\]

Petal length (cm) by species:

Species      Mean    SD
setosa       1.46  0.17
versicolor   4.26  0.47
virginica    5.55  0.55
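
The per-species means and standard deviations listed here can be recomputed in base R. This is a sketch under the assumption that the figures refer to petal length rounded to two decimals; petal_by_species, petal_mean and petal_sd are names introduced only for this example.

```r
# Split petal length (cm) into one vector per species
petal_by_species <- split(iris$Petal.Length, iris$Species)

# Mean and SD within each species, rounded to two decimals
petal_mean <- round(sapply(petal_by_species, mean), 2)
petal_sd   <- round(sapply(petal_by_species, sd), 2)

# Combine into a small table with one row per species
data.frame(Mean = petal_mean, SD = petal_sd)
```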

Takeaways

  • Descriptive statistics give a quick summary of any dataset
  • The mean and median describe center, while standard deviation describes spread
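  • The three iris species differ clearly in petal length, both in center and spread: setosa has the shortest petals (mean 1.46 cm) and the smallest SD (0.17 cm), while virginica has the longest (mean 5.55 cm) and the largest SD (0.55 cm)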