Measuring Spread: Variance, Standard Deviation, and the Normal Curve

11/9/2025

Why Measure Spread?

The mean describes the center of a dataset
But it doesn’t tell us the complete story
To understand the dataset, it’s critical to understand variability in data
- How spread out are the values?
- How much do individual observations differ from the mean?
The answers to these questions tell us about:
- The reliability of the mean
- The predictability of future measurements
- Uncertainty in the data that must be accounted for

The Normal Distribution

Symmetric, bell-shaped curve where mean = median = mode
Defined by mean (μ) and standard deviation (σ)
Foundation for inferential statistics and hypothesis testing
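
For reference, the curve that these two parameters define is the standard normal density formula, written here in the same symbols (\(\mu\) for the mean, \(\sigma\) for the standard deviation):

\[f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)\]

\(\mu\) shifts the curve left or right, while \(\sigma\) controls how wide and how tall it is.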

Variance: Mathematical Definition

Population Variance:

\[\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}\]

Sample Variance:

\[s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}\]

\(x_i\) = individual observations
\(\mu\) (or \(\bar{x}\)) = population mean (or sample mean)
\(N\) (or \(n\)) = population size (or sample size)

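Definition: Average of squared deviations from the mean (the sample version divides by \(n-1\) instead of \(n\) to correct the bias that comes from estimating the mean from the same data)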
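
The two formulas can be checked by hand in R. This is a minimal sketch that assumes a small made-up vector `x`; the values and variable names are illustrative only and are not data from this deck.

```r
# Hypothetical measurements, made up for illustration
x <- c(2, 4, 4, 6, 9)

n      <- length(x)
x_bar  <- mean(x)          # mean of the values: 5
sq_dev <- (x - x_bar)^2    # squared deviations: 9 1 1 1 16

pop_var  <- sum(sq_dev) / n        # population formula (divide by N): 5.6
samp_var <- sum(sq_dev) / (n - 1)  # sample formula (divide by n - 1): 7

# Base R's var() uses the n - 1 (sample) formula
all.equal(samp_var, var(x))        # TRUE
```

Note that `var()` always returns the sample version, so the population version has to be computed explicitly when the data really are the whole population.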

Standard Deviation: Mathematical Definition

The standard deviation is the square root of the variance:

\[\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}\]

\[s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}\]

Why take the square root?

Variance is in squared units, whereas standard deviation returns to the original units
SD is easier to interpret because it is on the same scale as the data
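
As a quick illustration of the units point, the sketch below assumes a made-up vector of lengths in centimetres (hypothetical values, not from this deck) and compares the two measures.

```r
# Hypothetical lengths in cm, made up for illustration
x <- c(2, 4, 4, 6, 9)

s2 <- var(x)    # sample variance: 7 (units: cm^2)
s  <- sqrt(s2)  # sample standard deviation: about 2.65 (units: cm)

# Base R's sd() gives the same result as sqrt(var(x))
all.equal(s, sd(x))  # TRUE
```

A statement such as "observations typically fall about 2.65 cm from the mean" is easier to read than one about 7 cm².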

Visualizing Different Spreads Code

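##load ggplot2 (needed for ggplot() below)
library(ggplot2)

##create x axis: 1000 evenly spaced points from -10 to 10
x <- seq(-10, 10, length.out = 1000)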


##create a data frame that illustrates SD change in normal curve
df <- data.frame(x = rep(x, 3),
  y = c(dnorm(x, mean = 0, sd = 1),
        dnorm(x, mean = 0, sd = 2),
        dnorm(x, mean = 0, sd = 3)),
  Distribution = rep(c("SD = 1 (Low Variance)", 
                       "SD = 2 (Medium Variance)", 
                       "SD = 3 (High Variance)"), 
                     each = length(x)))


##plot each curve with ggplot
ggplot(df, aes(x = x, y = y, color = Distribution)) +
  geom_line(linewidth = 1.2) +
  labs(title = "Normal Curves with Different Standard Deviations",
       x = "Value",
       y = "Probability Density") +
  scale_color_manual(values = c("mediumaquamarine", "thistle3", "palevioletred")) +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        legend.position = "bottom")

Visualizing Different Spreads Plot

Larger variance = wider, flatter curve
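
The "wider, flatter" pattern can also be checked numerically. The sketch below reuses the three standard deviations from the plot (1, 2, 3); the variable names `sds`, `peak`, and `area` are illustrative.

```r
sds <- c(1, 2, 3)

# Height of each curve at its centre (x = mean = 0)
peak <- dnorm(0, mean = 0, sd = sds)
round(peak, 3)  # 0.399 0.199 0.133

# Total area under each curve, computed by numerical integration
area <- sapply(sds, function(s) {
  integrate(dnorm, -Inf, Inf, mean = 0, sd = s)$value
})
area  # each value is approximately 1
```

Because the total area stays at 1, a curve that spreads further out must also be lower at its peak.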

The Empirical Rule (68-95-99.7)

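For normally distributed data:
About 68% of data fall within 1 standard deviation of the mean
About 95% of data fall within 2 standard deviations of the mean
About 99.7% of data fall within 3 standard deviations of the mean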
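
These percentages can be recovered from the normal cumulative distribution function. This is a minimal sketch using base R's `pnorm()` on the standard normal (mean 0, SD 1); the names `k` and `within_k` are illustrative.

```r
k <- 1:3  # number of standard deviations from the mean

# P(-k < Z < k) for a standard normal variable Z
within_k <- pnorm(k) - pnorm(-k)

round(100 * within_k, 1)  # 68.3 95.4 99.7
```

The same proportions hold for any normal distribution, since measuring distance in standard deviations removes the dependence on the particular mean and SD.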

Spread is critical for understanding statistical inference.

Variance and Standard Deviation both measure spread
Standard Deviation is more interpretable
Normal curve provides framework for understanding spread
Empirical Rule: for normal data, about 68-95-99.7 percent of observations fall within 1-2-3 standard deviations of the mean

Concept	Formula	Units	Interpretation
Variance	\(s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}\)	Squared	Average squared deviation
Std Dev	\(s = \sqrt{s^2}\)	Original	Typical deviation from mean