Lecture 3: Measures of Dispersion

POLS3316, Instructor: Tom Hanna, Spring 2025, University of Houston

2026-01-31

Last class

  • Measures of Central Tendency

      - Mean
      - Median
      - Mode
  • These tell us about the center of the data

  • They help us start to visualize the data based on the typical or average value

Today

  • Measures of Dispersion (Variation or Spread)

      - Variance
      - Standard Deviation
      - Range
      - Quartiles

We start with the mean

  • Start with the mean
  • Our variable X: 1, 7, 21, 13, 19, 5, 9, 17, 11
  • Mean is 11.44
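As a quick check, here is a minimal Python sketch (Python is our assumption for these examples, not necessarily the course's software) that computes the mean of X with the standard-library `statistics` module:

```python
from statistics import mean

# The variable X from the slide
X = [1, 7, 21, 13, 19, 5, 9, 17, 11]

x_bar = mean(X)          # sum(X) / len(X) = 103 / 9
print(round(x_bar, 2))   # 11.44
```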

Visualizing distances from the mean

  • Plot the data points

Same Mean, Different Spread

  • Now, consider these two sets of data points with the same mean but different spreads:

Drawing on the board time!

Creating a measure of dispersion: distance to mean

  • So, we could define a measure of dispersion or variation that is the total length of the colored lines.
  • Our formula in English would be “the sum of the differences between each observation and the mean”
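In symbols, with x_i for each observation, x̄ for the mean, and n for the number of observations, this first attempt at a measure would be:

\[ \text{total variation} = \sum_{i=1}^{n} (x_i - \bar{x}) \]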

Drawing on the board time!

Problem with sum of distances

The problem is that, because of how the mean is defined, the positive distances cancel out the negative ones, so this measure of dispersion or variation would always be zero!
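To see why this happens for every data set, note that the mean is the sum of the observations divided by n, so the sum of the observations equals n times the mean. Substituting that into the sum of distances:

\[ \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0 \]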

Simple Data Example

Suppose we had a very simple data set with only two observations: 5 and 15. The mean is 10. One observation (15) is 5 above the mean and the other (5) is 5 below it.

Distance from Mean Total

So, we want our new measure total_variation to equal the sum of the distances.
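A minimal Python sketch of this proposed total_variation measure for the two-observation example (the variable names are illustrative), plus the same check on X:

```python
# The two-observation example: 5 and 15
data = [5, 15]
mean_value = sum(data) / len(data)   # 10.0

# Signed distance of each observation from the mean
distances = [x - mean_value for x in data]   # [-5.0, 5.0]

# The proposed measure: add up the signed distances
total_variation = sum(distances)
print(total_variation)   # 0.0

# The same thing happens with X (up to tiny floating-point rounding error)
X = [1, 7, 21, 13, 19, 5, 9, 17, 11]
x_bar = sum(X) / len(X)
print(abs(sum(x - x_bar for x in X)) < 1e-9)   # True
```

The positive and negative distances cancel, so the "total" says nothing about spread.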

Math to the rescue!

Math comes to the rescue!

  • What is something we can do that turns a negative number into a positive number every time and leaves a positive number as a positive?
  • It’s also important that any effect it has on the actual size of the numbers is consistent between positive and negative numbers.

Math to the rescue! Code

  • We can square the distances
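A minimal Python sketch of squaring the distances in the two-observation example and adding them up (variable names are illustrative):

```python
data = [5, 15]
mean_value = sum(data) / len(data)   # 10.0

distances = [x - mean_value for x in data]   # [-5.0, 5.0]

# Squaring makes every distance non-negative; -5 and 5 both become 25
squared_distances = [d ** 2 for d in distances]
print(squared_distances)   # [25.0, 25.0]

total_squared_variation = sum(squared_distances)
print(total_squared_variation)   # 50.0
```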

Results

  • Squaring 5 turned it into 25
  • Squaring -5, which is the same size but negative, also turned it into 25.
  • So, now we can add them to get a measure of total_squared_variation.


Are we done?

  • Suppose we had 1000 observations
  • Mean still 10
  • Observations still 5 points from the mean, on average
  • What would our total squared variation be?

Given that the actual average distance is exactly the same for both data sets, does that make sense? Is it useful?
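A Python sketch of the question above, assuming a hypothetical data set of 1000 observations split evenly between 5 and 15, so the mean is still 10 and every observation is still 5 from it:

```python
def total_squared_variation(data):
    """Sum of squared distances from the mean (no averaging)."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data)

small = [5, 15]                  # the 2-observation example
large = [5] * 500 + [15] * 500   # hypothetical: 1000 observations, same mean and spread

print(total_squared_variation(small))  # 50.0
print(total_squared_variation(large))  # 25000.0
```

The spread is identical in both data sets, yet the total grows with the number of observations. That is why the next step averages instead of just adding.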

Solution: Average Squared Difference - Variance

  • We want the average distance or, more precisely, the average of the squared differences.

  • So, in its simplest form, our measure of variance is (dividing by n−1 rather than n, for the reason explained under Sample vs Population):

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

Variance formula

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

  • s^2 = variance
  • n = number of observations
  • x_i = each observation
  • x̄ = mean of the observations
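A minimal Python sketch of the sample variance formula above (the function name `sample_variance` is our own), cross-checked against the standard library's `statistics.variance`, which also divides by n − 1:

```python
import statistics

def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

X = [1, 7, 21, 13, 19, 5, 9, 17, 11]
s2 = sample_variance(X)
print(round(s2, 2))                       # 44.78

# Cross-check against the standard library
print(round(statistics.variance(X), 2))   # 44.78
```

For the two-observation example, `sample_variance([5, 15])` returns 50.0 rather than 25.0, because it divides the total of 50 by n − 1 = 1 instead of n = 2.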

Problem: Squares inflate the results

  • Squaring inflates the numbers relative to the size of the data.
  • In the two-observation example (5 and 15), the average squared distance is (25 + 25) / 2 = 25, which is 2.5 times the mean of 10.
  • But the distances aren’t really that big: the average distance is still 5.
  • We want to get back to the original unit of measure instead of the squared unit of measure.

Solution

How can we solve this?

  • To (at least partially) account for this, we can take the square root of the variance, which puts the measure back in the original units
  • That gives us our next measure: standard deviation

Standard deviation

  • Standard deviation is the square root of the variance

\[ s = \sqrt{s^2} \]

Standard deviation: full formula

\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
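A Python sketch of the full standard deviation formula (the function name `sample_std_dev` is our own), cross-checked against `statistics.stdev`, which also divides by n − 1:

```python
import math
import statistics

def sample_std_dev(data):
    """s: the square root of the sample variance (which divides by n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(data) / n
    s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)
    return math.sqrt(s2)

X = [1, 7, 21, 13, 19, 5, 9, 17, 11]
s = sample_std_dev(X)
print(round(s, 2))                    # 6.69

# Cross-check against the standard library
print(round(statistics.stdev(X), 2))  # 6.69
```

The sample variance of X is about 44.78 in squared units; the standard deviation of about 6.69 is back in the same units as X itself.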

Sample vs Population

  • When we have data for the entire population, we can compute the true variance and standard deviation directly, so we divide by n
  • When we only have a sample, dividing by n (with distances measured from the sample mean) gives a result that is systematically too small, underestimating the population spread.
  • Dividing by n−1 (Bessel’s correction) adjusts for this and makes the sample variance an unbiased estimator of the population variance
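A small simulation sketch of this claim, assuming a hypothetical normal population with mean 50 and standard deviation 10 (so its true variance is 100). It draws many samples of size 5 and averages both versions of the variance, using the standard library's `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n − 1):

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population: normal with mean 50 and standard deviation 10,
# so the true population variance is 10 ** 2 = 100
MU, SIGMA = 50, 10
n = 5          # small samples make the bias easy to see
reps = 10_000  # number of simulated samples

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(reps):
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(sample))         # divides by n
    divide_by_n_minus_1.append(statistics.variance(sample))  # divides by n - 1

avg_n = statistics.mean(divide_by_n)
avg_n_minus_1 = statistics.mean(divide_by_n_minus_1)
print(round(avg_n, 1))          # tends to land near (n - 1) / n * 100 = 80
print(round(avg_n_minus_1, 1))  # tends to land near 100
```

The exact printed values depend on the random seed, but dividing by n consistently comes out low by a factor of (n − 1)/n, while dividing by n − 1 centers on the true variance.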

Population Variance

\[ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2 \]

Population Standard Deviation

\[ \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{ n} (x_i - \mu)^2} \]

or

\[ \sigma = \sqrt{\sigma^2} \]
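A Python sketch of the population formulas, treating X as if it were the entire population (an assumption for illustration). The function name `population_variance` is our own; `statistics.pvariance` and `statistics.pstdev` are the standard-library versions that divide by n:

```python
import math
import statistics

def population_variance(data):
    """sigma^2: the average squared deviation from the mean (divides by n)."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n

# Treating X as if it were the entire population
X = [1, 7, 21, 13, 19, 5, 9, 17, 11]
sigma2 = population_variance(X)
sigma = math.sqrt(sigma2)
print(round(sigma2, 2), round(sigma, 2))  # 39.8 6.31

# Standard-library equivalents
print(round(statistics.pvariance(X), 2), round(statistics.pstdev(X), 2))  # 39.8 6.31

# The two-observation example: population variance 25, standard deviation 5
print(population_variance([5, 15]), math.sqrt(population_variance([5, 15])))  # 25.0 5.0
```

Dividing by n gives smaller values than the sample versions (39.8 vs. 44.78 for the variance of X), and for 5 and 15 the square root takes us from 25 back to the average distance of 5.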

Quartiles

  • Quartiles divide the ordered data into four parts, each containing about one quarter of the observations
  • The first quartile (Q1) is the 25th percentile
  • The second quartile (Q2) is the 50th percentile (the median)
  • The third quartile (Q3) is the 75th percentile
  • The interquartile range (IQR) is the difference between Q3 and Q1 (IQR = Q3 - Q1)
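A Python sketch of quartiles and the IQR for X using `statistics.quantiles` (Python 3.8+). Software packages use different conventions for locating quartiles between observations, so Q1 and Q3 can differ slightly from program to program; the sketch shows two of the library's built-in methods:

```python
import statistics

X = [1, 7, 21, 13, 19, 5, 9, 17, 11]

# With n=4, quantiles returns the three cut points [Q1, Q2, Q3]
q1, q2, q3 = statistics.quantiles(X, n=4)   # default method="exclusive"
print(q1, q2, q3, q3 - q1)                  # 6.0 11.0 18.0 12.0

# Another common convention gives slightly different Q1 and Q3
q1_i, q2_i, q3_i = statistics.quantiles(X, n=4, method="inclusive")
print(q1_i, q2_i, q3_i, q3_i - q1_i)        # 7.0 11.0 17.0 10.0
```

Under either method, Q2 equals the median of X, 11.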

Range

  • The range is the difference between the maximum and minimum values in the data set
  • Range = Max - Min
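The range of X as a one-line Python sketch (named `data_range` so it does not shadow Python's built-in `range`):

```python
X = [1, 7, 21, 13, 19, 5, 9, 17, 11]

data_range = max(X) - min(X)   # Range = Max - Min = 21 - 1
print(data_range)              # 20
```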

Authorship and License