For the answer, we’ll need to dig deeper into how we figure out the properties of populations from samples of their members.
Population Means and Sample Means
The expectation value for a discrete random variable¹ \(X\) is \(E[X] = \sum_{i}{P(x_i)x_i}\), where \(P(x_i)\) is the probability that \(X\) takes the value \(x_i\), and \(i\) covers all the possible values of \(X\). There is also an integral version for continuous variables. This expectation value gets a special symbol \(\mu\) and is called the population mean.
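For a concrete example, take \(X\) to be the roll of a fair six-sided die, so \(P(x_i)=\frac{1}{6}\) for each face:

\[
\mu = E[X] = \sum_{i=1}^{6}\tfrac{1}{6}\,i = 3.5
\]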
If we take \(n\) samples \(x_i\) from \(X\), the average of the samples is \(\frac{1}{n}\sum_{i=1}^nx_i\). We call this the sample mean and give it the symbol \(\bar{x}\).
Unbiased Estimators
Does calculating the sample mean \(\bar{x}\) for some sample provide a good estimate of the population mean \(\mu\)?
To answer this important question³, we’ll need to say precisely what we mean when we say a calculation - like taking the average \(\bar{x}\) of some samples - is a “good” estimator for a parameter of the entire possible population like \(\mu\).
Let’s say we want our estimators to be unbiased, which we define to mean that the expectation value of the estimator gives the correct value for the population parameter.⁴,⁵
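In symbols, writing \(\hat{\theta}\) for an estimator of a population parameter \(\theta\), the estimator is unbiased when:

\[
E[\hat{\theta}] = \theta
\]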
Is the Sample Mean an Unbiased Estimator for the Population Mean?
We’ll use our definition and compute the expectation value of \(\bar{X}\), the average of a set of \(n\) samples \(x_1 ... x_n\).
Each \(x_i\) is a realization of a random variable from a distribution with a population mean of \(\mu\), so \(E[x_i]=\mu\).
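Applying the linearity of expectations² to the average of the samples:

\[
E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n}x_i\right] = \frac{1}{n}\sum_{i=1}^{n}E[x_i] = \frac{1}{n}\cdot n\mu = \mu
\]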
As hoped, we’ve shown that a sample mean is an unbiased estimator for the population mean:
\[
E[\bar{X}]=\mu
\]
Demonstration That the Sample Mean Is an Unbiased Estimator for the Population Mean
Let’s give this a try on a common distribution: the exponential distribution.
```r
library(ggplot2)

# Plot the density of the exponential distribution (rate = 1) over [0, 4]
df <- data.frame(X = 1:10, Y = 1:10)
g <- ggplot(df, aes(x = X, y = Y))
g <- g + xlim(0, 4)
g <- g + stat_function(fun = dexp, n = 101, args = list(rate = 1))
g
```
For the exponential distribution, \(\mu=\frac{1}{\text{rate}}\), so let’s confirm that taking the average of the averages of lots of \(n\)-fold samples gives us progressively better estimates of \(\mu\):
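Something like the following minimal sketch works (the seed, sample size \(n = 10\), and sample counts are illustrative choices of mine, nothing canonical):

```r
set.seed(42)

rate <- 1   # so mu = 1 / rate = 1
n <- 10     # size of each sample

# Average the sample means of more and more n-fold samples; the
# estimate of mu should improve as the number of samples grows
for (n_samples in c(10, 100, 1000, 10000)) {
  sample_means <- replicate(n_samples, mean(rexp(n, rate = rate)))
  cat(n_samples, "samples: estimated mu =", mean(sample_means), "\n")
}
```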
Finding an Unbiased Estimator for the Population Variance
The population variance of a discrete distribution over \(n\) equally likely values is \(\sigma^2=\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2\), so let’s start with the analogous calculation on our samples, \(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\), as our candidate estimator for the population variance \(\sigma^2\).
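Expanding the candidate’s expectation using the algebraic identity \(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2\) gives us the formula we’ll substitute into shortly:

\[
E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = E\left[\frac{1}{n}\sum_{i=1}^{n}x_i^2\right] - E[\bar{X}^2] = E[X^2] - E[\bar{X}^2]
\]

The first term follows directly from the definition of variance: since \(\sigma^2 = E[X^2] - \mu^2\), we have \(E[X^2] = \sigma^2 + \mu^2\). The second term, \(E[\bar{X}^2]\), takes a little more work.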
Let’s now turn to understanding the variance of the sample mean.
\[
var(\bar{X}) = E[\bar{X}^2]-(E[\bar{X}])^2
\]
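For \(n\) independent samples, the variance of a sum is the sum of the variances, which gives the familiar standard error of the mean:

\[
var(\bar{X}) = var\left(\frac{1}{n}\sum_{i=1}^{n}x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}var(x_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
\]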
We already demonstrated that \(E[\bar{X}]=\mu\) when we showed that the sample average is an unbiased estimator for the population mean. And, as just shown, \(\sigma_{\bar{X}}^2=\frac{\sigma^2}{n}\) - the square of the standard error of the mean - so:
\[
\frac{\sigma^2}{n}=E[\bar{X}^2]-\mu^2
\]
or:
\[
E[\bar{X}^2]=\frac{\sigma^2}{n}+\mu^2
\]
Substituting these two expressions into the earlier formula for the expectation of our candidate estimator, we get:
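\[
E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = \left(\sigma^2 + \mu^2\right) - \left(\frac{\sigma^2}{n} + \mu^2\right) = \frac{n-1}{n}\sigma^2
\]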
This tells us our candidate estimator for the variance is not unbiased, because it systematically underestimates the actual variance \(\sigma^2\) by a factor of \(\frac{n-1}{n}\).
On the other hand, an appropriately scaled version of our candidate estimator is unbiased:
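\[
E\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\right] = \frac{n}{n-1}\cdot\frac{n-1}{n}\sigma^2 = \sigma^2
\]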
The moral of the story is that the formula for the sample variance must have \(n-1\) in its denominator so it can be an unbiased estimator for the population variance.
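As a quick empirical check (again a sketch with illustrative parameters: rate = 1, so \(\sigma^2 = \frac{1}{\text{rate}^2} = 1\)): R’s built-in var() already uses the \(n-1\) denominator, so we can compare it against a hand-rolled \(n\)-denominator version.

```r
set.seed(42)

rate <- 1           # population variance sigma^2 = 1 / rate^2 = 1
n <- 10             # size of each sample
n_samples <- 100000 # number of n-fold samples

# Candidate estimator with denominator n (biased)...
biased <- replicate(n_samples, {
  x <- rexp(n, rate = rate)
  mean((x - mean(x))^2)
})

# ...versus R's var(), which uses denominator n - 1 (unbiased)
unbiased <- replicate(n_samples, var(rexp(n, rate = rate)))

mean(biased)    # approaches ((n - 1) / n) * sigma^2 = 0.9
mean(unbiased)  # approaches sigma^2 = 1
```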
What If We Already Know the Population Mean?
Say we already know the population mean \(\mu\) and don’t need to compute the sample mean \(\bar{x}\) as an intermediate step in our estimation process. Does this affect the choice of estimator?
Let’s start with the same candidate estimator we used before, but we’ll substitute the known \(\mu\) for the calculated \(\bar{x}\).
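Each term \((x_i-\mu)^2\) has expectation \(var(X) = \sigma^2\), so by linearity:

\[
E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2\right] = \frac{1}{n}\sum_{i=1}^{n}E\left[(x_i-\mu)^2\right] = \frac{1}{n}\cdot n\sigma^2 = \sigma^2
\]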
Conclusion: if the population mean is already known, \(\frac{1}{n}\sum_i(x_i-\mu)^2\) (denominator = \(n\), not \(n-1\)) is an unbiased estimator for the population variance.
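A matching empirical check (same illustrative exponential setup, where \(\mu = \sigma^2 = 1\)):

```r
set.seed(42)

rate <- 1           # mu = 1 and sigma^2 = 1 for rate = 1
n <- 10
n_samples <- 100000

# With mu known, keep the denominator at n
known_mu <- replicate(n_samples, {
  x <- rexp(n, rate = rate)
  mean((x - 1)^2)   # (1 / n) * sum((x_i - mu)^2), with mu = 1
})

mean(known_mu)      # approaches sigma^2 = 1
```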
This example illustrates how degrees of freedom (DF) affect statistical calculations. When we already know the population mean \(\mu\), the \(n\) samples constitute \(n\) independent variables, \(x_1 ... x_n\). If we need to compute the sample mean as an intermediate result, the \(n\) samples only account for \(n-1\) independent variables (because if we are given \(x_1 ... x_{n-1}\) and \(\bar{x}=\frac{1}{n}\sum{x_i}\) we can compute the final \(x_n\), so \(x_n\) is no longer an independent variable). I’ll write more about degrees of freedom in another article.
Footnotes
1. Except for this footnote, we’re going to skip by the question of what a random variable actually is. Let’s go with this: a random variable is a function that returns values from a set - the sample space - at a rate based on probabilities associated with the members of the set. The expectation value of a random variable is an idealized average value based on invoking the random variable function infinitely many times.↩︎
2. Here we’re relying heavily on the linearity of expectations: \(E[A+B] = E[A] + E[B]\), and \(E[cA] = cE[A]\). The behavior of expectations is part of the general topic of the algebra of random variables. https://en.wikipedia.org/wiki/Algebra_of_random_variables recounts some of the basics, without going into the underlying mathematical foundation. The references in the Wikipedia article, especially Peter Whittle’s book “Probability via Expectation”, are good places to start deeper study. (Whittle, Peter (2000). Probability via Expectation (4th ed.). Springer. ISBN 978-0-387-98955-6.)↩︎
3. You might say this is the most fundamental question in statistics because it goes to how we can compute something we can’t observe directly based on imperfect observations of things we can observe. The story of how investigators gradually learned that aggregate measures like averages were capable of pulling good answers out of messy data is fascinating. If you’re interested, I recommend Stigler, Stephen (2016). The Seven Pillars of Statistical Wisdom. Harvard University Press. ISBN 978-0-674-08891-7.↩︎
4. Note how goodness is a property of the process, not the individual estimates. When we refer to an “unbiased estimator”, we’re saying that a process of estimation produces statistically unbiased results over time, not that any particular estimate is good. This practice of focusing on the behaviors of measurement processes repeated over time is central to the frequentist approach to statistics.↩︎
5. Being unbiased is only one characteristic an estimator might display, and it’s not necessarily the most valuable. Other characteristics of estimators we might consider include consistency, accuracy, and efficiency (see https://en.wikipedia.org/wiki/Estimator, and especially the cited references, to learn more).↩︎