8.1 - bias, precision, and MSE of point estimators
Definition of point estimator
Let \(\theta\) represent an unknown parameter governing properties of a parent population.
A point estimator \(\hat\theta\) is a statistic that provides a single estimated value of an unknown parameter, based on a sample drawn i.i.d. from the population.
Parameter and estimator examples
Example 1:
\(\theta = \mu = E(Y)\), the population mean.
A good point estimator might be \(\hat\theta = \bar Y = \frac{\sum_{i=1}^n Y_i }{n}\).
Example 2:
\(\theta = \mu^2\), the squared mean of a population
Possible estimator: \(\hat\theta_1 = \bar Y^2\).
Another possible estimator: \(\hat\theta_2 = \frac{\sum_{i=1}^n Y_i^2}{n}\)
Which one is better??
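As a preview of how such questions can be settled, here is a minimal simulation sketch comparing the average value of each estimator to the true \(\mu^2\). The normal population, \(\mu = 2\), \(\sigma = 1\), \(n = 20\), and \(N = 10{,}000\) replications are all arbitrary choices for illustration, not part of the example:

# Hypothetical settings, chosen only for illustration
set.seed(1)
mu <- 2; sigma <- 1; n <- 20; N <- 10000
theta <- mu^2                                              # true value of the target parameter

theta_hat1 <- replicate(N, mean(rnorm(n, mu, sigma))^2)    # estimator 1: (Y-bar)^2
theta_hat2 <- replicate(N, mean(rnorm(n, mu, sigma)^2))    # estimator 2: average of the Y_i^2

mean(theta_hat1) - theta   # simulated bias of estimator 1 (small: roughly sigma^2 / n)
mean(theta_hat2) - theta   # simulated bias of estimator 2 (large: roughly sigma^2)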
Example 3:
Let \(\theta = \sigma^2 = Var(Y)\), the population variance.
Point estimator #1: \(\hat\theta_1 = \hat\sigma^2_1 = \frac{\sum_{i} (Y_i-\bar Y)^2}{n}\)
Point estimator #2: \(\hat\theta_2 = \hat\sigma^2_2 = \frac{\sum_{i} (Y_i-\bar Y)^2}{n-1}\)
Most statistical software calculates \(\hat\sigma^2_2\): WHY?
Properties of point estimators
So what makes an estimator “good” or “bad”?
To get at this question, we need to define some properties.
Comparing these properties for different estimators allows statisticians to determine which are best for estimating a given \(\theta\).
All of these properties describe the behavior of an estimator \(\hat\theta\) across repeated samples, that is, with respect to its sampling distribution.
Bias
The bias of a point estimator \(\hat\theta\) of \(\theta\) is \[B(\hat\theta) = E(\hat\theta) - \theta.\]
If \(B(\hat\theta) = 0\), which is the same thing as saying \(E(\hat\theta) = \theta\), then \(\hat\theta\) is said to be an unbiased estimator of \(\theta\).
Unbiasedness is obviously a desirable property: on average across repeated sampling, \(\hat\theta\) will equal the unknown parameter \(\theta\).
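For example, the sample mean from Example 1 is unbiased for \(\mu\) whenever the \(Y_i\) are an i.i.d. sample with \(E(Y_i) = \mu\):
\[E(\bar Y) = E\left(\frac{1}{n}\sum_{i=1}^n Y_i\right) = \frac{1}{n}\sum_{i=1}^n E(Y_i) = \frac{1}{n}(n\mu) = \mu, \quad \text{so} \quad B(\bar Y) = 0.\]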
Variance
One could have an unbiased estimator that is nevertheless highly variable.
Precision is also a desirable property of an estimator, one that is measured by its variance: the smaller the variance of \(\hat\theta\), the greater the precision.
The variance of an estimator is defined to be \(Var(\hat\theta)\).
The standard error of an estimator is simply the standard deviation of the estimator:
\[SE(\hat\theta) = \sqrt{Var(\hat\theta)}\]
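Continuing Example 1, if the \(Y_i\) are i.i.d. with variance \(\sigma^2\), then
\[Var(\bar Y) = Var\left(\frac{1}{n}\sum_{i=1}^n Y_i\right) = \frac{1}{n^2}\sum_{i=1}^n Var(Y_i) = \frac{\sigma^2}{n}, \qquad SE(\bar Y) = \frac{\sigma}{\sqrt{n}}.\]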
Note that both \(Var(\hat\theta)\) and \(SE(\hat\theta)\) might also depend on unknown parameters.
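The plotting code below assumes a summary data frame named sample_variance_biases, with one row per combination of sample size n and true variance true_sigma2, and with the simulated biases of the two Example 3 estimators stored as bias_estimator1 (n denominator) and bias_estimator2 (n-1 denominator). A minimal sketch of how such a summary could be built, assuming a normal population, an illustrative grid of settings, and \(N = 10{,}000\) replications per setting, is:

set.seed(1)                                          # arbitrary seed for reproducibility
N <- 10000                                           # replications per setting (assumed)
settings <- expand.grid(n = c(5, 10, 25, 50, 100),   # illustrative grid of sample sizes
                        true_sigma2 = c(1, 10, 100)) #   and true variances

sim_bias <- function(n, sigma2, N) {
  est1 <- est2 <- numeric(N)
  for (r in 1:N) {
    y <- rnorm(n, mean = 0, sd = sqrt(sigma2))       # normal population (assumed)
    ss <- sum((y - mean(y))^2)
    est1[r] <- ss / n                                # n-denominator estimator
    est2[r] <- ss / (n - 1)                          # (n-1)-denominator estimator
  }
  c(bias_estimator1 = mean(est1) - sigma2,
    bias_estimator2 = mean(est2) - sigma2)
}

sample_variance_biases <- cbind(
  settings,
  t(mapply(sim_bias, settings$n, settings$true_sigma2, MoreArgs = list(N = N)))
)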
The simulated biases above can be put into perspective by plotting them as a function of \(n\) for each value of \(\sigma^2\):
# Plot the simulated bias of each estimator as a function of n, faceted by the true sigma^2
ggplot(data = sample_variance_biases, aes(x = n)) +
  geom_point(aes(y = bias_estimator1, color = 'n denominator')) +
  geom_line(aes(y = bias_estimator1, color = 'n denominator')) +
  geom_point(aes(y = bias_estimator2, color = 'n-1 denominator')) +
  geom_line(aes(y = bias_estimator2, color = 'n-1 denominator')) +
  geom_hline(yintercept = 0) +
  facet_wrap(~true_sigma2, labeller = label_bquote(sigma^2 == .(true_sigma2))) +
  labs(y = 'Bias', x = 'Sample size (n)', color = '',
       title = expression(paste('Simulated bias of two different estimators of ', sigma^2))) +
  theme_classic()
The \(n-1\) estimator appears unbiased for all \(n\), and for all \(\sigma^2\).
The \(n\) estimator is negatively biased (too small) for small \(n\), but this bias goes away as \(n\) gets large.
The bias of the \(n\) estimator for small \(n\) is more severe for larger values of \(\sigma^2\).
General notes on simulating estimator properties
Finding the bias/variance/MSE (and later CI coverage) of estimators requires aggregating (aka "summarizing") the simulation studies, often by each parameter in the grid (see the sketch after these notes). This is a new step for us.
Because it's a simulation study, theoretically unbiased estimators may often have non-zero simulated biases. This is due to the random nature of simulation studies. Two ways to tell if a simulated bias is "real" or "random":
Up the number of replications (say from \(N=10,000\) to \(N=100,000\) - what happens if we do that in the previous code?)
Consider whether/how the simulated bias changes as \(n\) (sample size) increases - is it changing systematically (evidence of real bias) or “jumping around” from one small number to another, especially from small negative to small positive (evidence of random simulation error)?
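As a concrete (hypothetical) illustration of the aggregation step mentioned above: suppose the raw simulation results are stored in a long data frame sim_results, with one row per replication and columns n, true_sigma2, estimator (a label), and estimate; none of these names come from the notes. The bias, variance, and MSE at each point in the parameter grid could then be summarized as:

library(dplyr)   # assumed to be loaded, consistent with the ggplot2 code used above

estimator_summary <- sim_results %>%
  group_by(n, true_sigma2, estimator) %>%             # one group per grid point and estimator
  summarize(bias     = mean(estimate - true_sigma2),  # average error = simulated bias
            variance = var(estimate),                 # simulated variance of the estimator
            mse      = mean((estimate - true_sigma2)^2),
            .groups  = "drop")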
Relative vs absolute bias
In the previous example, it’s tempting to look at the plot and think “the bias of the \(n\) estimator isn’t that bad if \(\sigma^2\) is small.” This would be misleading.
Often the bias of an estimator depends on the size of the parameter it’s trying to estimate.
This motivates an alternative way to measure bias.
Consider which is worse when trying to estimate a parameter \(\theta\) using an estimator \(\hat\theta\): a bias of a given size when \(\theta\) is large, or a bias of the same size when \(\theta\) is small?
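One standard way to formalize this is the relative bias, which scales the bias by the parameter being estimated:
\[\text{relative bias}(\hat\theta) = \frac{B(\hat\theta)}{\theta} = \frac{E(\hat\theta) - \theta}{\theta},\]
often reported as a percentage. A bias of the same absolute size is far more consequential when \(\theta\) itself is small.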
Example: suppose \(\hat p_1\) and \(\hat p_2\) are two estimators of a proportion \(p\), where \(\hat p_1\) is always unbiased but \(\hat p_2\) has some bias whenever \(p \ne 0.5\).
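To weigh a small amount of bias against a reduction in variance, the usual yardstick is the mean squared error (given here in its standard form):
\[MSE(\hat\theta) = E\left[(\hat\theta - \theta)^2\right] = Var(\hat\theta) + B(\hat\theta)^2.\]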
The MSEs of the two estimators are difficult to compare directly from their formulas.
Plotting the MSEs
Although \(\hat p_2\) is biased when \(p \ne 0.5\), it appears to have the better (smaller) MSE except for extreme values of \(p\) near 0 or 1.
As \(n\) increases, the discrepancies between the two estimators lessen.
Relative efficiency
If two estimators are both unbiased, then we can compare how “efficiently” they use the information in the sample by taking the ratio of their variances.
Definition: The efficiency of \(\hat\theta_1\) relative to \(\hat\theta_2\), denoted \(RE(\hat\theta_1,\hat\theta_2)\), is
\[RE(\hat\theta_1, \hat\theta_2) = \frac{Var(\hat\theta_2)}{Var(\hat\theta_1)},\]
so values greater than 1 indicate that \(\hat\theta_1\) is the more efficient (smaller-variance) estimator.
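For instance, with an i.i.d. sample \(Y_1, \dots, Y_n\) having mean \(\mu\) and variance \(\sigma^2\), both \(\hat\theta_1 = \bar Y\) and \(\hat\theta_2 = Y_1\) (the first observation alone) are unbiased for \(\mu\), and
\[RE(\bar Y, Y_1) = \frac{Var(Y_1)}{Var(\bar Y)} = \frac{\sigma^2}{\sigma^2/n} = n,\]
so the sample mean is \(n\) times as efficient as a single observation.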