2025-10-22

What is point estimation?

Point estimation uses sample data to compute a single number (a point estimate) for an unknown population parameter, in contrast to an interval estimate, which gives a range of plausible values. This presentation reviews what makes a good estimator and uses small simulations and visuals to build intuition.

Estimator properties

  • Unbiasedness: \(E(\hat{\theta})=\theta\), so the estimator is centered on the true parameter on average.
  • Consistency: \(\hat{\theta} \xrightarrow{p} \theta\) as sample size grows, so estimates concentrate around the truth.
  • Efficiency: among unbiased estimators, prefer the one with the smallest variance for a given model (see the simulation sketch after this list).
  • Sufficiency: statistics that capture all information about the parameter contained in the data.
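
As a quick illustration of unbiasedness and efficiency (a small simulation sketch, not from the original slides; the normal model, sample size, and seed are illustrative assumptions), compare the sample mean and the sample median as estimators of a normal mean:

set.seed(123)
B <- 10000; n <- 50                     # replications and sample size (illustrative)

## Both estimators target the mean mu = 0 of a N(0, 1) population
means   <- replicate(B, mean(rnorm(n)))
medians <- replicate(B, median(rnorm(n)))

c(bias_mean = mean(means), bias_median = mean(medians))   # both close to 0: unbiased
c(var_mean  = var(means),  var_median  = var(medians))    # the mean has the smaller variance

For normal data the sample mean has the smaller variance of the two, which is exactly what efficiency singles out; the median trades precision for robustness to outliers.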

Bias–variance–MSE

Mean squared error decomposes into variance plus squared bias: \(\mathrm{MSE}(\hat\theta)=\mathrm{Var}(\hat\theta)+\mathrm{Bias}(\hat\theta)^2\). This identity shows that slightly biased estimators with much smaller variance can have lower MSE than unbiased ones.
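
To see the trade-off numerically, here is a small simulation sketch (not from the original slides; normal data with \(\sigma^2=1\) and \(n=10\) are illustrative assumptions) comparing the unbiased sample variance with the slightly biased maximum-likelihood version:

set.seed(7)
B <- 20000; n <- 10; sigma2 <- 1         # true variance of the simulated N(0, 1) data

est <- replicate(B, {
  x <- rnorm(n)
  c(unbiased = var(x),                   # divisor n - 1: unbiased
    biased   = sum((x - mean(x))^2)/n)   # divisor n (MLE): biased but less variable
})

rowMeans((est - sigma2)^2)               # empirical MSE of each estimator

For small \(n\) the divisor-\(n\) estimator comes out with the lower MSE despite its bias, which is precisely the trade-off the decomposition describes.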

Cramér–Rao lower bound

For regular models and any unbiased estimator \(\hat\theta\), \(\mathrm{Var}(\hat\theta)\ge \dfrac{1}{I(\theta)}\), where \(I(\theta)\) is the Fisher information, giving a fundamental lower limit on achievable variance. Estimators that attain this bound are called efficient and represent the best unbiased precision in that setting.
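
As a worked check (a standard result, not spelled out in the original slides): for \(X_1,\dots,X_n\overset{iid}{\sim}\mathrm{Bernoulli}(p)\), the Fisher information of the sample is

\[
I(p)=\frac{n}{p(1-p)},
\qquad\text{so any unbiased estimator of } p \text{ satisfies}\qquad
\mathrm{Var}(\hat p)\ \ge\ \frac{p(1-p)}{n}.
\]

The sample proportion \(\bar X\) has exactly this variance (see the example below), so it attains the bound and is efficient for \(p\).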

Methods of estimation

  • Method of moments: choose parameter values so sample moments equal model-implied moments.
  • Maximum likelihood: choose parameters that maximize the likelihood of the observed data under the model (both methods are illustrated in the sketch below).
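
A minimal sketch contrasting the two methods (not from the original slides; the Gamma model, simulated data, and parameter values are illustrative assumptions):

set.seed(1)
x <- rgamma(500, shape = 2, rate = 1.5)          # simulated data; true shape 2, rate 1.5

## Method of moments: solve E[X] = a/b and Var(X) = a/b^2 for shape a and rate b
m <- mean(x); v <- var(x)
mom <- c(shape = m^2 / v, rate = m / v)

## Maximum likelihood: minimise the negative log-likelihood numerically
## (parameters are optimised on the log scale to keep them positive)
negll <- function(lp) -sum(dgamma(x, shape = exp(lp[1]), rate = exp(lp[2]), log = TRUE))
mle <- exp(optim(log(mom), negll)$par)

rbind(mom = mom, mle = mle)

For the Gamma model the two methods give similar but not identical answers; for some models (e.g. the normal or Bernoulli mean) they coincide exactly.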

Example: Bernoulli proportion

If \(X_1,\dots,X_n\overset{iid}{\sim}\mathrm{Bernoulli}(p)\), the sample proportion \(\hat p=\bar X\) is a point estimator of \(p\) and is unbiased with \(E[\hat p]=p\). Its variance is \(\mathrm{Var}(\hat p)=\dfrac{p(1-p)}{n}\), showing precision improves with larger \(n\).

Simulate sampling distributions of \(\hat p\) for different n

p  <- 0.3                      # true success probability
ns <- c(20, 50, 200)           # sample sizes to compare
B  <- 5000                     # number of simulated samples per n

## For each n, draw B Bernoulli(p) samples and record the sample proportion
sim_df <- do.call(rbind, lapply(ns, function(n){
  phat <- replicate(B, mean(rbinom(n, size = 1, prob = p)))
  data.frame(n = factor(n), phat = phat)
}))

Plot sampling distributions of \(\hat p\) using ggplot
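
The plotting code is not included in the extracted slides; below is a minimal ggplot2 sketch (an assumption of what such a figure could look like), reusing sim_df and p from the simulation chunk above:

library(ggplot2)

ggplot(sim_df, aes(x = phat)) +
  geom_histogram(bins = 40, fill = "steelblue", colour = "white") +
  geom_vline(xintercept = p, linetype = "dashed") +        # true p for reference
  facet_wrap(~ n, ncol = 1, labeller = label_both) +       # one panel per sample size
  labs(x = expression(hat(p)), y = "Count",
       title = "Sampling distribution of the sample proportion")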

The sampling distribution of \(\hat p\) tightens as \(n\) increases, consistent with \(\mathrm{Var}(\hat p)=\dfrac{p(1-p)}{n}\).

Empirical MSE vs n

Empirical MSE decreases with sample size, illustrating the bias–variance–MSE connection for \(\hat p\).
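
The figure itself is not reproduced here; as a sketch (assuming sim_df and p from the chunks above), the empirical MSE can be computed per sample size and compared with the theoretical variance \(p(1-p)/n\):

## Empirical MSE of phat for each n, alongside the theoretical value p(1 - p)/n
mse_by_n <- aggregate(phat ~ n, data = sim_df, FUN = function(est) mean((est - p)^2))
names(mse_by_n)[2]   <- "empirical_mse"
mse_by_n$theoretical <- p * (1 - p) / as.numeric(as.character(mse_by_n$n))
mse_by_n

Because \(\hat p\) is unbiased, its MSE is essentially its variance, so the empirical column should track \(p(1-p)/n\) closely.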

3D log-likelihood

For \(X_i\overset{iid}{\sim}\mathcal{N}(\mu,\sigma^2)\), the log-likelihood surface over \((\mu,\sigma)\) visualizes where the estimates maximize the likelihood and how its curvature reflects the information in the data.

3D log-likelihood surface for \(\mathcal{N}(\mu,\sigma^2)\)
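
The 3D figure is not reproduced here; a minimal base-R sketch of such a surface follows (the simulated data, grid ranges, and use of persp() are illustrative assumptions; the original may have used an interactive 3D package):

set.seed(42)
x <- rnorm(100, mean = 5, sd = 2)              # simulated data; true mu = 5, sigma = 2

mu_grid    <- seq(4, 6,   length.out = 60)     # grid of candidate means
sigma_grid <- seq(1.3, 3, length.out = 60)     # grid of candidate standard deviations

## Normal log-likelihood evaluated at every (mu, sigma) grid point
loglik <- outer(mu_grid, sigma_grid,
                Vectorize(function(m, s) sum(dnorm(x, mean = m, sd = s, log = TRUE))))

persp(mu_grid, sigma_grid, loglik,
      theta = 35, phi = 25, expand = 0.6, ticktype = "detailed",
      xlab = "mu", ylab = "sigma", zlab = "log-likelihood")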

The peak of the surface indicates the MLEs, and sharper curvature suggests larger Fisher information (higher precision).