What is Point Estimation?

Point estimation is a method in inferential statistics that estimates an unknown population parameter with a single numerical value.

  • We observe a sample from a population
  • A statistic calculated from the sample is used to estimate the population parameter
  • Common parameters we estimate include the mean \(\mu\), the variance \(\sigma^2\), and the proportion \(p\)

The goal is to find an estimator that is accurate (low bias and low variance) and consistent (it converges to the true parameter as the sample size grows).

Key Mathematical Definitions

Let \(X_1, X_2, \ldots, X_n\) be a random sample from a population with unknown parameter \(\theta\).

A point estimator \(\hat{\theta}\) is a function of the sample:

\[\hat{\theta} = g(X_1, X_2, \ldots, X_n)\]

Bias of an estimator is defined as:

\[\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta\]

An estimator is called unbiased if:

\[E(\hat{\theta}) = \theta \quad \Longrightarrow \quad \text{Bias}(\hat{\theta}) = 0\]
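The bias definition can be checked by simulation. The following is a minimal sketch, assuming a Normal population with variance 4 and samples of size 5 (both values are illustrative choices, not taken from the text above). It approximates \(E(\hat{\theta})\) by averaging two variance estimators over many simulated samples.

```r
set.seed(1)

n      <- 5      # sample size (illustrative assumption)
sigma2 <- 4      # true population variance (illustrative assumption)
reps   <- 20000  # number of simulated samples

# Each column holds both variance estimates for one simulated sample
estimates <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  c(divide_by_n_minus_1 = var(x),
    divide_by_n         = mean((x - mean(x))^2))
})

# Monte Carlo approximation of Bias = E(theta_hat) - theta
bias_hat <- rowMeans(estimates) - sigma2
print(round(bias_hat, 3))
```

Under these assumptions the first bias should be close to 0, and the second close to \(-\sigma^2/n = -0.8\), up to simulation noise.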

Properties of Good Estimators

A good point estimator should satisfy:

  1. Unbiasedness (\(E(\hat{\theta}) = \theta\)): the estimator gives the correct value of the parameter on average
  2. Efficiency: the estimator has the smallest variance among all unbiased estimators
  3. Consistency: as \(n \to \infty\), \(\hat{\theta} \to \theta\) in probability
  4. Sufficiency: the estimator uses all information available in the sample about \(\theta\)
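Efficiency can be illustrated by comparing two unbiased estimators of the same parameter. The sketch below assumes a standard Normal population and samples of size 25 (illustrative choices), and compares the sample mean with the sample median as estimators of the centre.

```r
set.seed(2)

n    <- 25     # sample size (illustrative assumption)
reps <- 10000  # number of simulated samples

# One simulated sample per row
samples <- matrix(rnorm(n * reps, mean = 0, sd = 1), nrow = reps)

means   <- rowMeans(samples)
medians <- apply(samples, 1, median)

var_mean   <- var(means)
var_median <- var(medians)

# Relative efficiency of the median with respect to the mean
relative_efficiency <- var_mean / var_median
print(c(var_mean = var_mean, var_median = var_median,
        relative_efficiency = relative_efficiency))
```

For Normal data the mean has the smaller variance, so it is the more efficient of the two. For large \(n\) the relative efficiency of the median approaches \(2/\pi \approx 0.64\).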

The Mean Squared Error (MSE) combines bias and variance:

\[\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2\]
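The decomposition can be verified numerically. This is a minimal sketch, assuming a Normal population with variance 4, samples of size 10, and the \(\frac{1}{n}\) variance estimator as \(\hat{\theta}\) (all illustrative choices). The simulated variance is computed with divisor `reps` so that the identity holds exactly for the simulated values.

```r
set.seed(3)

n      <- 10    # sample size (illustrative assumption)
sigma2 <- 4     # true parameter value (illustrative assumption)
reps   <- 5000  # number of simulated samples

# Estimator under study: the 1/n variance estimator
est <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  mean((x - mean(x))^2)
})

mse      <- mean((est - sigma2)^2)       # E[(theta_hat - theta)^2]
bias     <- mean(est) - sigma2           # E[theta_hat] - theta
variance <- mean((est - mean(est))^2)    # Var(theta_hat), divisor reps

print(c(mse = mse, var_plus_bias2 = variance + bias^2))
print(all.equal(mse, variance + bias^2))
```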

Sampling Distribution of \(\bar{X}\)

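By the Central Limit Theorem, for large \(n\) the sample mean is approximately Normal, \(\bar{X} \overset{\text{approx.}}{\sim} N\!\left(\mu,\, \frac{\sigma^2}{n}\right)\), where \(\mu\) and \(\sigma^2\) are the population mean and (finite) variance. The population itself does not need to be Normal.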
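A quick check of this approximation on a skewed population: the sketch below assumes an Exponential population with rate 1 (so \(\mu = 1\) and \(\sigma = 1\)) and samples of size 50. These are illustrative choices, and the name `xbar_sim` is introduced here only for the example.

```r
set.seed(4)

n     <- 50    # sample size (illustrative assumption)
reps  <- 5000  # number of simulated samples
mu    <- 1     # mean of Exp(rate = 1)
sigma <- 1     # standard deviation of Exp(rate = 1)

# Sample means from a clearly non-Normal population
xbar_sim <- replicate(reps, mean(rexp(n, rate = 1)))

# Compare simulated centre and spread with the CLT values
print(c(mean_sim  = mean(xbar_sim),
        sd_sim    = sd(xbar_sim),
        sd_theory = sigma / sqrt(n)))

# Share of sample means within 1.96 standard errors of mu
coverage <- mean(abs(xbar_sim - mu) <= 1.96 * sigma / sqrt(n))
print(coverage)
```

If the Normal approximation is adequate, `coverage` should be close to 0.95.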

Bias vs. Variance Trade-off

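When a biased and an unbiased estimator are both consistent (for example, the \(\frac{1}{n}\) and \(\frac{1}{n-1}\) versions of the sample variance), they converge to the same value as \(n\) grows, but the unbiased version is normally preferred.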
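The trade-off can be made concrete with the \(\frac{1}{n-1}\) and \(\frac{1}{n}\) variance estimators. The sketch below assumes a Normal population with variance 4 and a handful of sample sizes (illustrative choices); the helper `compare` is a hypothetical function written for this example.

```r
set.seed(5)

sigma2 <- 4                    # true variance (illustrative assumption)
reps   <- 5000                 # simulated samples per sample size
sizes  <- c(5, 20, 100, 500)   # sample sizes to compare

# Simulated bias and MSE of both variance estimators at sample size n
compare <- function(n) {
  est <- replicate(reps, {
    x  <- rnorm(n, mean = 0, sd = sqrt(sigma2))
    s2 <- var(x)                            # 1/(n-1) version
    c(unbiased = s2, biased = s2 * (n - 1) / n)
  })
  c(n             = n,
    bias_unbiased = mean(est["unbiased", ]) - sigma2,
    bias_biased   = mean(est["biased", ]) - sigma2,
    mse_unbiased  = mean((est["unbiased", ] - sigma2)^2),
    mse_biased    = mean((est["biased", ] - sigma2)^2))
}

results <- t(sapply(sizes, compare))
print(round(results, 4))
```

In runs like this the bias of the \(\frac{1}{n}\) version shrinks toward 0 as \(n\) increases, while at small \(n\) it has the lower MSE for Normal samples, because its smaller variance outweighs its squared bias.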

Maximum Likelihood Estimation (MLE)

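MLE is a general method for constructing a point estimator: it chooses the parameter value that makes the observed data most likely under the assumed model.

Given independent observations \(x_1, \ldots, x_n\) and a model with density (or mass function) \(f\) and parameter \(\theta\), the likelihood function is: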

\[L(\theta) = \prod_{i=1}^n f(x_i \mid \theta)\]

We choose \(\hat{\theta}_{MLE}\) that maximizes \(L(\theta)\). It is often easier to maximize the log-likelihood:

\[\ell(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(x_i \mid \theta)\]

Example: For a Normal distribution, the MLE of \(\mu\) is \(\bar{X}\), and the MLE of \(\sigma^2\) is \(\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\). Note that the latter is biased: its expectation is \(\frac{n-1}{n}\sigma^2\).
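The Normal example can be reproduced numerically by minimising the negative log-likelihood. This is a minimal sketch, assuming 200 simulated observations from a Normal(170, 10) distribution. The function `negloglik`, the starting values, and the log-scale parameterisation of the standard deviation are choices made for this illustration.

```r
set.seed(7)

# Simulated data (illustrative assumption)
x <- rnorm(200, mean = 170, sd = 10)

# Negative log-likelihood of a Normal model; par = (mu, log(sigma))
negloglik <- function(par, x) {
  mu    <- par[1]
  sigma <- exp(par[2])  # optimising log(sigma) keeps sigma positive
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# Rough starting values taken from robust summaries of the data
start <- c(mean = median(x), logsd = log(mad(x)))
fit   <- optim(start, negloglik, x = x, method = "BFGS")

mu_hat     <- fit$par[["mean"]]
sigma2_hat <- exp(fit$par[["logsd"]])^2

# Compare with the closed-form MLEs
print(c(mu_numeric     = mu_hat,
        mu_closed      = mean(x),
        sigma2_numeric = sigma2_hat,
        sigma2_closed  = mean((x - mean(x))^2)))
```

The numerical and closed-form values should agree up to the optimiser's tolerance. The \(\sigma^2\) estimate matches the \(\frac{1}{n}\) formula rather than `var(x)`, which divides by \(n - 1\).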

Sample Mean Convergence vs. Sample Size
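
A minimal sketch of how such a convergence plot could be produced in base R, assuming a Normal population with mean 170 and standard deviation 10 (the values used in the simulation code in this document). The seed and the maximum sample size are arbitrary choices.

```r
set.seed(6)

mu    <- 170   # population mean (assumed, as in the simulation code)
sigma <- 10    # population standard deviation (assumed)
n_max <- 5000  # largest sample size shown (arbitrary choice)

# One long sample; the running mean after n draws is the sample mean at size n
x            <- rnorm(n_max, mean = mu, sd = sigma)
running_mean <- cumsum(x) / seq_len(n_max)

# Log-scaled x axis makes the early fluctuations visible
plot(seq_len(n_max), running_mean, type = "l", log = "x",
     xlab = "Sample size n", ylab = "Sample mean",
     main = "Sample Mean Convergence vs. Sample Size")
abline(h = mu, lty = 2)  # true mean for reference
```

The running mean fluctuates widely for small \(n\) and settles near the dashed line as \(n\) grows, which is what consistency of \(\bar{X}\) predicts.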

R Code: Sampling Distribution Simulation

library(ggplot2)

set.seed(42)  # make the simulation reproducible

# Simulate a large population
population <- rnorm(10000, mean = 170, sd = 10)

# Draw 1000 samples of size 30 (without replacement) and record each sample mean
sample_means <- replicate(1000, mean(sample(population, 30)))

# Plot the sampling distribution
ggplot(data.frame(x = sample_means), aes(x = x)) +
  geom_histogram(aes(y = after_stat(density)), bins = 40,
                 fill = "#8C1D40", color = "white", alpha = 0.85) +
  geom_density(color = "#FFB310", linewidth = 1.2) +
  geom_vline(xintercept = 170, linetype = "dashed") +
  labs(title = "Sampling Distribution of the Sample Mean",
       x = expression(bar(X)), y = "Density") +
  theme_minimal()