Point Estimation

2024-03-20

Introduction

Point estimation involves estimating an unknown parameter of a population based on sample data.
A point estimate is a single value that best represents the population parameter.
The choice of estimator depends on the nature of the parameter being estimated and the characteristics of the sample data.
Point estimation is a fundamental concept in statistics used to make inferences about population parameters based on sample data.
Understanding point estimation is essential for making informed decisions and drawing reliable conclusions from statistical analyses.

Point Estimators

Common Point Estimators: Common point estimators include sample mean, sample proportion, and sample median, which provide single-value estimates for population parameters.
Bayesian Point Estimators: Bayesian point estimators, such as posterior mean, posterior median, and Maximum a Posteriori (MAP), incorporate prior beliefs and observed data to estimate parameters using Bayesian inference.

Types of Common Point Estimators

Sample Mean \(\bar{x}\):
Estimation of population mean.
Example: Average height of students in a classroom.
Sample Proportion \(\hat{p}\):
Estimation of population proportion.
Example: Proportion of defective items in a batch.
Sample Median \(\tilde{x}\):
Estimation of population median.
Example: Median income of households in a city.
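
As a minimal sketch of these three estimators in R, the snippet below uses small made-up vectors. The names `heights_cm`, `defective`, and `income` and all their values are hypothetical illustration data, not taken from any study.

```r
# Hypothetical illustration data (not from a real study)
heights_cm <- c(162, 170, 168, 175, 181, 166, 173)       # student heights in cm
defective  <- c(FALSE, TRUE, FALSE, FALSE, TRUE,
                FALSE, FALSE, FALSE, FALSE, FALSE)        # inspection results for 10 items
income     <- c(32000, 41000, 38500, 250000, 45000)      # household incomes

# Sample mean: estimate of the population mean
x_bar <- mean(heights_cm)

# Sample proportion: the mean of a logical vector is the share of TRUE values
p_hat <- mean(defective)

# Sample median: estimate of the population median
x_tilde <- median(income)

c(mean = x_bar, proportion = p_hat, median = x_tilde)
```

In this toy data the single very large income pulls the mean income up to 81300, while the median stays at 41000, which is one reason the median is often preferred for skewed quantities such as income.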

Bayesian Point Estimators

Posterior Mean: Represents the expected value of the parameter given the observed data.
Minimizes posterior risk for squared-error loss function.
Posterior Median: Represents the middle value of the parameter distribution given the observed data.
Minimizes posterior risk for absolute-value loss function.
Maximum a Posteriori (MAP): Finds the mode of the posterior distribution.
With a uniform prior, the MAP estimate coincides with the maximum likelihood estimate.

Properties of Point Estimators

Unbiasedness: A point estimator \(\hat{\theta}\) is unbiased if its expected value equals the parameter being estimated, that is, \(E[\hat{\theta}] = \theta\).
Efficiency: An efficient estimator has the smallest possible variance among all unbiased estimators.
Consistency: A consistent estimator converges in probability to the true parameter value as the sample size increases.
Robustness: A robust estimator is minimally affected by outliers or violations of distributional assumptions.
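
Unbiasedness and consistency can be checked by simulation. The sketch below is one possible illustration, assuming normally distributed data with a known variance of 4; the seed, sample sizes, and number of replications are arbitrary choices. It compares the usual sample variance (divisor \(n - 1\)) with the divisor-\(n\) version, and then shows the sample mean settling near the true mean as \(n\) grows.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

true_var <- 4      # population variance of N(0, sd = 2)
n        <- 10     # size of each sample
reps     <- 20000  # number of simulated samples

# Draw many samples (one per row) and compute two variance estimators on each
samples      <- matrix(rnorm(n * reps, mean = 0, sd = 2), nrow = reps)
var_unbiased <- apply(samples, 1, var)        # divisor n - 1
var_biased   <- var_unbiased * (n - 1) / n    # divisor n

# Unbiasedness: the divisor n - 1 estimator averages close to true_var,
# while the divisor n estimator averages close to true_var * (n - 1) / n
mean(var_unbiased)
mean(var_biased)

# Consistency: the sample mean tends to get closer to the true mean (0) as n grows
sapply(c(10, 1000, 100000), function(k) mean(rnorm(k, mean = 0, sd = 2)))
```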

Confidence Intervals

Confidence intervals provide a range of plausible values for the population parameter.
They are constructed around the point estimate and indicate the uncertainty associated with the estimate.
A 95% confidence level means that, in repeated sampling, about 95% of the intervals constructed this way will contain the true parameter value.
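
The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated from a normal distribution with a true mean of 170 (known only because we generate the data ourselves), and the seed and number of replications are arbitrary.

```r
set.seed(123)  # arbitrary seed, for reproducibility only

mu   <- 170    # true mean, known here only because the data are simulated
reps <- 5000   # number of simulated samples

# For each simulated sample, check whether the 95% t-interval covers mu
covered <- replicate(reps, {
  x  <- rnorm(50, mean = mu, sd = 10)
  ci <- t.test(x)$conf.int
  ci[1] <= mu && mu <= ci[2]
})

# Proportion of intervals that contain the true mean (expected to be near 0.95)
mean(covered)
```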

Mathematical Formula for Confidence Interval

Given a sample of size \(n\), with sample mean \(\bar{x}\) and sample standard deviation \(s\), the formula for constructing a confidence interval for the population mean \(\mu\) is:

\[ \bar{x} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}} \]

where:

\(t_{\alpha/2}\) is the critical value from the t-distribution with \(n - 1\) degrees of freedom, and \(\alpha\) is the significance level, so the confidence level is \(1 - \alpha\) (for example, \(\alpha = 0.05\) for a 95% interval).

Example: Confidence Interval for Population Mean

Suppose we want to estimate the average height of students in a classroom.

We take a random sample of 50 students and measure their heights.
The sample mean height is 170 cm, and the standard deviation is 10 cm.
Using a t-distribution with 49 degrees of freedom, we construct a 95% confidence interval.
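
The interval for these summary statistics can be computed directly from the formula. The sketch below plugs in the values stated above; the variable names are illustrative only.

```r
n     <- 50    # sample size
x_bar <- 170   # sample mean height (cm)
s     <- 10    # sample standard deviation (cm)

# Critical value t_{alpha/2} for alpha = 0.05 and n - 1 = 49 degrees of freedom
t_crit <- qt(1 - 0.05 / 2, df = n - 1)

# Margin of error and interval endpoints
margin <- t_crit * s / sqrt(n)
c(lower = x_bar - margin, upper = x_bar + margin)
```

With these inputs the margin of error is roughly 2.84 cm, giving an interval of about (167.16 cm, 172.84 cm).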

R Code: Constructing Confidence Interval

# Simulate heights (cm) for 50 students from a normal distribution
heights <- rnorm(50, mean = 170, sd = 10)

# Calculate sample mean and standard deviation
sample_mean <- mean(heights)
sample_sd <- sd(heights)

# Construct 95% confidence interval
conf_interval <- t.test(heights)$conf.int
conf_interval

## [1] 164.2038 169.8855
## attr(,"conf.level")
## [1] 0.95

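For this simulated sample, the 95% confidence interval for the population mean height is approximately (164.20 cm, 169.89 cm). The heights are drawn at random without a fixed seed, so rerunning the code gives a slightly different interval. We are 95% confident that the true average height of students falls within this interval, in the sense that the procedure captures the true mean in about 95% of repeated samples.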

Sample Mean \(\bar{x}\)

Average number of people eating different fruits.

Sample Proportion \(\hat{p}\)

Proportion of people eating different fruits.

Sample Median \(\tilde{x}\)

Median number of people eating different fruits.

Calculating Posterior Parameters

# Beta(2, 2) prior on the success probability
alpha <- 2
beta <- 2

# Simulate observed data: 100 experiments of 10 trials each, true success probability 0.6
set.seed(42) # for reproducibility
obs_successes <- rbinom(100, size = 10, prob = 0.6)

# Conjugate Beta-Binomial update: add successes to alpha and failures to beta
posterior_alpha <- alpha + sum(obs_successes)
posterior_beta <- beta + 10 * length(obs_successes) - sum(obs_successes)

# Calculate posterior mean, median, and MAP
posterior_mean <- posterior_alpha / (posterior_alpha + posterior_beta)
posterior_median <- qbeta(0.5, posterior_alpha, posterior_beta)
posterior_map <- (posterior_alpha - 1) / (posterior_alpha + posterior_beta - 2)

Plotting
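
One possible way to plot the posterior with base R graphics is sketched below. The posterior parameters are recomputed so the snippet runs on its own, using the same Beta(2, 2) prior and simulated binomial data as in the previous section; the names `post_a`, `post_b`, `est`, and `theta`, the grid range, and the plot styling are illustrative choices.

```r
# Recompute the posterior so this snippet is self-contained:
# Beta(2, 2) prior, 100 simulated experiments of 10 trials each
set.seed(42)
obs_successes <- rbinom(100, size = 10, prob = 0.6)
post_a <- 2 + sum(obs_successes)
post_b <- 2 + 10 * length(obs_successes) - sum(obs_successes)

# Point estimates from the Beta(post_a, post_b) posterior
est <- c(
  mean   = post_a / (post_a + post_b),
  median = qbeta(0.5, post_a, post_b),
  MAP    = (post_a - 1) / (post_a + post_b - 2)
)

# Posterior density over a grid of success probabilities
theta <- seq(0.5, 0.7, length.out = 500)
plot(theta, dbeta(theta, post_a, post_b), type = "l",
     xlab = expression(theta), ylab = "Posterior density",
     main = "Beta posterior with point estimates")

# Mark the three point estimates with vertical lines
abline(v = est, lty = 1:3)
legend("topright", legend = names(est), lty = 1:3)
```

With 1000 trials in total, the three estimates lie very close together, so the vertical lines nearly overlap on the plot.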

Conclusion

Point estimation is a crucial concept in statistics for estimating population parameters based on sample data.
Common point estimators like sample mean, proportion, and median provide single-value estimates, while Bayesian estimators incorporate prior beliefs and observed data.
Confidence intervals quantify the uncertainty associated with point estimates, providing a range of plausible values for the population parameter.