2024-03-20

Introduction

  • Point estimation involves estimating an unknown parameter of a population based on sample data.

  • A point estimate is a single value, computed from the sample, that serves as the best guess for the population parameter.

  • The choice of estimator depends on the nature of the parameter being estimated and the characteristics of the sample data.

  • Point estimation is a fundamental concept in statistics used to make inferences about population parameters based on sample data.

  • Understanding point estimation is essential for making informed decisions and drawing reliable conclusions from statistical analyses.

Point Estimators

  • Common Point Estimators: The sample mean, sample proportion, and sample median each provide a single-value estimate of the corresponding population parameter.

  • Bayesian Point Estimators: The posterior mean, posterior median, and Maximum a Posteriori (MAP) estimate combine prior beliefs with observed data through Bayesian inference.

Types of Common Point Estimators

  • Sample Mean \(\bar{x}\): Estimates the population mean. Example: average height of students in a classroom.

  • Sample Proportion \(\hat{p}\): Estimates the population proportion. Example: proportion of defective items in a batch.

  • Sample Median \(\tilde{x}\): Estimates the population median. Example: median income of households in a city.
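As a minimal sketch, the three estimators can be computed in R with mean() and median(). The vectors heights_cm and defective are hypothetical values invented for illustration; they are not data from this presentation.

```r
# Hypothetical data, invented for illustration only
heights_cm <- c(158, 162, 165, 170, 171, 174, 180)   # student heights in cm
defective  <- c(0, 0, 1, 0, 0, 0, 1, 0, 0, 0)        # 1 = defective item

x_bar   <- mean(heights_cm)    # sample mean, estimates the population mean
p_hat   <- mean(defective)     # sample proportion, estimates the population proportion
x_tilde <- median(heights_cm)  # sample median, estimates the population median

c(mean = x_bar, proportion = p_hat, median = x_tilde)
```

Coding the defective indicator as 0/1 lets mean() return the proportion directly.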

Bayesian Point Estimators

  • Posterior Mean: The expected value of the parameter given the observed data. It minimizes posterior risk under squared-error loss.

  • Posterior Median: The middle value of the posterior distribution of the parameter. It minimizes posterior risk under absolute-value loss.

  • Maximum a Posteriori (MAP): The mode of the posterior distribution. Under a uniform prior, it coincides with the maximum likelihood estimate.
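The link between loss functions and estimators can be checked numerically. The sketch below assumes a Beta(3, 9) posterior, chosen only for illustration, and approximates posterior risk by averaging the loss over random draws. The helper names sq_risk and abs_risk are hypothetical.

```r
set.seed(1)                           # for reproducibility
draws <- rbeta(1e5, 3, 9)             # draws from an assumed Beta(3, 9) posterior

# Monte Carlo estimate of posterior expected loss for a candidate estimate a
sq_risk  <- function(a) mean((draws - a)^2)    # squared-error loss
abs_risk <- function(a) mean(abs(draws - a))   # absolute-value loss

best_sq  <- optimize(sq_risk,  interval = c(0, 1))$minimum
best_abs <- optimize(abs_risk, interval = c(0, 1))$minimum

# Each minimizer should land close to the matching posterior summary
c(best_sq = best_sq, posterior_mean = mean(draws))
c(best_abs = best_abs, posterior_median = median(draws))

# MAP for a Beta(a, b) posterior with a, b > 1 is the mode (a - 1) / (a + b - 2)
map_estimate <- (3 - 1) / (3 + 9 - 2)
map_estimate
```

The numerical minimizers agree with the mean and median of the draws only up to the tolerance of optimize().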

Properties of Point Estimators

  • Unbiasedness: A point estimator \(\hat{\theta}\) is unbiased if its expected value equals the parameter being estimated, that is, \(E[\hat{\theta}] = \theta\).

  • Efficiency: An efficient estimator has the smallest possible variance among all unbiased estimators.

  • Consistency: A consistent estimator converges in probability to the true parameter value as the sample size increases.

  • Robustness: A robust estimator is minimally affected by outliers or violations of distributional assumptions.
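A small simulation can illustrate three of these properties for the sample mean and median. This is a sketch under assumed settings (normal data with true mean 5 and standard deviation 2, 2000 replications). The helper sim_means is hypothetical, and efficiency is not covered here.

```r
set.seed(123)                      # for reproducibility
true_mean <- 5

# Sample means from 2000 simulated samples of size n (normal data assumed)
sim_means <- function(n) replicate(2000, mean(rnorm(n, mean = true_mean, sd = 2)))

means_small <- sim_means(10)
means_large <- sim_means(1000)

# Unbiasedness: the average of the sample means sits near the true mean
mean(means_small)

# Consistency: the estimates concentrate around the true mean as n grows
c(sd_n10 = sd(means_small), sd_n1000 = sd(means_large))

# Robustness: one extreme value moves the mean far more than the median
clean    <- c(1, 2, 3, 4, 5)
with_out <- c(1, 2, 3, 4, 500)
c(mean_shift   = mean(with_out) - mean(clean),
  median_shift = median(with_out) - median(clean))
```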

Confidence Intervals

  • Confidence intervals provide a range of plausible values for the population parameter.

  • They are constructed around the point estimate and indicate the uncertainty associated with the estimate.

  • A 95% confidence interval means that in repeated sampling, 95% of the intervals constructed will contain the true parameter value.
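The repeated-sampling interpretation can be checked by simulation. The sketch below assumes normally distributed heights with true mean 170 and standard deviation 10, and counts how often the t-based interval from t.test() contains the true mean. The number of replications (2000) is an arbitrary choice.

```r
set.seed(2024)                     # for reproducibility
true_mean <- 170
n_reps    <- 2000

# For each simulated sample, check whether the 95% t-interval contains true_mean
covered <- replicate(n_reps, {
  x  <- rnorm(50, mean = true_mean, sd = 10)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

coverage <- mean(covered)          # proportion of intervals that captured true_mean
coverage
```

The observed coverage should be close to 0.95, with some simulation noise.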

Mathematical Formula for Confidence Interval

Given a sample of size \(n\), with sample mean \(\bar{x}\) and standard deviation \(s\), the formula for constructing a confidence interval for the population mean \(\mu\) is:

\[ \bar{x} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}} \]

where:

  • \(t_{\alpha/2}\) is the upper \(\alpha/2\) critical value of the t-distribution with \(n - 1\) degrees of freedom, where \(1 - \alpha\) is the desired confidence level (for example, \(\alpha = 0.05\) for a 95% interval).

Example: Confidence Interval for Population Mean

Suppose we want to estimate the average height of students in a classroom.

  • We take a random sample of 50 students and measure their heights.

  • The sample mean height is 170 cm, and the standard deviation is 10 cm.

  • Using a t-distribution with 49 degrees of freedom, we construct a 95% confidence interval.
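The interval can also be computed directly from the summary statistics quoted in this example. This is a minimal sketch; the variable names are chosen for illustration.

```r
n     <- 50     # sample size
x_bar <- 170    # sample mean (cm)
s     <- 10     # sample standard deviation (cm)
alpha <- 0.05   # 1 - alpha = 0.95 confidence level

t_crit <- qt(1 - alpha / 2, df = n - 1)   # upper alpha/2 critical value, 49 df
margin <- t_crit * s / sqrt(n)            # margin of error

ci <- c(lower = x_bar - margin, upper = x_bar + margin)
round(ci, 2)
```

With these inputs the interval is roughly (167.16 cm, 172.84 cm).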

R Code: Constructing Confidence Interval

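# Simulate heights (cm) for 50 students; no seed is set, so results vary between runs
heights <- rnorm(50, mean = 170, sd = 10)

# Calculate sample mean and standard deviation
sample_mean <- mean(heights)
sample_sd <- sd(heights)

# Construct 95% confidence interval for the population mean (t-based, 49 degrees of freedom)
conf_interval <- t.test(heights)$conf.int
conf_interval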
## [1] 164.2038 169.8855
## attr(,"conf.level")
## [1] 0.95

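For the simulated sample above, the 95% confidence interval for the population mean height is approximately (164.20 cm, 169.89 cm). The data are generated without a fixed seed, so the interval differs from run to run. The 95% refers to the procedure: in repeated sampling, about 95% of intervals built this way contain the true mean. This particular interval falls just below the simulated true mean of 170 cm, an example of the roughly 5% of intervals that miss.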

Sample Mean \(\bar{x}\)

  • Average number of people eating different fruits.

Sample Proportion \(\hat{p}\)

  • Proportion of people eating different fruits.

Sample Median \(\tilde{x}\)

  • Median number of people eating different fruits.

Calculating Posterior Parameters

# Beta(2, 2) prior on the success probability
alpha <- 2
beta <- 2

# Simulate observed data: 100 experiments, each counting successes in 10 trials
set.seed(42) # for reproducibility
obs_successes <- rbinom(100, size = 10, prob = 0.6)

# Conjugate Beta-binomial update: add successes to alpha and failures to beta
posterior_alpha <- alpha + sum(obs_successes)
posterior_beta <- beta + 10 * length(obs_successes) - sum(obs_successes)

# Calculate posterior mean, median, and MAP (mode) of the Beta posterior
posterior_mean <- posterior_alpha / (posterior_alpha + posterior_beta)
posterior_median <- qbeta(0.5, posterior_alpha, posterior_beta)
posterior_map <- (posterior_alpha - 1) / (posterior_alpha + posterior_beta - 2)

Plotting
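The original plot is not reproduced here. The following is a minimal sketch of one way to visualize the posterior with base R graphics, repeating the prior and simulated data from the posterior calculation so the block runs on its own. The plotting range, colors, and labels are assumptions.

```r
# Same prior and simulated data as in the posterior calculation above
alpha <- 2
beta  <- 2
set.seed(42)
obs_successes <- rbinom(100, size = 10, prob = 0.6)

posterior_alpha <- alpha + sum(obs_successes)
posterior_beta  <- beta + 10 * length(obs_successes) - sum(obs_successes)

posterior_mean   <- posterior_alpha / (posterior_alpha + posterior_beta)
posterior_median <- qbeta(0.5, posterior_alpha, posterior_beta)
posterior_map    <- (posterior_alpha - 1) / (posterior_alpha + posterior_beta - 2)

# Posterior density of the success probability (plot range is an assumption)
curve(dbeta(x, posterior_alpha, posterior_beta), from = 0.5, to = 0.7, n = 501,
      xlab = "Success probability", ylab = "Posterior density",
      main = "Beta posterior with point estimates")

# Mark the three Bayesian point estimates
abline(v = c(posterior_mean, posterior_median, posterior_map),
       col = c("red", "blue", "darkgreen"), lty = c(1, 2, 3))
legend("topright", legend = c("Mean", "Median", "MAP"),
       col = c("red", "blue", "darkgreen"), lty = c(1, 2, 3))
```

With 1000 simulated trials the posterior is narrow, so the three estimates lie very close together on the plot.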

Conclusion

  • Point estimation is a crucial concept in statistics for estimating population parameters based on sample data.

  • Common point estimators like sample mean, proportion, and median provide single-value estimates, while Bayesian estimators incorporate prior beliefs and observed data.

  • Confidence intervals quantify the uncertainty associated with point estimates, providing a range of plausible values for the population parameter.