2025-02-03

Introduction

  • Point Estimation: Using sample data to estimate population parameters.
  • Confidence Intervals (CI): A range of values, computed from sample data at a stated confidence level (e.g., 95%), that is likely to contain the true population parameter.
  • Example: Estimating the mean miles per gallon (mpg) of cars using the mtcars dataset.

Point Estimation: Mean & Standard Deviation

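data(mtcars)                    # load the built-in mtcars dataset
mpg_mean <- mean(mtcars$mpg)    # sample mean of mpg
mpg_sd <- sd(mtcars$mpg)        # sample standard deviation (n - 1 denominator)
n <- length(mtcars$mpg)         # sample size (32 cars)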

cat("Sample Mean:", mpg_mean, "\nSample Standard Deviation:", mpg_sd)
## Sample Mean: 20.09062 
## Sample Standard Deviation: 6.026948
  • Explanation: This computes the sample mean and standard deviation of mpg in the mtcars dataset (n = 32 cars). The mean summarizes the central tendency of the data and the standard deviation its variability; both are point estimates of the corresponding population values.
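As a check on what `mean()` and `sd()` return, the sketch below evaluates both estimators directly from their definitions. Note that `sd()` divides by \(n - 1\), not \(n\). The variable names `x`, `x_bar`, and `s` are illustrative choices, not from the original slides.

```r
data(mtcars)
x <- mtcars$mpg
n <- length(x)

# Sample mean: sum of the observations divided by n
x_bar <- sum(x) / n

# Sample standard deviation: square root of the sum of squared deviations over n - 1
s <- sqrt(sum((x - x_bar)^2) / (n - 1))

cat("Manual mean:", x_bar, " Manual SD:", s, "\n")
```

Both values should agree with `mean(mtcars$mpg)` and `sd(mtcars$mpg)` up to floating-point rounding.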

Confidence Interval Formula

\[ CI = \bar{x} \pm Z_{\alpha/2} \times \frac{s}{\sqrt{n}} \]

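Where:
  • \(\bar{x}\) = sample mean
  • \(s\) = sample standard deviation
  • \(n\) = sample size
  • \(Z_{\alpha/2}\) = critical value of the standard normal distribution for confidence level \(1 - \alpha\) (about 1.96 for 95%)

Because \(s\) is estimated from the data, this z-based interval is a large-sample approximation; for small samples the Student t critical value \(t_{\alpha/2,\,n-1}\) is commonly used instead.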
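The formula can be wrapped in a small reusable function. This is a minimal sketch: `mean_ci()` and its `use_t` argument are hypothetical names introduced here for illustration. With `use_t = FALSE` it reproduces the z-interval used in these slides; with `use_t = TRUE` it swaps in the t critical value with \(n - 1\) degrees of freedom, which should match the interval reported by base R's `t.test()`.

```r
# Hypothetical helper (not from the original slides): CI for a mean,
# x_bar +/- crit * s / sqrt(n), using a z or Student t critical value.
mean_ci <- function(x, conf = 0.95, use_t = FALSE) {
  n <- length(x)
  alpha <- 1 - conf
  # Critical value: standard normal quantile, or t quantile with n - 1 df
  crit <- if (use_t) qt(1 - alpha / 2, df = n - 1) else qnorm(1 - alpha / 2)
  se <- sd(x) / sqrt(n)   # standard error of the mean, s / sqrt(n)
  moe <- crit * se        # margin of error
  c(lower = mean(x) - moe, upper = mean(x) + moe)
}

data(mtcars)
z_int <- mean_ci(mtcars$mpg)                # about (18.00, 22.18)
t_int <- mean_ci(mtcars$mpg, use_t = TRUE)  # slightly wider: about (17.92, 22.26)
print(rbind(z = z_int, t = t_int))
```

With n = 32 the two intervals differ only slightly, because the t critical value (about 2.04) is close to the normal one (about 1.96).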

Margin of Error Calculation

\[ MOE = Z_{\alpha/2} \times \frac{s}{\sqrt{n}} \]

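  • The margin of error (MOE) is the half-width of the interval: the CI runs from \(\bar{x} - MOE\) to \(\bar{x} + MOE\).
  • Because the MOE is proportional to \(1/\sqrt{n}\), a larger sample size gives a smaller MOE and therefore a more precise estimate.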
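To see how the MOE scales with sample size, the sketch below holds \(s\) and \(Z_{\alpha/2}\) fixed at the mtcars values and varies only \(n\). Keeping \(s\) constant is a simplifying assumption for illustration; in a real larger sample, \(s\) would itself vary somewhat.

```r
# Illustration only: fix s (from mtcars) and z (95%), vary n
data(mtcars)
s <- sd(mtcars$mpg)
z <- qnorm(0.975)

n_values <- c(8, 32, 128, 512)
moe <- z * s / sqrt(n_values)   # MOE = z * s / sqrt(n)

print(data.frame(n = n_values, moe = round(moe, 3)))
```

Each fourfold increase in \(n\) halves the MOE, which is the \(1/\sqrt{n}\) relationship in the formula.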

Compute 95% Confidence Interval

alpha <- 0.05                    # 1 - confidence level (95% CI)
z_value <- qnorm(1 - alpha/2)    # standard normal critical value, about 1.96
margin_of_error <- z_value * (mpg_sd / sqrt(n))

lower_bound <- mpg_mean - margin_of_error
upper_bound <- mpg_mean + margin_of_error

cat("95% Confidence Interval: (", lower_bound, ",", upper_bound, ")")
## 95% Confidence Interval: ( 18.00243 , 22.17882 )
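  • Explanation: This computes a 95% confidence interval for the mean mpg. The "95%" describes the procedure: if we drew many samples and built an interval this way from each, about 95% of those intervals would contain the true population mean. The interval above is the one produced by this particular sample.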
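The "95% of intervals" interpretation can be checked by simulation. This sketch assumes a normal population with a known mean of 20 and standard deviation of 6 (chosen to resemble mtcars mpg, not taken from the slides), draws repeated samples of size 32, builds the same z-interval from each, and records how often the interval contains the true mean.

```r
# Simulation sketch under an assumed normal population (mean 20, sd 6)
set.seed(42)
true_mean <- 20
true_sd <- 6
n <- 32
n_sims <- 10000
z <- qnorm(0.975)

covered <- replicate(n_sims, {
  x <- rnorm(n, mean = true_mean, sd = true_sd)
  x_bar <- mean(x)
  moe <- z * sd(x) / sqrt(n)
  # TRUE if this sample's interval contains the true mean
  (x_bar - moe <= true_mean) && (true_mean <= x_bar + moe)
})

coverage <- mean(covered)
cat("Estimated coverage:", coverage, "\n")
```

The estimated coverage typically comes out a little below 0.95, because the z critical value ignores the extra uncertainty from estimating \(s\); using the t critical value brings it closer to the nominal level.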

Visualization with ggplot2

  • Explanation: A histogram and a boxplot of mpg values with confidence interval markers. The plots make the spread of the data and the width of the confidence range easier to see.
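The original plotting code is not shown here, so the following is only a sketch of how such plots could be built with ggplot2; the bin width, colours, and labels are arbitrary choices. It marks the sample mean with a solid line and the 95% CI bounds with dashed lines.

```r
library(ggplot2)

data(mtcars)
mpg_mean <- mean(mtcars$mpg)
n <- length(mtcars$mpg)
moe <- qnorm(0.975) * sd(mtcars$mpg) / sqrt(n)
ci <- c(mpg_mean - moe, mpg_mean + moe)   # 95% z-interval, as in the slides

# Histogram: solid line at the sample mean, dashed lines at the CI bounds
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "grey70", colour = "white") +
  geom_vline(xintercept = mpg_mean) +
  geom_vline(xintercept = ci, linetype = "dashed", colour = "red") +
  labs(title = "mpg with 95% CI for the mean", x = "Miles per gallon", y = "Count")

# Boxplot: dashed horizontal lines at the CI bounds
p_box <- ggplot(mtcars, aes(x = "", y = mpg)) +
  geom_boxplot() +
  geom_hline(yintercept = ci, linetype = "dashed", colour = "red") +
  labs(x = NULL, y = "Miles per gallon")

print(p_hist)
print(p_box)
```

Note that the CI describes uncertainty about the mean, so it is much narrower than the spread of individual mpg values shown by the histogram and boxplot.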

Visualization with ggplot2 Part 2

3D Visualization with Plotly

  • Explanation: A 3D surface plot showing the probability distribution. This helps visualize the relationship between mpg values and probability density.

Conclusion

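  • Point estimation provides a single best guess of a population parameter (here, the sample mean of mpg for the population mean).
  • Confidence intervals give a range of plausible values for the parameter at a stated confidence level.
  • Larger samples give narrower intervals, i.e., more precise estimates.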

References

  • R Documentation: mtcars dataset
  • ggplot2 & plotly for visualizations