2024-09-18

Introduction

The data come from the CPSch3 data set provided by the Ecdat package.

Some definitions

Point estimate - A single number, computed from a sample, used as the best guess for an unknown population value such as the mean.

Interval estimate - A range of values, computed from the same sample, that is likely to contain that population value.

Usage

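The point estimate and the interval estimate go hand in hand. A sample mean of 50 units is only a best guess: the individual values behind it may range from 0 to 100, so the population mean could plausibly lie somewhat above or below 50. The interval estimate states how far.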

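The formula for a confidence interval for the mean is

\[ CI = \bar{x} \pm z \frac{s}{\sqrt{n}} \]

where \(\bar{x}\) is the sample mean,

\(z\) is the critical value for the chosen confidence level (about 1.96 for 95%),

\(s\) is the sample standard deviation,

and \(n\) is the sample size. When \(s\) is estimated from the sample, the critical value is usually taken from the t distribution with \(n - 1\) degrees of freedom, which is nearly identical to \(z\) for large samples.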
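The formula can be sketched as a small R function. This is only an illustration: the name `mean_ci`, its `level` argument, and the `wages` values are hypothetical and do not come from the CPSch3 analysis.

```r
# Hypothetical helper, not part of the original analysis:
# a z-based confidence interval for the mean of a numeric vector.
mean_ci <- function(x, level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)               # s / sqrt(n), the standard error
  z  <- qnorm(1 - (1 - level) / 2)    # critical value, about 1.96 for 95%
  mean(x) + c(-1, 1) * z * se         # lower and upper bound
}

# Made-up hourly wages, for illustration only.
wages <- c(12.5, 18.0, 15.25, 22.1, 9.75, 16.4)
mean_ci(wages)
```

With a sample this small, swapping `qnorm()` for `qt(..., df = n - 1)` would be the more careful choice.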

Estimate Mean Income

Below is the distribution of average hourly income in 1992, rounded to the nearest cent.

Calculating The Interval

  • Mean income is 16.2627394

  • Standard error of the mean is 0.0671312

  • Margin of error (the standard error times the t critical value) is about 0.1316

  • Therefore a 95% confidence interval is (16.1311504, 16.3943285)

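  # Assumes CPSch3 is loaded from Ecdat and already has a rounded_ahe column.
  n <- nrow(CPSch3)
  meanAHE <- mean(CPSch3$rounded_ahe)
  # Standard error of the mean, s / sqrt(n).
  seAHE <- sd(CPSch3$rounded_ahe) / sqrt(n)
  # Critical value for a 95% interval (named tCrit to avoid masking base::t).
  tCrit <- qt(0.975, df = n - 1)
  marginErrorAHE <- tCrit * seAHE
  ciAHE <- meanAHE + c(-1, 1) * marginErrorAHE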
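As a rough check, the interval can be reproduced by hand from the reported figures. This assumes that 0.0671312 is the quantity \(s / \sqrt{n}\) computed in the code above, and that the t critical value is about 1.9602, which is what the reported interval implies for a sample of this size.

\[ 16.2627394 \pm 1.9602 \times 0.0671312 \approx 16.2627 \pm 0.1316 \approx (16.1311,\ 16.3943) \]

This agrees with the reported interval to four decimal places.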

Plotting The Confidence Interval

We will display our calculated confidence interval using a ggplot2 histogram.

library(ggplot2)

ggplot(data = CPSch3, aes(x = rounded_ahe)) +
  geom_histogram(binwidth = 1, fill = "maroon") +
  # Shade the interval as a fixed rectangle; geom_ribbon() is meant for a
  # band that follows x, not for a constant x-range.
  annotate("rect", xmin = ciAHE[1], xmax = ciAHE[2],
           ymin = 0, ymax = Inf,
           fill = "gray80", alpha = 0.5) +
  ggtitle("Average Hourly Income with 95% Confidence Interval") +
  xlab("Average Hourly Income") + ylab("Frequency")

Confidence Interval Plot

The same distribution can also be represented by a density plot instead of a histogram.
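A minimal sketch of such a density plot follows. The code behind the original figure is not shown, so this is an assumption about how it could be drawn. So that the sketch runs on its own, it uses simulated placeholder data (`demo`, `demoCI`) rather than the real data. In the analysis above these would be CPSch3 with its rounded_ahe column and the ciAHE vector.

```r
library(ggplot2)

# Placeholder data (simulated, NOT the CPSch3 values).
set.seed(1)
demo <- data.frame(
  rounded_ahe = round(rlnorm(500, meanlog = 2.7, sdlog = 0.5), 2)
)

# 95% t-based interval for the mean, computed as in the section above.
n <- nrow(demo)
demoCI <- mean(demo$rounded_ahe) +
  c(-1, 1) * qt(0.975, df = n - 1) * sd(demo$rounded_ahe) / sqrt(n)

# Density curve with dashed vertical lines at the interval bounds.
p <- ggplot(demo, aes(x = rounded_ahe)) +
  geom_density(fill = "maroon", alpha = 0.6) +
  geom_vline(xintercept = demoCI, linetype = "dashed") +
  ggtitle("Average Hourly Income with 95% Confidence Interval") +
  xlab("Average Hourly Income") + ylab("Density")
p
```

Vertical lines are used here instead of a shaded band because a confidence interval for the mean of a large sample is narrow and a thin band can be hard to see against a filled density curve.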