Data from the CPSch3 data set provided by Ecdat.
Some definitions
Point estimate - A single number used as the best guess of a population parameter.
Interval estimate - A range of values within which the population parameter is likely to fall.
2024-09-18
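The point estimate and the interval estimate go hand in hand: the point estimate gives a single best guess, and the interval conveys how uncertain that guess is. Individual values can also vary widely around the estimate. Even when the mean is 50 units, some observations may be 0 and others may be 100.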
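The formula for a confidence interval for the mean is
\[ CI = \bar{x} \pm z \frac{s}{\sqrt n} \]
where \(\bar{x}\) is the sample mean,
\(z\) is the critical value for the chosen confidence level (about 1.96 for 95%),
\(s\) is the sample standard deviation,
and \(n\) is the sample size.
The code in this report uses the t critical value from qt() in place of \(z\); for large samples the two are nearly identical.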
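The formula above can be sketched as a small base R function. This is a minimal illustration, not the report's own code: the helper name `ci_mean` and the `wages` vector are invented for the example, and it uses the normal critical value from `qnorm()`.

```r
# Hypothetical helper (not part of the original analysis):
# confidence interval for a sample mean using the z critical value.
ci_mean <- function(x, level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)              # standard error of the mean
  z <- qnorm(1 - (1 - level) / 2)    # about 1.96 when level = 0.95
  mean(x) + c(-1, 1) * z * se        # lower and upper bounds
}

# Toy data, invented purely for illustration
wages <- c(12.5, 15.0, 17.25, 14.0, 20.5, 16.75, 13.0, 18.0)
ci <- ci_mean(wages)
print(ci)
```

The interval is symmetric around the sample mean, and raising `level` widens it because the critical value grows.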
Below is the distribution of average hourly income, rounded to the nearest cent, in 1992.
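Mean income is 16.2627394
Standard error of the mean is 0.0671312
Margin of error (critical t value times the standard error) is approximately 0.1316
Therefore a 95% confidence interval is (16.1311504, 16.3943285)
# Assumes CPSch3 is loaded from Ecdat and rounded_ahe was created earlier
n <- nrow(CPSch3)
meanAHE <- mean(CPSch3$rounded_ahe)
# Standard error of the mean
seAHE <- sd(CPSch3$rounded_ahe) / sqrt(n)
# Critical t value for a 95% interval
tCrit <- qt(0.975, df = n - 1)
# Margin of error and resulting confidence interval
marginErrorAHE <- tCrit * seAHE
ciAHE <- meanAHE + c(-1, 1) * marginErrorAHE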
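As a check, the reported interval can be reproduced by hand from the values above. The critical value of about 1.960 is not stated in the report; it is inferred here from the reported numbers and is what qt(0.975, df) approaches for a large sample. The quantity \(s/\sqrt n\) is the 0.0671312 computed in the code.

\[ t \cdot \frac{s}{\sqrt n} \approx 1.960 \times 0.0671312 \approx 0.1316 \]

\[ CI \approx 16.2627 \pm 0.1316 \approx (16.131,\ 16.394) \]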
We will display our calculated confidence interval using a ggplot2 histogram.
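library(ggplot2)
ggplot(data = CPSch3, aes(x = rounded_ahe)) +
  geom_histogram(binwidth = 1, fill = "maroon") +
  # Shade the interval with one rectangle; geom_ribbon() expects a band
  # that follows x and does not draw a fixed xmin/xmax region as intended
  annotate("rect", xmin = ciAHE[1], xmax = ciAHE[2],
           ymin = 0, ymax = Inf,
           fill = "gray80", alpha = 0.5) +
  ggtitle("Average Hourly Income with 95% Confidence Interval") +
  xlab("Average Hourly Income") + ylab("Frequency")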
The distribution as represented by a density plot instead of a histogram
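A density version of the plot might look like the sketch below. It is not the report's own code. It assumes the Ecdat and ggplot2 packages are installed and that `rounded_ahe` was built as `round(ahe, 2)`, which the report does not show. The names `ciDensity` and `p` are illustrative.

```r
library(Ecdat)
library(ggplot2)

data(CPSch3)
# Assumption: rounded_ahe is ahe rounded to the nearest cent
CPSch3$rounded_ahe <- round(CPSch3$ahe, 2)

# 95% confidence interval for the mean, using the t critical value
n <- nrow(CPSch3)
se <- sd(CPSch3$rounded_ahe) / sqrt(n)
ciDensity <- mean(CPSch3$rounded_ahe) + c(-1, 1) * qt(0.975, df = n - 1) * se

p <- ggplot(data = CPSch3, aes(x = rounded_ahe)) +
  geom_density(fill = "maroon", alpha = 0.6) +
  # Shade the confidence interval as a single rectangle
  annotate("rect", xmin = ciDensity[1], xmax = ciDensity[2],
           ymin = 0, ymax = Inf,
           fill = "gray80", alpha = 0.5) +
  ggtitle("Average Hourly Income with 95% Confidence Interval") +
  xlab("Average Hourly Income") + ylab("Density")
print(p)
```

Because the y axis now shows density rather than counts, the shape is comparable across samples of different sizes, while the shaded interval for the mean stays the same.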