Harold Nelson
2025-03-25
Start with the fact that 95% of the probability of any normal distribution falls between \(\mu-1.96*\sigma\) and \(\mu+1.96*\sigma\). We can apply this fact to the sampling distributions from the previous module to obtain confidence intervals for the true mean and the true proportion based on sample estimates and sample sizes. If \(\bar{X}\) is an estimate of the mean based on a sample of size \(n\) drawn from a population with known standard deviation \(\sigma\), a 95% confidence interval is given by
\[\bar{X}\pm 1.96*\frac{\sigma}{\sqrt{n}}\]
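The 1.96 multiplier can be checked directly in R. The short sketch below uses the base R functions `pnorm()` and `qnorm()`; the variable names are illustrative only.

```r
# Probability that a standard normal value lies within 1.96
# standard deviations of its mean
central_prob <- pnorm(1.96) - pnorm(-1.96)
central_prob   # approximately 0.95

# Going the other way: the z-score that leaves 2.5% in the upper tail
z_95 <- qnorm(0.975)
z_95           # approximately 1.96
```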
The following code snippet allows you to substitute values and obtain a confidence interval based on the central limit theorem. This code depends on two assumptions:
It assumes that the population standard deviation is known.
It assumes a default confidence level of 95%, but this can be changed.
# Code snippet to construct a confidence interval for a
# population mean given a known population standard deviation
# and a sample mean from a sample of size n
# Provide input here.
##################################################################
xbar = 100 # Sample Mean
sd = 10 # Population Standard Deviation
n = 250 # Sample size
CL = .95 # Required Confidence Level
##################################################################
zstar <- qnorm(CL+.5*(1-CL)) # Obtain Z-score for this confidence level
sd.xbar <- sd/sqrt(n) # Compute standard error of sample mean
ME <- zstar * sd.xbar # Compute margin of error
lb <- xbar - ME # Compute lower bound of CI
ub <- xbar + ME # Compute upper bound of CI
CI <- c(CL,lb,xbar,ub,ME) # Put our results in a vector
names(CI) <- c("Confidence Level","Lower Bound","Xbar","Upper Bound","Margin of Error") # Name the vector elements
CI # Display the vector
## Confidence Level Lower Bound Xbar Upper Bound
## 0.95000 98.76041 100.00000 101.23959
## Margin of Error
## 1.23959
It is important to describe the results of this process properly. Here is the usual correct statement using the concept of confidence interval. “We are 95% confident that the true population mean is between 98.76 and 101.24.”
Here is an alternative statement using the concept of margin of error. “We are 95% confident that the true population mean is within 1.24 of the estimated mean 100.00.”
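As a convenience, the same steps can be wrapped in a function so that the inputs become arguments. This is only a sketch: the function name `ci_mean_known_sd()` and its argument names are hypothetical and do not come from any package.

```r
# Hypothetical helper: z-based confidence interval for a mean
# when the population standard deviation (sigma) is treated as known
ci_mean_known_sd <- function(xbar, sigma, n, CL = 0.95) {
  zstar <- qnorm(1 - (1 - CL) / 2)   # same value as qnorm(CL + .5 * (1 - CL))
  ME <- zstar * sigma / sqrt(n)      # margin of error
  c("Confidence Level" = CL,
    "Lower Bound" = xbar - ME,
    "Xbar" = xbar,
    "Upper Bound" = xbar + ME,
    "Margin of Error" = ME)
}

# Reproduces the example above (xbar = 100, sd = 10, n = 250)
ci_example <- ci_mean_known_sd(xbar = 100, sigma = 10, n = 250)
round(ci_example, 2)
```

Calling the function with a different `CL`, for example `CL = 0.90`, gives the interval at that level without editing the body.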
Suppose we want to get a 90% confidence interval for the mean of the variable mpg in the mtcars dataframe. The following code will work. Note that we can use R functions to obtain the inputs for the process directly in the code snippet. Because the population standard deviation is not known here, the sample standard deviation stands in for it.
# Provide input here.
##################################################################
xbar = mean(mtcars$mpg) # Sample Mean
sd = sd(mtcars$mpg) # Sample standard deviation, used to estimate the population value
n = length(mtcars$mpg) # Sample size
CL = .9 # Required Confidence Level
#################################################################
zstar <- qnorm(CL+.5*(1-CL)) # Obtain Z-score for this confidence level
sd.xbar <- sd/sqrt(n) # Compute standard error of sample mean
ME <- zstar * sd.xbar # Compute margin of error
lb <- xbar - ME # Compute lower bound of CI
ub <- xbar + ME # Compute upper bound of CI
CI <- c(CL,lb,xbar,ub,ME) # Put our results in a vector
names(CI) <- c("Confidence Level","Lower Bound","Xbar","Upper Bound",
"Margin of Error") # Name the vector elements
CI # Display the vector
## Confidence Level Lower Bound Xbar Upper Bound
## 0.900000 18.338159 20.090625 21.843091
## Margin of Error
## 1.752466
We are 90% confident that the population mean lies between 18.34 and 21.84.
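In the mpg example the standard deviation was estimated from the sample of 32 cars rather than known. In that situation a t-based interval is a common alternative to the z-based one. The sketch below is a comparison only, not part of the method described above; it uses base R's `t.test()`, and the variable names are illustrative.

```r
# z-based interval, computed the same way as in the snippet above
xbar <- mean(mtcars$mpg)
se <- sd(mtcars$mpg) / sqrt(length(mtcars$mpg))
z_ci <- xbar + c(-1, 1) * qnorm(0.95) * se   # 90% confidence level

# t-based interval from base R
t_ci <- as.numeric(t.test(mtcars$mpg, conf.level = 0.90)$conf.int)

z_ci
t_ci   # slightly wider, because t quantiles exceed normal quantiles
```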
We can do the same kind of thing for confidence intervals for proportions.
The Sampling Distribution of the Proportion
We have the basic theoretical results.
The estimate \(\hat{p}\) of a proportion based on a sample of size \(n\) is approximately normally distributed, with the following mean and standard deviation, provided that \(n\hat{p} > 10\) and \(n(1-\hat{p}) > 10\).
\[\mu_{\hat{p}} = p\] and \[\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\]
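These two results can be checked by simulation. The sketch below draws many samples with `rbinom()` and compares the simulated mean and standard deviation of \(\hat{p}\) with the formulas; the true proportion, sample size, number of replications and seed are arbitrary choices made for illustration.

```r
set.seed(123)                 # arbitrary seed, for reproducibility
p <- 0.7                      # assumed true proportion
n <- 100                      # sample size
reps <- 100000                # number of simulated samples

# Each draw is the number of successes in n trials; divide by n for phat
phat_sim <- rbinom(reps, size = n, prob = p) / n

sim_mean <- mean(phat_sim)    # should be close to p
sim_sd <- sd(phat_sim)        # should be close to sqrt(p * (1 - p) / n)
theory_sd <- sqrt(p * (1 - p) / n)

c(sim_mean = sim_mean, sim_sd = sim_sd, theory_sd = theory_sd)
```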
Here is a code snippet. You can replace the values in the first few lines and run the entire snippet.
# Code snippet to compute a confidence interval for a proportion
# Inputs
################################################################
phat <- .7 # Estimated proportion
CL <- .95 # Required confidence level
n <- 100 # Sample size
################################################################
zstar <- qnorm(CL+.5*(1-CL))
se.phat <-sqrt(phat*(1-phat)/n)
ME = zstar * se.phat
lb <- phat - ME
ub <- phat + ME
CI <- c(CL,lb,phat,ub,ME)
names(CI) <- c("Confidence Level", "Lower Bound","phat","Upper Bound", "Margin of Error")
CI
## Confidence Level Lower Bound phat Upper Bound
## 0.95000000 0.61018317 0.70000000 0.78981683
## Margin of Error
## 0.08981683
We are 95% confident that the true proportion falls between .61 and .79. We could also use the margin of error terminology. We are 95% confident that the true proportion is within .09 of the sample proportion .7.
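The same computation can be wrapped in a function that also checks the sample-size condition stated earlier. This is a sketch; the name `ci_proportion()` is hypothetical and does not come from any package.

```r
# Hypothetical helper: z-based confidence interval for a proportion
ci_proportion <- function(phat, n, CL = 0.95) {
  # Condition for the normal approximation, as stated above
  if (n * phat <= 10 || n * (1 - phat) <= 10) {
    warning("n*phat and n*(1-phat) should both exceed 10")
  }
  zstar <- qnorm(CL + .5 * (1 - CL))          # z-score for this confidence level
  ME <- zstar * sqrt(phat * (1 - phat) / n)   # margin of error
  c("Confidence Level" = CL,
    "Lower Bound" = phat - ME,
    "phat" = phat,
    "Upper Bound" = phat + ME,
    "Margin of Error" = ME)
}

# Reproduces the example above (phat = .7, n = 100, 95% level)
ci_prop_example <- ci_proportion(phat = .7, n = 100)
round(ci_prop_example, 3)
```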
Suppose we had a sample proportion of .2, a sample size of 1,000 and we wanted an 80% confidence interval for the population proportion.
# Code snippet to compute a confidence interval for a proportion
# Inputs
################################################################
phat <- .2 # Estimated proportion
CL <- .8 # Required confidence level
n <- 1000 # Sample size
################################################################
zstar <- qnorm(CL+.5*(1-CL))
se.phat <-sqrt(phat*(1-phat)/n)
ME = zstar * se.phat
lb <- phat - ME
ub <- phat + ME
CI <- c(CL,lb,phat,ub,ME)
names(CI) <- c("Confidence Level", "Lower Bound","phat","Upper Bound", "Margin of Error")
CI
## Confidence Level Lower Bound phat Upper Bound
## 0.80000000 0.18378951 0.20000000 0.21621049
## Margin of Error
## 0.01621049
What is a proper statement of the results?
We are 80% confident that the true value of the proportion lies between .184 and .216.
Or
We are 80% confident that the true value of the proportion lies within .016 of the estimated proportion, .2.
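It is important to use a proper statement. Never use “probability” in describing a confidence interval. Once the sample has been drawn, the computed interval either contains the true value or it does not; the confidence level describes how often the procedure captures the true value over many repeated samples.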
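One way to see what the confidence level means is to repeat the sampling process many times and count how often the interval captures the true mean. The simulation below is an illustration under assumed values (a normal population with mean 100 and standard deviation 10, samples of size 250, and an arbitrary seed).

```r
set.seed(42)                  # arbitrary seed, for reproducibility
mu <- 100; sigma <- 10        # assumed true population values
n <- 250; CL <- .95
reps <- 10000                 # number of repeated samples

zstar <- qnorm(CL + .5 * (1 - CL))
ME <- zstar * sigma / sqrt(n)

# Sample mean from each of many independent samples
xbars <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

# Fraction of intervals xbar +/- ME that contain the true mean
coverage <- mean(xbars - ME <= mu & mu <= xbars + ME)
coverage                      # should be close to 0.95
```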
The two confidence interval formulas incorporate the “margin of error” as an intermediate step. The margin of error is one half of the width of the confidence interval. A frequent question is how large a sample is required to yield a given margin of error. This question can be answered by solving the margin of error formula for the sample size.
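For reference, here is a sketch of the algebra, writing \(ME\) for the margin of error and \(z^*\) for the z-score at the chosen confidence level (1.96 for 95%); these symbols are notation introduced here to match the code variables. For a mean:
\[ME = z^*\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad \sqrt{n} = \frac{z^*\sigma}{ME} \quad\Rightarrow\quad n = \left(\frac{z^*\sigma}{ME}\right)^2\]
For a proportion:
\[ME = z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\Rightarrow\quad n = \frac{(z^*)^2\,\hat{p}(1-\hat{p})}{ME^2}\]
In both cases the result is rounded up to the next whole number, since a sample size must be an integer and rounding down would exceed the target margin of error.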
The following code snippet accepts a confidence level, a population standard deviation and a margin of error. It produces a required sample size.
# Code snippet to compute the sample size required for a
# given margin of error for a population mean. The other
# required inputs are the confidence level and the population
# standard deviation.
# Inputs
##############################################################
CL = .95
ME = 3
sigma = 15
##############################################################
# Computation
zstar = qnorm(CL+.5*(1-CL))
n = ((zstar*sigma)/ME)^2
ceiling(n)
## [1] 97
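A quick check, using the same inputs, that 97 is the smallest sample size meeting the target margin of error of 3. The helper name `me_for_n()` is hypothetical.

```r
CL <- .95; sigma <- 15
zstar <- qnorm(CL + .5 * (1 - CL))

# Margin of error achieved by a sample of size n
me_for_n <- function(n) zstar * sigma / sqrt(n)

me_96 <- me_for_n(96)   # just above the target of 3
me_97 <- me_for_n(97)   # just below the target of 3
c(me_96 = me_96, me_97 = me_97)
```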
The following code snippet accepts an estimated proportion, a confidence level and a margin of error. It produces a sample size which would provide the given margin of error.
# Code Snippet to compute a required sample size for a proportion
# Inputs
##############################################################
phat = .66
ME = .03
CL = .95
##############################################################
# Computation
zstar = qnorm(CL+.5*(1-CL))
n = (phat*(1-phat)*zstar^2)/ME^2
ceiling(n)
## [1] 958
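When no prior estimate of the proportion is available, a common conservative choice is to plan with .5, since \(\hat{p}(1-\hat{p})\) is largest there. The sketch below compares the two sample sizes; treating .5 as the planning value is an assumption added here, not part of the snippet above, and `n_required()` is a hypothetical helper.

```r
ME <- .03; CL <- .95
zstar <- qnorm(CL + .5 * (1 - CL))

# Required sample size for a given planning value of the proportion
n_required <- function(p) ceiling(p * (1 - p) * zstar^2 / ME^2)

n_est <- n_required(.66)    # 958, matching the result above
n_cons <- n_required(.5)    # 1068, the worst case
c(n_est = n_est, n_cons = n_cons)
```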