Start with the fact that 95% of the probability of any normal distribution falls between \(\mu-1.96\sigma\) and \(\mu+1.96\sigma\). We can apply this fact to the sampling distributions from the previous module to obtain confidence intervals for the true mean and true proportion based on sample estimates and sample sizes. If \(\bar{X}\) is an estimate of the mean based on a sample of size \(n\) drawn from a population with known standard deviation \(\sigma\), a 95% confidence interval is given by
\[\bar{X}\pm 1.96\cdot\frac{\sigma}{\sqrt{n}}\]
The following code snippet allows you to substitute values and obtain a confidence interval based on the central limit theorem. This code depends on two assumptions:
It assumes that the population standard deviation is known.
It assumes a default confidence level of 95%, but this can be changed.
# Code snippet to construct a confidence interval for a
# population mean given a known population standard deviation
# and a sample mean from a sample of size n
xbar = 100 # Sample Mean
sd = 10 # Population Standard Deviation
n = 250 # Sample size
CL = .95 # Required Confidence Level
zstar <- qnorm(CL+.5*(1-CL)) # Obtain Z-score for this confidence level
sd.xbar <- sd/sqrt(n) # Compute standard error of sample mean
ME <- zstar * sd.xbar # Compute margin of error
lb <- xbar - ME # Compute lower bound of CI
ub <- xbar + ME # Compute upper bound of CI
CI <- c(CL,lb,xbar,ub,ME) # Put our results in a vector
names(CI) <- c("Confidence Level","Lower Bound","Xbar","Upper Bound",
"Margin of Error") # Name the vector elements
CI # Display the vector
## Confidence Level Lower Bound Xbar Upper Bound
## 0.95000 98.76041 100.00000 101.23959
## Margin of Error
## 1.23959
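The `zstar` line in the snippet above can look cryptic at first. As a brief sketch of the idea: a central interval with confidence level CL leaves (1-CL)/2 of the probability in each tail, so the critical value is the normal quantile at CL + (1-CL)/2. The confidence levels listed below are illustrative choices, and `cls` and `zstars` are names introduced only for this example.

```r
# Critical values (z*) for a few common confidence levels.
# The levels are illustrative; cls and zstars are names made up for this sketch.
cls <- c(0.80, 0.90, 0.95, 0.99)
zstars <- qnorm(cls + 0.5 * (1 - cls))  # quantile leaving (1-CL)/2 in the upper tail
names(zstars) <- c("80%", "90%", "95%", "99%")
round(zstars, 3)

# Check: the area between -z* and +z* recovers each confidence level
pnorm(zstars) - pnorm(-zstars)
```

For a 95% level this reproduces the familiar value of about 1.96 used in the formula.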
It is important to describe the results of this process properly. Here is the usual correct statement using the concept of a confidence interval. “We are 95% confident that the true population mean is between 98.76 and 101.24.”
Here is an alternative statement using the concept of margin of error. “We are 95% confident that the true population mean is within 1.24 of the estimated mean 100.00.”
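The word “confident” in these statements describes the procedure: if the sampling were repeated many times, roughly 95% of the intervals built this way would contain the true mean. The simulation below is a minimal sketch of that idea. The true mean of 100, the number of repetitions, and the random seed are assumptions chosen purely for illustration.

```r
# Simulate repeated sampling to see how often the interval covers the true mean.
# mu, reps, and the seed are illustrative assumptions.
set.seed(1)
mu    <- 100   # true population mean (known only because we are simulating)
sigma <- 10    # population standard deviation
n     <- 250   # sample size
CL    <- 0.95  # confidence level
reps  <- 2000  # number of simulated samples

zstar <- qnorm(CL + 0.5 * (1 - CL))  # z-score for this confidence level
ME    <- zstar * sigma / sqrt(n)     # margin of error (same for every sample)

# Sample mean of each simulated sample
xbars <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

# Proportion of intervals (xbar - ME, xbar + ME) that contain mu
coverage <- mean(abs(xbars - mu) <= ME)
coverage
```

The resulting proportion should land close to 0.95, although it will vary a little with the seed and the number of repetitions.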
For example, suppose we had a sample mean of 123 based on a sample of size 50, and we knew that the sample was drawn from a population with a standard deviation of 15. What is a 90% confidence interval for the population mean?
# Code snippet to construct a confidence interval for a
# population mean given a known population standard deviation
# and a sample mean from a sample of size n.
xbar = 123 # Sample Mean
sd = 15 # Population Standard Deviation
n = 50 # Sample size
CL = .9 # Required Confidence Level
zstar <- qnorm(CL+.5*(1-CL)) # Obtain Z-score for this confidence level
sd.xbar <- sd/sqrt(n) # Compute standard error of sample mean
ME <- zstar * sd.xbar # Compute margin of error
lb <- xbar - ME # Compute lower bound of CI
ub <- xbar + ME # Compute upper bound of CI
CI <- c(CL,lb,xbar,ub,ME) # Put our results in a vector
names(CI) <- c("Confidence Level","Lower Bound","Xbar","Upper Bound",
"Margin of Error") # Name the vector elements
CI # Display the vector
## Confidence Level Lower Bound Xbar Upper Bound
## 0.900000 119.510739 123.000000 126.489261
## Margin of Error
## 3.489261
We are 90% confident that the population mean lies between 119.5 and 126.5.
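Since the two snippets above differ only in their inputs, the steps could be wrapped in a function. The following is one possible sketch; the name `ci_mean_known_sd` is hypothetical and is not part of base R.

```r
# Hypothetical helper wrapping the steps above (not a built-in R function).
ci_mean_known_sd <- function(xbar, sd, n, CL = 0.95) {
  zstar <- qnorm(CL + 0.5 * (1 - CL))  # z-score for this confidence level
  ME <- zstar * sd / sqrt(n)           # margin of error
  c("Confidence Level" = CL,
    "Lower Bound" = xbar - ME,
    "Xbar" = xbar,
    "Upper Bound" = xbar + ME,
    "Margin of Error" = ME)
}

# Reproduce the second example: sample mean 123, sd 15, n 50, 90% level
ci_mean_known_sd(xbar = 123, sd = 15, n = 50, CL = 0.90)
```

Giving `CL` a default of 0.95 mirrors the default confidence level assumed by the original snippet.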
We can do the same kind of thing for confidence intervals for proportions.
The Sampling Distribution of the Proportion
We have the basic theoretical results.
The estimate of a proportion \(\hat{p}\) based on a sample of size \(n\) is approximately normal and has the following mean and standard deviation, provided that \(n\hat{p} > 10\) and \(n(1-\hat{p}) > 10\).
\[\mu_{\hat{p}} = p\] and \[\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\]
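Before relying on the normal approximation, it may help to check the two conditions directly. The sketch below does so for illustrative values of the sample proportion and sample size; the variable names `successes` and `failures` are introduced only for this example.

```r
# Check the conditions for the normal approximation (values are illustrative).
phat <- 0.7
n    <- 100
successes <- n * phat        # n * phat
failures  <- n * (1 - phat)  # n * (1 - phat)
c(successes = successes, failures = failures)

# TRUE when both conditions for the approximation are met
successes > 10 && failures > 10
```

If either quantity is 10 or less, the interval computed from the normal approximation should be treated with caution.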
Here is a code snippet. You can replace the values in the first few lines and run the entire snippet.
# Code snippet to compute a confidence interval for a proportion
phat <- .7 # Estimated proportion
CL <- .95 # Required confidence level
n <- 100 # Sample size
zstar <- qnorm(CL+.5*(1-CL))
se.phat <-sqrt(phat*(1-phat)/n)
ME = zstar * se.phat
lb <- phat - ME
ub <- phat + ME
CI <- c(CL,ME,lb,phat,ub)
names(CI) <- c("Confidence Level", "Margin of Error", "lower Bound","phat","Upper Bound")
CI
## Confidence Level Margin of Error lower Bound phat
## 0.95000000 0.08981683 0.61018317 0.70000000
## Upper Bound
## 0.78981683
We are 95% confident that the true proportion falls between .61 and .79. We could also use the margin of error terminology. We are 95% confident that the true proportion is within .09 of the sample proportion .7.
For example, suppose we had a sample proportion of .2, a sample size of 1,000 and we wanted an 80% confidence interval for the population proportion.
# Code snippet to compute a confidence interval for a proportion
phat <- .2 # Estimated proportion
CL <- .8 # Required confidence level
n <- 1000 # Sample size
zstar <- qnorm(CL+.5*(1-CL))
se.phat <-sqrt(phat*(1-phat)/n)
ME = zstar * se.phat
lb <- phat - ME
ub <- phat + ME
CI <- c(CL,ME,lb,phat,ub)
names(CI) <- c("Confidence Level", "Margin of Error", "lower Bound","phat","Upper Bound")
CI
## Confidence Level Margin of Error lower Bound phat
## 0.80000000 0.01621049 0.18378951 0.20000000
## Upper Bound
## 0.21621049
What is a proper statement of the results?
The two confidence interval formulas incorporate the “margin of error” as an intermediate step. The margin of error is one half of the width of the confidence interval. A frequent question is how large a sample is required to yield a given margin of error. This question can be answered by solving the margin of error formula for the sample size.
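As a sketch of that algebra, write \(ME\) for the margin of error and \(z^*\) for the Z-score of the chosen confidence level (both symbols are introduced here for convenience). For a mean with known \(\sigma\):

\[ME = z^*\frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad \sqrt{n} = \frac{z^*\sigma}{ME} \quad\Longrightarrow\quad n = \left(\frac{z^*\sigma}{ME}\right)^2\]

For a proportion, using \(\hat{p}\) in place of the unknown \(p\):

\[ME = z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\Longrightarrow\quad n = \frac{(z^*)^2\,\hat{p}(1-\hat{p})}{ME^2}\]

In both cases the result is rounded up to the next whole number, since rounding down would give a margin of error slightly larger than the target.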
The following code snippet accepts a confidence level, a population standard deviation and a margin of error. It produces a required sample size.
# Code snippet to compute the sample size required for a
# given margin of error for a population mean. The other
# required inputs are the confidence level and the population
# standard deviation.
#Inputs
CL = .95
ME = 3
sigma = 15
# Computation
zstar = qnorm(CL+.5*(1-CL))
n = ((zstar*sigma)/ME)^2
ceiling(n)
## [1] 97
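One way to sanity-check this result is to plug the sample size back into the margin of error formula. The sketch below does that; `me_at` is a hypothetical helper written only for this check.

```r
# Verify that the rounded-up sample size meets the target margin of error.
CL    <- 0.95
sigma <- 15
zstar <- qnorm(CL + 0.5 * (1 - CL))

# Hypothetical helper: margin of error for a sample of size n
me_at <- function(n) zstar * sigma / sqrt(n)

me_at(97)  # just under the target of 3
me_at(96)  # just over the target of 3
```

This shows why the sample size is rounded up with `ceiling()`: one fewer observation would miss the target.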
The following code snippet accepts a proportion, a confidence level and a margin of error. It produces a sample size which would provide the given margin of error.
# Code Snippet to compute a required sample size for a proportion
#Inputs
phat = .66
ME = .03
CL = .95
# Computation
zstar = qnorm(CL+.5*(1-CL))
n = (phat*(1-phat)*zstar^2)/ME^2
ceiling(n)
## [1] 958
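The snippet above needs a value for the proportion before any data are collected. When no prior estimate is available, a common conservative choice is a proportion of .5, because \(p(1-p)\) is largest there and so yields the largest required sample size. The following sketch works under that assumption; `n_conservative` is a name introduced for this example.

```r
# Conservative sample size using a proportion of 0.5 (a planning assumption,
# used when no prior estimate of the proportion is available).
ME <- 0.03
CL <- 0.95
zstar <- qnorm(CL + 0.5 * (1 - CL))
n_conservative <- ceiling(0.5 * 0.5 * zstar^2 / ME^2)
n_conservative
```

The result is larger than the 958 obtained with a proportion of .66, which is the price of not having a prior estimate.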