Confidence Intervals

Start with the fact that 95% of the probability of any normal distribution falls between \(\mu-1.96*\sigma\) and \(\mu+1.96*\sigma\). We can apply this fact to the sampling distributions from the previous module and obtain confidence intervals for the true mean and true proportion based on sample estimates and sample sizes. If \(\bar{X}\) is an estimate of the mean based on a sample of size \(n\) drawn from a population with known standard deviation \(\sigma\), a 95% confidence interval is given by

\[\bar{X}\pm 1.96*\frac{\sigma}{\sqrt{n}}\]

The following code snippet allows you to substitute values and obtain a confidence interval based on the central limit theorem. The snippet assumes that the population standard deviation is known. It uses a default confidence level of 95%, but this can be changed.

# Code snippet to construct a confidence interval for a 
# population mean given a known population standard deviation 
# and a sample mean from a sample of size n

  xbar = 100  # Sample Mean
  sd   = 10   # Population Standard Deviation
  n    = 250   # Sample size
  CL   = .95  # Required Confidence Level
  
  zstar <- qnorm(CL+.5*(1-CL)) # Obtain Z-score for this confidence level
  sd.xbar <- sd/sqrt(n)        # Compute standard error of sample mean 
  ME <- zstar * sd.xbar        # Compute margin of error
  
  lb <- xbar - ME              # Compute lower bound of CI
  ub <- xbar + ME              # Compute upper bound of CI
  
  CI <- c(CL,lb,xbar,ub,ME)    # Put our results in a vector

  names(CI) <- c("Confidence Level","Lower Bound","Xbar","Upper Bound",
                 "Margin of Error") # Name the vector elements
  CI                           # Display the vector
## Confidence Level      Lower Bound             Xbar      Upper Bound 
##          0.95000         98.76041        100.00000        101.23959 
##  Margin of Error 
##          1.23959
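
The expression `qnorm(CL+.5*(1-CL))` in the snippet above finds the point that leaves half of the excluded probability in each tail of the standard normal distribution. The following sketch is a quick check of that reading: it confirms that a 95% level gives roughly 1.96 and that the area between `-zstar` and `zstar` recovers the confidence level. The name `central_area` is introduced here only for illustration.

```r
# Check the z-score used for a two-sided confidence interval
CL <- .95                                     # Confidence level
zstar <- qnorm(CL + .5 * (1 - CL))            # Same as qnorm(.975) when CL = .95
central_area <- pnorm(zstar) - pnorm(-zstar)  # Probability between -zstar and zstar

round(zstar, 2)   # Approximately 1.96
central_area      # Recovers the confidence level
```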

It is important to describe the results of this process properly. Here is the usual correct statement using the concept of a confidence interval. “We are 95% confident that the true population mean is between 98.76 and 101.24.”

Here is an alternative statement using the concept of margin of error. “We are 95% confident that the true population mean is within 1.24 of the estimated mean 100.00.”
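
One way to see what "95% confident" means is to repeat the sampling process many times and count how often the interval captures the true mean. The simulation below is a minimal sketch, not part of the original procedure. It assumes a normal population with a true mean of 100 (a value chosen only for the simulation), so the sample means can be drawn directly from their sampling distribution. The seed and the number of repetitions are arbitrary.

```r
# Simulation sketch: how often does the 95% interval capture the true mean?
set.seed(1)          # Arbitrary seed so the run is repeatable
mu    <- 100         # True population mean (assumed for the simulation)
sigma <- 10          # Known population standard deviation
n     <- 250         # Sample size
CL    <- .95         # Confidence level
reps  <- 10000       # Number of simulated samples

zstar <- qnorm(CL + .5 * (1 - CL))
ME    <- zstar * sigma / sqrt(n)     # Margin of error is the same for every sample

# Draw the sample means directly from their sampling distribution
xbars <- rnorm(reps, mean = mu, sd = sigma / sqrt(n))

covered  <- (xbars - ME <= mu) & (mu <= xbars + ME)
coverage <- mean(covered)            # Proportion of intervals containing mu
coverage
```

The resulting proportion should land close to the stated confidence level, which is the sense in which the procedure is "95% confident".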

For example, suppose we had a sample mean of 123 based on a sample of size 50 and we knew that the sample was drawn from a population with a standard deviation of 15. What is a 90% confidence interval for the population mean?

# Code snippet to construct a confidence interval for a 
# population mean given a known population standard deviation 
# and a sample mean from a sample of size n.
  
  xbar = 123  # Sample Mean
  sd   = 15   # Population Standard Deviation
  n    = 50   # Sample size
  CL   = .9  # Required Confidence Level
  
  zstar <- qnorm(CL+.5*(1-CL)) # Obtain Z-score for this confidence level
  sd.xbar <- sd/sqrt(n)        # Compute standard error of sample mean 
  ME <- zstar * sd.xbar        # Compute margin of error
  
  lb <- xbar - ME              # Compute lower bound of CI
  ub <- xbar + ME              # Compute upper bound of CI
  
  CI <- c(CL,lb,xbar,ub,ME)    # Put our results in a vector

  names(CI) <- c("Confidence Level","Lower Bound","Xbar","Upper Bound",
                 "Margin of Error") # Name the vector elements
  CI                           # Display the vector
## Confidence Level      Lower Bound             Xbar      Upper Bound 
##         0.900000       119.510739       123.000000       126.489261 
##  Margin of Error 
##         3.489261

We are 90% confident that the population mean lies between 119.5 and 126.5.
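
Since the two snippets for the mean differ only in their inputs, the computation can be wrapped in a function. The version below is one possible sketch. The function name `ci_mean` and its argument names are hypothetical and do not come from the original snippets.

```r
# Hypothetical helper: z-based confidence interval for a mean with known sigma
ci_mean <- function(xbar, sigma, n, CL = .95) {
  zstar <- qnorm(CL + .5 * (1 - CL))   # Z-score for this confidence level
  ME    <- zstar * sigma / sqrt(n)     # Margin of error
  c("Confidence Level" = CL,
    "Lower Bound"      = xbar - ME,
    "Xbar"             = xbar,
    "Upper Bound"      = xbar + ME,
    "Margin of Error"  = ME)
}

ci1 <- ci_mean(100, 10, 250)           # First example, default 95% level
ci2 <- ci_mean(123, 15, 50, CL = .9)   # Second example, 90% level
ci1
ci2
```

Both calls should reproduce the bounds shown in the outputs above.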

We can do the same kind of thing for confidence intervals for proportions.

The Sampling Distribution of the Proportion

We have the basic theoretical results.

The estimate \(\hat{p}\) of a proportion based on a sample of size \(n\) is approximately normal and has the following mean and standard deviation, provided that \(n\hat{p} > 10\) and \(n(1-\hat{p}) > 10\).

\[\mu_{\hat{p}} = p\] and \[\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\]
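
These two results can be checked informally by simulation. The sketch below assumes a true proportion of .7 and a sample size of 100 (values chosen only for illustration), draws many samples, and compares the mean and standard deviation of the simulated sample proportions with the formulas. The seed and the number of repetitions are arbitrary.

```r
# Simulation sketch: sampling distribution of a sample proportion
set.seed(1)       # Arbitrary seed so the run is repeatable
p    <- .7        # True proportion (assumed for the simulation)
n    <- 100       # Sample size
reps <- 10000     # Number of simulated samples

phats <- rbinom(reps, size = n, prob = p) / n   # One sample proportion per sample

sim_mean  <- mean(phats)               # Should be close to p
sim_sd    <- sd(phats)                 # Should be close to the theoretical value
theory_sd <- sqrt(p * (1 - p) / n)     # Standard deviation from the formula

c(sim_mean = sim_mean, sim_sd = sim_sd, theory_sd = theory_sd)
```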

Here is a code snippet. You can replace the values in the first few lines and run the entire snippet.

# Code snippet to compute a confidence interval for a proportion

phat <- .7   # Estimated proportion
CL <- .95    # Required confidence level
n <- 100     # Sample size

zstar <- qnorm(CL+.5*(1-CL))
se.phat <-sqrt(phat*(1-phat)/n)
ME = zstar * se.phat

lb <- phat - ME
ub <- phat + ME

CI <- c(CL,ME,lb,phat,ub)
names(CI) <- c("Confidence Level", "Margin of Error",  "lower Bound","phat","Upper Bound")

CI
## Confidence Level  Margin of Error      lower Bound             phat 
##       0.95000000       0.08981683       0.61018317       0.70000000 
##      Upper Bound 
##       0.78981683

We are 95% confident that the true proportion falls between .61 and .79. We could also use the margin of error terminology. We are 95% confident that the true proportion is within .09 of the sample proportion .7.

For example, suppose we had a sample proportion of .2, a sample size of 1,000 and we wanted an 80% confidence interval for the population proportion.

# Code snippet to compute a confidence interval for a proportion

phat <- .2    # Estimated proportion
CL <- .8      # Required confidence level
n <- 1000     # Sample size

zstar <- qnorm(CL+.5*(1-CL))
se.phat <-sqrt(phat*(1-phat)/n)

ME = zstar * se.phat

lb <- phat - ME
ub <- phat + ME

CI <- c(CL,ME,lb,phat,ub)
names(CI) <- c("Confidence Level", "Margin of Error",  "lower Bound","phat","Upper Bound")

CI
## Confidence Level  Margin of Error      lower Bound             phat 
##       0.80000000       0.01621049       0.18378951       0.20000000 
##      Upper Bound 
##       0.21621049

What is a proper statement of the results?
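
The proportion snippets can also be wrapped in a function that checks the conditions for the normal approximation before computing the interval. This is a sketch under the assumptions stated earlier in this section. The function name `ci_prop` and the choice to issue a warning rather than stop are hypothetical.

```r
# Hypothetical helper: confidence interval for a proportion
ci_prop <- function(phat, n, CL = .95) {
  # Warn if the conditions for the normal approximation are not met
  if (n * phat <= 10 || n * (1 - phat) <= 10) {
    warning("n*phat and n*(1-phat) should both exceed 10")
  }
  zstar   <- qnorm(CL + .5 * (1 - CL))     # Z-score for this confidence level
  se.phat <- sqrt(phat * (1 - phat) / n)   # Standard error of the proportion
  ME      <- zstar * se.phat               # Margin of error
  c("Confidence Level" = CL, "Margin of Error" = ME,
    "Lower Bound" = phat - ME, "phat" = phat, "Upper Bound" = phat + ME)
}

p1 <- ci_prop(.7, 100)             # 95% interval from the first example
p2 <- ci_prop(.2, 1000, CL = .8)   # 80% interval from the second example
p1
p2
```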

Required Sample Sizes

The two confidence interval formulas incorporate the “margin of error” as an intermediate step. The margin of error is one half of the width of the confidence interval. A frequent question is how large a sample would be required to yield a given margin of error. This question can be answered by solving the margin of error formula for the sample size.
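
Written out with the same symbols as the interval formulas in this module, and using \(z^*\) for the z-score of the chosen confidence level and \(ME\) for the margin of error (notation assumed here for convenience), the algebra for the mean is

\[ME = z^*\frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{z^*\sigma}{ME}\right)^2\]

and for the proportion

\[ME = z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\Longrightarrow\quad n = \frac{\hat{p}(1-\hat{p})(z^*)^2}{ME^2}\]

Because \(n\) must be a whole number, the result is rounded up to the next integer.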

The following code snippet accepts a confidence level, a population standard deviation and a margin of error. It produces a required sample size.

# Code snippet to compute the sample size required for a
# given margin of error for a population mean.  The other
# required inputs are the confidence level and the population
# standard deviation.


#Inputs 
CL = .95
ME = 3
sigma = 15

# Computation
zstar = qnorm(CL+.5*(1-CL))
n = ((zstar*sigma)/ME)^2
ceiling(n)
## [1] 97
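
As a sanity check on the rounding, the sketch below plugs the computed sample size back into the margin of error formula and compares it with a sample that is one observation smaller. The names `ME_96` and `ME_97` are introduced only for this illustration.

```r
# Check the margin of error just below and at the computed sample size
CL    <- .95
sigma <- 15
zstar <- qnorm(CL + .5 * (1 - CL))

ME_96 <- zstar * sigma / sqrt(96)   # One observation short
ME_97 <- zstar * sigma / sqrt(97)   # The computed sample size

c(ME_96 = ME_96, ME_97 = ME_97)
```

With 96 observations the margin of error is still slightly above 3, and with 97 it falls below 3, which is why the result is rounded up rather than to the nearest integer.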

The following code snippet accepts a proportion, a confidence level and a margin of error. It produces a sample size which would provide the given margin of error.

# Code Snippet to compute a required sample size for a proportion

#Inputs
phat = .66
ME = .03
CL = .95

# Computation
zstar = qnorm(CL+.5*(1-CL))
n = (phat*(1-phat)*zstar^2)/ME^2
ceiling(n)
## [1] 958
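
The required sample size depends on the proportion that is plugged in. When no estimate of the proportion is available in advance, a common conservative choice is .5, because \(\hat{p}(1-\hat{p})\) is largest there. The sketch below illustrates this by evaluating the formula over a grid of candidate proportions. The grid and the names `phat_grid`, `n_grid` and `n_conservative` are assumptions made for the illustration.

```r
# Conservative sample size when no estimate of the proportion is available
ME <- .03
CL <- .95
zstar <- qnorm(CL + .5 * (1 - CL))

phat_grid <- seq(.05, .95, by = .05)                       # Candidate proportions
n_grid    <- (phat_grid * (1 - phat_grid) * zstar^2) / ME^2

phat_grid[which.max(n_grid)]             # Proportion requiring the largest sample
n_conservative <- ceiling(max(n_grid))   # Sample size that works for any proportion
n_conservative
```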