Setup

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Confidence Intervals

Start with the fact that 95% of the probability of any normal distribution falls between \(\mu-1.96*\sigma\) and \(\mu+1.96*\sigma\). We can apply this fact to the sampling distributions from the previous module and obtain confidence intervals for the true mean and true proportion based on sample estimates and sample sizes. If \(\bar{X}\) is an estimate of the mean based on a sample of size \(n\) drawn from a population with known standard deviation \(\sigma\), a 95% confidence interval is given by

\[\bar{X}\pm 1.96*\frac{\sigma}{\sqrt{n}}\]
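
The 95% figure can be checked numerically. The sketch below uses the base R function pnorm(); the values of mu and sigma are arbitrary illustrative choices, and the result is the same for any other normal distribution.

```r
# Arbitrary (illustrative) normal distribution
mu <- 50
sigma <- 8

# Probability between mu - 1.96*sigma and mu + 1.96*sigma
coverage <- pnorm(mu + 1.96 * sigma, mean = mu, sd = sigma) -
  pnorm(mu - 1.96 * sigma, mean = mu, sd = sigma)
coverage
```

The result is very close to 0.95 regardless of the mean and standard deviation chosen.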

A Useful Snippet

The following code snippet allows you to substitute values and obtain a confidence interval based on the central limit theorem. This code depends on two assumptions:

The value of the population standard deviation is known.
The population has a normal distribution or the sample size is greater than 30.

It assumes a default confidence level of 95%, but this can be changed.
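
The snippets in this module obtain the Z-score with the expression qnorm(CL+.5*(1-CL)), which asks for the quantile that leaves half of the leftover probability 1-CL in the upper tail. As a quick illustrative sketch (the vector of confidence levels is an arbitrary choice, not part of the original material), the expression can be evaluated for several levels at once:

```r
# Illustrative confidence levels
CL <- c(0.80, 0.90, 0.95, 0.99)

# Same expression used in the snippets: quantile with (1 - CL)/2 in the upper tail
zstar <- qnorm(CL + 0.5 * (1 - CL))

# Label each Z-score with its confidence level and round for display
z_table <- round(setNames(zstar, CL), 3)
z_table
```

Higher confidence levels give larger Z-scores and therefore wider intervals.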

Snippet for CI of a Mean

# Code snippet to construct a confidence interval for a 
# population mean given a known population standard deviation 
# and a sample mean from a sample of size n

# Provide input here.
##################################################################
  xbar = 100  # Sample Mean
  sd   = 10   # Population Standard Deviation
  n    = 250   # Sample size
  CL   = .95  # Required Confidence Level
##################################################################  
  
  
  zstar <- qnorm(CL+.5*(1-CL)) # Obtain Z-score for this confidence level
  sd.xbar <- sd/sqrt(n)        # Compute standard error of sample mean 
  ME <- zstar * sd.xbar        # Compute margin of error
  
  lb <- xbar - ME              # Compute lower bound of CI
  ub <- xbar + ME              # Compute upper bound of CI
  
  CI <- c(CL,lb,xbar,ub,ME)    # Put our results in a vector
  names(CI) <- c("Confidence Level","Lower Bound","Xbar","Upper Bound",
                 "Margin of Error") # Name the vector elements
  CI                           # Display the vector

## Confidence Level      Lower Bound             Xbar      Upper Bound 
##          0.95000         98.76041        100.00000        101.23959 
##  Margin of Error 
##          1.23959

It is important to describe the results of this process properly. Here is the usual correct statement using the concept of confidence interval. “We are 95% confident that the true population mean is between 98.76 and 101.24.”

Here is an alternative statement using the concept of margin of error. “We are 95% confident that the true population mean is within 1.24 of the estimated mean 100.00.”
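
If the calculation is needed repeatedly, the same steps could be wrapped in a function. The following is only a sketch: the name ci_mean_z and its arguments are hypothetical and not part of the original snippet, and it relies on the same two assumptions listed above.

```r
# Hypothetical helper: z-based confidence interval for a mean.
# Assumes sigma is the known population standard deviation.
ci_mean_z <- function(xbar, sigma, n, CL = 0.95) {
  zstar <- qnorm(CL + 0.5 * (1 - CL))  # Z-score for this confidence level
  ME <- zstar * sigma / sqrt(n)        # Margin of error
  c("Confidence Level" = CL,
    "Lower Bound" = xbar - ME,
    "Xbar" = xbar,
    "Upper Bound" = xbar + ME,
    "Margin of Error" = ME)
}

# Same inputs as the snippet above, default 95% confidence level
res_95 <- ci_mean_z(xbar = 100, sigma = 10, n = 250)
res_95

# Same inputs at a 99% confidence level: the interval widens
res_99 <- ci_mean_z(xbar = 100, sigma = 10, n = 250, CL = 0.99)
res_99
```

Naming the argument sigma rather than sd avoids reusing the name of R's sd() function.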

An Example

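Suppose we want a 90% confidence interval for the mean of the variable mpg in the mtcars dataframe. The following code will work. Note that we can use R functions to obtain the inputs for the process directly in the code snippet. Because the population standard deviation is not actually known here, the sample standard deviation computed by sd() stands in for it, so the resulting interval is an approximation.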

# Provide input here.
##################################################################
  xbar = mean(mtcars$mpg)      # Sample Mean
  sd   = sd(mtcars$mpg)        # Sample SD, standing in for the population SD
  n    = length(mtcars$mpg)    # Sample size
  CL   = .9                    # Required Confidence Level
#################################################################
  
  zstar <- qnorm(CL+.5*(1-CL)) # Obtain Z-score for this confidence level
  sd.xbar <- sd/sqrt(n)        # Compute standard error of sample mean 
  ME <- zstar * sd.xbar        # Compute margin of error
  
  lb <- xbar - ME              # Compute lower bound of CI
  ub <- xbar + ME              # Compute upper bound of CI
  
  CI <- c(CL,lb,xbar,ub,ME)    # Put our results in a vector

  names(CI) <- c("Confidence Level","Lower Bound","Xbar","Upper Bound",
                 "Margin of Error") # Name the vector elements
  CI                           # Display the vector

## Confidence Level      Lower Bound             Xbar      Upper Bound 
##         0.900000        18.338159        20.090625        21.843091 
##  Margin of Error 
##         1.752466

We are 90% confident that the population mean lies between 18.34 and 21.84.

Proportions

We can do the same kind of thing for confidence intervals for proportions.

The Sampling Distribution of the Proportion

We have the basic theoretical results.

The estimate \(\hat{p}\) of a proportion based on a sample of size \(n\) is approximately normal, with the following mean and standard deviation, provided that \(n\hat{p} > 10\) and \(n(1-\hat{p}) > 10\).

\[\mu_{\hat{p}} = p\] and \[\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\]
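
These results can be illustrated with a small simulation. The sketch below is not part of the original material: the true proportion, sample size, number of replications and seed are all arbitrary choices, and rbinom() is used to draw the number of successes in each simulated sample.

```r
set.seed(1)          # Arbitrary seed so the simulation is reproducible
p <- 0.7             # Illustrative true proportion
n <- 100             # Sample size
reps <- 10000        # Number of simulated samples

# Each simulated sample yields one estimate phat = successes / n
phat_sim <- rbinom(reps, size = n, prob = p) / n

sim_mean <- mean(phat_sim)           # Should be close to p
sim_sd <- sd(phat_sim)               # Should be close to the theoretical value
theory_sd <- sqrt(p * (1 - p) / n)   # Standard deviation from the formula

c("Simulated Mean" = sim_mean, "Simulated SD" = sim_sd, "Theoretical SD" = theory_sd)
```

A histogram of phat_sim, for example with hist(phat_sim), should look roughly bell-shaped for these inputs.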

A Code Snippet

Here is a code snippet. You can replace the values in the first few lines and run the entire snippet.

# Code snippet to compute a confidence interval for a proportion

# Inputs
################################################################
phat <- .7   # Estimated proportion
CL <- .95    # Required confidence level
n <- 100     # Sample size
################################################################

zstar <- qnorm(CL+.5*(1-CL))
se.phat <-sqrt(phat*(1-phat)/n)
ME = zstar * se.phat

lb <- phat - ME
ub <- phat + ME

CI <- c(CL,lb,phat,ub,ME)
names(CI) <- c("Confidence Level", "Lower Bound", "phat", "Upper Bound", "Margin of Error")

CI

## Confidence Level      Lower Bound             phat      Upper Bound 
##       0.95000000       0.61018317       0.70000000       0.78981683 
##  Margin of Error 
##       0.08981683

We are 95% confident that the true proportion falls between .61 and .79. We could also use the margin of error terminology. We are 95% confident that the true proportion is within .09 of the sample proportion .7.
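
As with the mean, the steps could be collected into a function. This is a sketch under the same normal approximation; the name ci_prop_z is hypothetical, and the warning simply applies the rule of thumb \(n\hat{p} > 10\) and \(n(1-\hat{p}) > 10\) stated earlier.

```r
# Hypothetical helper: z-based confidence interval for a proportion
ci_prop_z <- function(phat, n, CL = 0.95) {
  # Rule of thumb from the text: n*phat > 10 and n*(1 - phat) > 10
  if (n * phat <= 10 || n * (1 - phat) <= 10) {
    warning("Normal approximation may be poor: check n*phat and n*(1 - phat).")
  }
  zstar <- qnorm(CL + 0.5 * (1 - CL))          # Z-score for this confidence level
  ME <- zstar * sqrt(phat * (1 - phat) / n)    # Margin of error
  c("Confidence Level" = CL,
    "Lower Bound" = phat - ME,
    "phat" = phat,
    "Upper Bound" = phat + ME,
    "Margin of Error" = ME)
}

# Same inputs as the snippet above
res_prop <- ci_prop_z(phat = 0.7, n = 100)
res_prop
```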

Another Example

Suppose we had a sample proportion of .2, a sample size of 1,000 and we wanted an 80% confidence interval for the population proportion.

# Code snippet to compute a confidence interval for a proportion

# Inputs
################################################################
phat <- .2    # Estimated proportion
CL <- .8      # Required confidence level
n <- 1000     # Sample size
################################################################

zstar <- qnorm(CL+.5*(1-CL))
se.phat <-sqrt(phat*(1-phat)/n)

ME = zstar * se.phat

lb <- phat - ME
ub <- phat + ME

CI <- c(CL,lb,phat,ub,ME)
names(CI) <- c("Confidence Level", "Lower Bound", "phat", "Upper Bound", "Margin of Error")

CI

## Confidence Level      Lower Bound             phat      Upper Bound 
##       0.80000000       0.18378951       0.20000000       0.21621049 
##  Margin of Error 
##       0.01621049

What is a proper statement of the results?

A Good Sentence

We are 80% confident that the true value of the proportion lies between .184 and .216.

We are 80% confident that the true value of the proportion lies within .016 of the estimated proportion, .2.

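It is important to use a proper statement. Never use “probability” in describing a confidence interval: the confidence level describes how often the procedure captures the true value over repeated samples, not the chance that one particular computed interval contains it.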
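
A simulation can make the meaning of “80% confident” concrete. In the sketch below the true proportion is set by us, which is only possible because the data are simulated; the seed and the number of replications are arbitrary. Each simulated sample produces its own interval, and we count how many of those intervals contain the true value.

```r
set.seed(2)        # Arbitrary seed for reproducibility
p_true <- 0.2      # "True" proportion, known only because we simulate
n <- 1000          # Sample size
CL <- 0.8          # Confidence level
reps <- 5000       # Number of simulated samples

zstar <- qnorm(CL + 0.5 * (1 - CL))
phat <- rbinom(reps, size = n, prob = p_true) / n   # One estimate per sample
ME <- zstar * sqrt(phat * (1 - phat) / n)           # One margin of error per sample

# Does each interval contain the true proportion?
covered <- (phat - ME <= p_true) & (p_true <= phat + ME)
coverage_rate <- mean(covered)
coverage_rate
```

The proportion of intervals that contain the true value should be close to 0.8, while each individual interval either contains it or does not.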

Required Sample Sizes

The two confidence interval formulas incorporate the “margin of error” as an intermediate step. The margin of error is one half of the width of the confidence interval. A frequent question is how large a sample is required to yield a given margin of error. This question can be answered by solving the margin of error formula for the sample size.
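
The algebra takes one step in each case. Writing \(z^*\) for the Z-score of the chosen confidence level (zstar in the code) and \(ME\) for the margin of error, the formula for a mean gives

\[ME = z^*\frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{z^*\sigma}{ME}\right)^2\]

and the formula for a proportion gives

\[ME = z^*\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\Longrightarrow\quad n = \frac{\hat{p}(1-\hat{p})(z^*)^2}{ME^2}\]

Because a sample size must be a whole number, the result is rounded up.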

The following code snippet accepts a confidence level, a population standard deviation and a margin of error. It produces a required sample size.

# Code snippet to compute the sample size required for a
# given margin of error for a population mean.  The other
# required inputs are the confidence level and the population
# standard deviation.


#Inputs 
##############################################################
CL = .95
ME = 3
sigma = 15
##############################################################

# Computation
zstar = qnorm(CL+.5*(1-CL))
n = ((zstar*sigma)/ME)^2
ceiling(n)

## [1] 97
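
As a quick check on this result, the margin of error can be recomputed for the reported sample size and for one observation fewer. This is an illustrative sketch; the helper name me_for_n is hypothetical.

```r
CL <- 0.95
sigma <- 15
zstar <- qnorm(CL + 0.5 * (1 - CL))

# Margin of error as a function of sample size
me_for_n <- function(n) zstar * sigma / sqrt(n)

me_96 <- me_for_n(96)   # One observation short of the required size
me_97 <- me_for_n(97)   # The required size reported above
c("ME at n = 96" = me_96, "ME at n = 97" = me_97)
```

With 96 observations the margin of error is slightly above 3, and with 97 it is slightly below, which is why the result is rounded up rather than to the nearest integer.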

Example

The following code snippet accepts an estimated proportion, a confidence level and a margin of error. It produces a sample size which would provide the given margin of error.

# Code Snippet to compute a required sample size for a proportion

#Inputs 
##############################################################
phat = .66
ME = .03
CL = .95
##############################################################

# Computation
zstar = qnorm(CL+.5*(1-CL))
n = (phat*(1-phat)*zstar^2)/ME^2
ceiling(n)

## [1] 958
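
Because the margin of error appears squared in the denominator, halving it roughly quadruples the required sample size. The sketch below illustrates this with the same estimated proportion; the function name n_for_prop and the list of margins are hypothetical choices made for illustration.

```r
# Hypothetical helper: required sample size for a proportion
n_for_prop <- function(phat, ME, CL = 0.95) {
  zstar <- qnorm(CL + 0.5 * (1 - CL))
  ceiling(phat * (1 - phat) * zstar^2 / ME^2)
}

margins <- c(0.06, 0.03, 0.015)    # Illustrative margins of error
# Each margin is passed as ME; phat is fixed by name
sizes <- sapply(margins, n_for_prop, phat = 0.66)
names(sizes) <- margins
sizes
```

The middle entry reproduces the sample size computed above.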