Introduction

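We now turn to estimating a population proportion \(p\). The method rests on the following result: if \(np \ge 10\) and \(n(1-p) \ge 10\), then the sample proportion \(\hat p\) is approximately normally distributed, \[\hat p \sim N\Big(p,\sqrt{\frac{p(1-p)}{n}}\Big)\] where the second argument is the standard deviation of \(\hat p\).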
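This approximation can be checked by simulation. The sketch below is illustrative only: the values \(p = 0.3\) and \(n = 100\), the number of repetitions, and the seed are assumptions chosen for the demonstration, not values from the text. It draws many samples, computes \(\hat p\) for each, and compares the simulated mean and standard deviation of \(\hat p\) with \(p\) and \(\sqrt{p(1-p)/n}\).

```r
# Illustrative simulation; p, n, reps, and the seed are assumed values
set.seed(1)
p <- 0.3
n <- 100                          # np = 30 and n(1-p) = 70, both at least 10
reps <- 10000                     # number of simulated samples
phat_sim <- rbinom(reps, size = n, prob = p) / n   # one phat per simulated sample
sim_mean <- mean(phat_sim)        # should be close to p
sim_sd <- sd(phat_sim)            # should be close to theory_sd
theory_sd <- sqrt(p * (1 - p) / n)
c(sim_mean = sim_mean, sim_sd = sim_sd, theory_sd = theory_sd)
```

A histogram of `phat_sim` (for example with `hist(phat_sim)`) should also look roughly bell-shaped when both conditions hold.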

Confidence Interval

Suppose we have a sample of size \(n\) from a population with unknown population proportion \(p\), and let \(\hat p\) be the sample proportion. Then a \(100(1-\alpha)\%\) confidence interval for \(p\) is \[\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}\] provided that \(n \hat p \ge 10\) and \(n(1 - \hat p) \ge 10\).

Example 1: In a presidential election year, candidates want to know how voters in various parts of the country will vote. Suppose that \(420\) registered voters are asked if they would vote for a particular candidate if the election were held today. From this sample, \(223\) indicated that they would vote for this particular candidate. Construct a 98% confidence interval for the proportion of voters who would vote for this candidate.

In this problem, \(p\) is the true proportion of all registered voters who would vote for this particular candidate. We want a \(98\%\) confidence interval, so \(100(1-\alpha) = 98\), which means that \(\alpha=0.02\). The interval can be computed in R:

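x <- 223                          # number who would vote for the candidate
n <- 420                          # sample size
phat <- x/n                       # sample proportion
se <- sqrt(phat*(1-phat)/n)       # estimated standard error of phat
alpha <- 0.02
z <- qnorm(1-alpha/2,0,1)         # critical value z_{alpha/2}
phat - z*se                       # lower limit
## [1] 0.4743042
phat + z*se                       # upper limit
## [1] 0.5876006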

We are \(98\%\) confident that the true proportion of registered voters who would vote for this particular candidate lies between \(0.4743042\) and \(0.5876006\). The phrase "\(98\%\) confident" describes the method: across repeated samples, about \(98\%\) of the intervals constructed this way would contain the true unknown value of \(p\).

The interval is valid only if \(n \hat p \ge 10\) and \(n(1- \hat p) \ge 10\). Checking these requirements in R shows that both quantities are well above ten.

x <- 223
n <- 420
phat <- x/n
n*phat
## [1] 223
n*(1-phat)
## [1] 197
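The interval calculation and the condition check can be combined into one reusable function. This is a minimal sketch, not part of the original notes: the name `prop_ci` and its arguments are hypothetical, and it implements only the large-sample formula given above.

```r
# Hypothetical helper (not from the text): large-sample CI for a proportion
prop_ci <- function(x, n, conf_level = 0.95) {
  phat <- x / n                          # sample proportion
  # Stop with an error if the large-sample conditions are not met
  stopifnot(n * phat >= 10, n * (1 - phat) >= 10)
  alpha <- 1 - conf_level
  z <- qnorm(1 - alpha / 2)              # critical value z_{alpha/2}
  se <- sqrt(phat * (1 - phat) / n)      # estimated standard error
  c(lower = phat - z * se, upper = phat + z * se)
}

# Example 1: 223 of 420 voters, 98% confidence
ci <- prop_ci(223, 420, conf_level = 0.98)
ci
```

Called with the numbers from Example 1, the function reproduces the limits computed step by step above.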

Sample Size Calculation

We may want to know how large a sample to take in order to estimate \(p\) to within a margin of error \(m\) with \(100(1-\alpha)\%\) confidence.

Consider the formula for the margin of error \[m = z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}\] Solving for \(n\), we get \[n = p(1-p) \Big( \frac{z_{\alpha/2}}{m} \Big)^2\]
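Written out, the algebra is to square both sides of the margin of error formula and then isolate \(n\): \[m^2 = z_{\alpha/2}^2 \, \frac{p(1-p)}{n} \quad \Longrightarrow \quad n = \frac{z_{\alpha/2}^2 \, p(1-p)}{m^2} = p(1-p) \Big( \frac{z_{\alpha/2}}{m} \Big)^2\]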

However, we don't know \(p\), so we can either come up with an estimate or we can substitute in \(p = 0.5\). The value \(p = 0.5\) maximizes the function \(p(1-p)\) and hence gives us the largest (safest) value of \(n\) for the problem.
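One way to see why \(p = 0.5\) is the maximizer is to complete the square: \[p(1-p) = \frac{1}{4} - \Big(p - \frac{1}{2}\Big)^2 \le \frac{1}{4}\] with equality exactly when \(p = \frac{1}{2}\). Any other value of \(p\) makes \(p(1-p)\) smaller, and therefore makes the required \(n\) smaller as well.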

The sample size to estimate \(p\) within a margin of error \(m\) with \(100(1-\alpha)\%\) confidence is

\[n = p(1-p)\Big( \frac{z_{\alpha/2}}{m} \Big)^2\]

We must round our final answer UP to the next integer, because rounding down would give a margin of error slightly larger than \(m\).

Example 2: The student government association at a university wants to estimate the proportion of students who support a change being considered in the academic calendar. How many students should be surveyed to estimate the proportion who support the change to within \(2\%\) with \(90\%\) confidence:

  • if \(p\) is estimated to be \(0.2\)?
  • if no estimate of \(p\) is available?

The margin of error is \(2\%\), so \(m=0.02\). The confidence level is \(90\%\), so \(100(1-\alpha) = 90\) and \(\alpha = 0.1\).

  • If \(p = 0.2\), the R calculation below gives \(n = 1082.217\), which we round up: \(1083\) students should be surveyed.
m <- 0.02                         # margin of error
p <- 0.2                          # prior estimate of p
alpha <- 0.10
z <- qnorm(1-alpha/2,0,1)         # critical value z_{alpha/2}
n <- p*(1-p)*(z/m)^2
n
## [1] 1082.217
  • If we have no estimate for \(p\), we use \(p = 0.5\). The R calculation below gives \(n = 1690.965\), which we round up: \(1691\) students should be surveyed.
m <- 0.02                         # margin of error
p <- 0.5                          # conservative choice when p is unknown
alpha <- 0.10
z <- qnorm(1-alpha/2,0,1)         # critical value z_{alpha/2}
n <- p*(1-p)*(z/m)^2
n
## [1] 1690.965
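Both parts of Example 2 follow the same recipe, so they can be wrapped in a small function that also does the rounding. This is a sketch under stated assumptions: the name `sample_size` and its arguments are hypothetical, and the default `p = 0.5` encodes the conservative choice used when no estimate is available.

```r
# Hypothetical helper (not from the text): sample size for estimating a proportion
sample_size <- function(m, conf_level, p = 0.5) {
  alpha <- 1 - conf_level
  z <- qnorm(1 - alpha / 2)            # critical value z_{alpha/2}
  ceiling(p * (1 - p) * (z / m)^2)     # always round up to the next integer
}

# Example 2: margin of error 0.02, 90% confidence
n_with_estimate <- sample_size(m = 0.02, conf_level = 0.90, p = 0.2)
n_conservative <- sample_size(m = 0.02, conf_level = 0.90)   # uses p = 0.5
c(n_with_estimate, n_conservative)
```

Using `ceiling()` rather than `round()` matters here: `round()` would round a value such as \(1082.217\) down, which would leave the margin of error slightly above the target.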