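We now turn to estimating a population proportion. The method rests on the following result: if \(np \ge 10\) and \(n(1-p) \ge 10\), then the sample proportion \(\hat p\) is approximately normally distributed with mean \(p\) and standard deviation \(\sqrt{p(1-p)/n}\), \[\hat p \sim N\left(p,\sqrt{\frac{p(1-p)}{n}}\right)\]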
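The normal approximation can be checked informally by simulation. The following is a minimal sketch; the values of `p`, `n`, and `reps` are illustrative assumptions, not taken from the text.

```r
# Simulate the sampling distribution of the sample proportion.
# p, n, and reps are illustrative values chosen for this sketch.
set.seed(1)
p <- 0.3       # assumed population proportion
n <- 200       # sample size; n*p = 60 and n*(1-p) = 140, both at least 10
reps <- 10000  # number of simulated samples

# Each draw is the number of successes in n trials, divided by n.
phat <- rbinom(reps, size = n, prob = p) / n

mean(phat)             # should be close to p
sd(phat)               # should be close to the theoretical value below
sqrt(p * (1 - p) / n)  # theoretical standard deviation of phat
```

A histogram, `hist(phat)`, should look roughly bell-shaped and centered at `p`.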
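Suppose we have a random sample of size \(n\) from a population with unknown population proportion \(p\), and let \(\hat p\) be the sample proportion. Then an approximate \(100(1-\alpha)\%\) confidence interval for \(p\) is \[\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}\] provided that \(n \hat p \ge 10\) and \(n(1 - \hat p) \ge 10\). Here \(z_{\alpha/2}\) is the value that cuts off an upper-tail area of \(\alpha/2\) under the standard normal curve.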
Example 1: In a presidential election year, candidates want to know how voters in various parts of the country will vote. Suppose that \(420\) registered voters are asked if they would vote for a particular candidate if the election were held today. From this sample, \(223\) indicated that they would vote for this particular candidate. Construct a 98% confidence interval for the proportion of voters who would vote for this candidate.
In this problem, \(p\) is the true proportion of all registered voters who would vote for this particular candidate. We want a \(98\%\) confidence interval, so \(100(1-\alpha) = 98\), which gives \(\alpha=0.02\). The interval is easiest to compute in R:
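x <- 223                          # number of voters in favor
n <- 420                          # sample size
phat <- x/n                       # sample proportion
se <- sqrt(phat*(1-phat)/n)       # estimated standard error of phat
alpha <- 0.02
z <- qnorm(1-alpha/2,0,1)         # critical value z_{alpha/2}
phat - z*se                       # lower limit
## [1] 0.4743042
phat + z*se                       # upper limit
## [1] 0.5876006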
We are \(98\%\) confident that the true proportion of registered voters who would vote for this particular candidate lies between approximately \(0.474\) and \(0.588\). The confidence level describes the method rather than this one interval: in repeated sampling, about \(98\%\) of intervals constructed this way contain the true unknown value of \(p\).
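The repeated-sampling meaning of the confidence level can be illustrated by simulation. This is a rough sketch under stated assumptions: `p_true` is a hypothetical value picked for illustration (in practice \(p\) is unknown), and `reps` is an arbitrary number of simulated samples.

```r
# Estimate how often a 98% interval captures a known proportion.
# p_true and reps are hypothetical values chosen for this sketch.
set.seed(1)
p_true <- 0.53
n <- 420
alpha <- 0.02
reps <- 10000
z <- qnorm(1 - alpha / 2)

# Simulate many samples and build one interval from each.
phat <- rbinom(reps, size = n, prob = p_true) / n
se <- sqrt(phat * (1 - phat) / n)
covered <- (phat - z * se <= p_true) & (p_true <= phat + z * se)

mean(covered)  # proportion of intervals containing p_true; should be near 0.98
```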
The interval is valid only if \(n \hat p \ge 10\) and \(n(1- \hat p) \ge 10\). Checking these requirements in R shows that both quantities are well above ten.
x <- 223
n <- 420
phat <- x/n
n*phat
## [1] 223
n*(1-phat)
## [1] 197
We may also want to know how large a sample is needed to estimate \(p\) to within a margin of error \(m\) with \(100(1-\alpha)\%\) confidence.
Consider the formula for the margin of error \[m = z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}\] Solving for \(n\), we get \[n = p(1-p) \Big( \frac{z_{\alpha/2}}{m} \Big)^2\]
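Written out step by step, squaring both sides of the margin-of-error formula gives \[m^2 = z_{\alpha/2}^2 \, \frac{p(1-p)}{n}\] and multiplying both sides by \(n/m^2\) isolates \(n\): \[n = \frac{z_{\alpha/2}^2 \, p(1-p)}{m^2} = p(1-p) \Big( \frac{z_{\alpha/2}}{m} \Big)^2\]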
However, we don't know \(p\), so we can either come up with an estimate or substitute \(p = 0.5\). The value \(p = 0.5\) maximizes the function \(p(1-p)\) and hence gives us the largest (safest) value of \(n\) for the problem.
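The claim that \(p(1-p)\) peaks at \(p = 0.5\) can be checked numerically. This is a quick sketch; the grid of candidate values is an arbitrary choice for illustration.

```r
# Evaluate p*(1-p) on a grid of candidate proportions.
p <- seq(0.05, 0.95, by = 0.05)
v <- p * (1 - p)

p[which.max(v)]  # grid value of p at which p*(1-p) is largest
max(v)           # largest value of p*(1-p) on the grid
```

Any other value of \(p\) gives a smaller product, and therefore a smaller required sample size.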
The sample size to estimate \(p\) within a margin of error \(m\) with \(100(1-\alpha)\%\) confidence is
\[n = p(1-p)\Big( \frac{z_{\alpha/2}}{m} \Big)^2\]
We must always round the final answer up to the next integer; rounding down would give a margin of error slightly larger than \(m\).
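The formula and the rounding rule can be wrapped in a small function. This is a sketch only: `sample_size_prop` is a hypothetical helper name, not a built-in R function, and the default `p = 0.5` reflects the conservative choice described above.

```r
# sample_size_prop is a hypothetical helper, not part of base R.
# m          : desired margin of error (as a proportion)
# conf_level : confidence level (as a proportion, e.g. 0.95)
# p          : planning value for the proportion (0.5 is the conservative default)
sample_size_prop <- function(m, conf_level, p = 0.5) {
  alpha <- 1 - conf_level
  z <- qnorm(1 - alpha / 2)         # critical value z_{alpha/2}
  ceiling(p * (1 - p) * (z / m)^2)  # round up to the next integer
}

# Illustrative call: margin of error 0.03 with 95% confidence.
sample_size_prop(m = 0.03, conf_level = 0.95)
```

Passing a smaller planning value such as `p = 0.2` returns a smaller sample size than the default.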
Example 2: The student government association at a university wants to estimate the proportion of students who support a change being considered in the academic calendar. How many students should be surveyed to estimate the proportion who support the change to within \(2\%\) with \(90\%\) confidence?
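The margin of error is \(2\%\), so \(m=0.02\). The confidence level is \(90\%\), so \(100(1-\alpha) = 90\) and \(\alpha = 0.1\). The first calculation below uses \(p = 0.2\), as if a prior estimate of the proportion were available; the second uses the conservative value \(p = 0.5\).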
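m <- 0.02                     # margin of error
p <- 0.2                      # planning value for p (an estimate)
alpha <- 0.10
z <- qnorm(1-alpha/2,0,1)     # critical value z_{alpha/2}
n <- p*(1-p)*(z/m)^2
n                             # round up: survey at least 1083 students
## [1] 1082.217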
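m <- 0.02                     # margin of error
p <- 0.5                      # conservative value that maximizes p*(1-p)
alpha <- 0.10
z <- qnorm(1-alpha/2,0,1)     # critical value z_{alpha/2}
n <- p*(1-p)*(z/m)^2
n                             # round up: survey at least 1691 students
## [1] 1690.965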