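We now turn to estimating the difference between two population proportions. The method rests on the following result for two independent samples: if \(n_1 p_1 \ge 10\), \(n_1(1-p_1) \ge 10\), \(n_2 p_2 \ge 10\) and \(n_2(1-p_2) \ge 10\), then \(\hat p_1 - \hat p_2\) is approximately normally distributed with mean \(p_1 - p_2\) and standard deviation \(\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}\), that is, \[\hat p_1 - \hat p_2 \sim N\left(p_1-p_2,\ \sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}\right)\]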
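A small simulation can make this result more concrete. The sketch below assumes illustrative values for \(p_1\), \(p_2\), \(n_1\) and \(n_2\) (they are not taken from any data set) and compares the simulated mean and standard deviation of \(\hat p_1 - \hat p_2\) with the values the result predicts.

```r
# Simulation sketch: every value below is an illustrative assumption.
set.seed(1)
p1 <- 0.6; n1 <- 150
p2 <- 0.7; n2 <- 200
reps <- 10000

# Draw 'reps' pairs of independent samples and record phat1 - phat2 each time
phat1 <- rbinom(reps, n1, p1) / n1
phat2 <- rbinom(reps, n2, p2) / n2
diffs <- phat1 - phat2

# Mean and standard deviation predicted by the normal approximation
theory_mean <- p1 - p2
theory_sd <- sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)

# Simulated values should land close to the theoretical ones
c(simulated = mean(diffs), theory = theory_mean)
c(simulated = sd(diffs), theory = theory_sd)
```

Running hist(diffs) afterwards gives a quick visual check that the simulated differences look roughly bell-shaped.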
Suppose we have two independent random samples of sizes \(n_1\) and \(n_2\) from populations with unknown population proportions \(p_1\) and \(p_2\). Then a \(100(1-\alpha)\%\) confidence interval for \(p_1 - p_2\) is
\[\hat p_1 - \hat p_2 \pm z_{\alpha/2} \sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}\] provided that \(n_1 \hat p_1 \ge 10\), \(n_1(1 - \hat p_1) \ge 10\), \(n_2 \hat p_2 \ge 10\) and \(n_2 (1 - \hat p_2) \ge 10\). Here \(z_{\alpha/2}\) is the value with area \(\alpha/2\) to its right under the standard normal curve.
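The interval formula can be wrapped in a small reusable function. This is only a sketch: the name two_prop_ci and its arguments are hypothetical (it is not a base R function), and it does not check the sample size conditions.

```r
# two_prop_ci is a hypothetical helper name, not part of base R.
# x1, x2: number of successes in each sample; n1, n2: sample sizes.
two_prop_ci <- function(x1, n1, x2, n2, conf_level = 0.95) {
  phat1 <- x1 / n1
  phat2 <- x2 / n2
  # Estimated standard error of phat1 - phat2
  se <- sqrt(phat1*(1-phat1)/n1 + phat2*(1-phat2)/n2)
  # Critical value z_{alpha/2}
  alpha <- 1 - conf_level
  z <- qnorm(1 - alpha/2)
  est <- phat1 - phat2
  c(lower = est - z*se, upper = est + z*se)
}

# Usage with made-up counts: 45 of 100 versus 30 of 120
two_prop_ci(45, 100, 30, 120)
```

Passing a different conf_level, for example 0.90, changes only the critical value used.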
Example 1:
Voters in a particular city who identify themselves with one or the other of two political parties were randomly selected and asked if they favor a proposal to allow citizens with a proper license to carry a concealed handgun in city parks. Of the 150 voters from Party A, 90 favored the proposal. Of the 200 voters from Party B, 140 favored the proposal.
Construct a 95% confidence interval for the difference between the proportion of all Party A voters and the proportion of all Party B voters who favor the proposal.
In this problem, \(p_1=\) true proportion of Party A voters who favor the proposal and \(p_2=\) true proportion of Party B voters who favor the proposal. We want a \(95\%\) confidence interval, so \(100(1-\alpha) = 95\), which means that \(\alpha=0.05\). The easiest thing to do is use the following code in R:
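# Party A: 90 of 150 sampled voters favor the proposal
x1 <- 90
n1 <- 150
phat1 <- x1/n1
# Party B: 140 of 200 sampled voters favor the proposal
x2 <- 140
n2 <- 200
phat2 <- x2/n2
# Estimated standard error of phat1 - phat2
se <- sqrt(phat1*(1-phat1)/n1 + phat2*(1-phat2)/n2)
# Critical value z_{alpha/2} for 95% confidence
alpha <- 0.05
z <- qnorm(1-alpha/2,0,1)
# Lower and upper endpoints of the interval
(phat1 - phat2) - z*se
## [1] -0.2008953
(phat1 - phat2) + z*se
## [1] 0.0008953214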
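We are \(95\%\) confident that \(p_1 - p_2\), the difference between the true proportion of Party A voters and the true proportion of Party B voters who favor the proposal, lies between \(-0.2008953\) and \(0.0008953214\). Here "\(95\%\) confident" means that the method we used produces an interval containing the true unknown value of \(p_1 - p_2\) for \(95\%\) of all possible samples. Because this interval contains \(0\), the data are consistent with no difference between the two parties.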
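As a cross-check, base R's prop.test() can produce the same interval. The sketch below uses the counts from this example and sets correct = FALSE to turn off the continuity correction, which is needed for the result to match the formula used here; the name result is just an illustrative variable.

```r
# Cross-check using base R's prop.test().
# x holds the numbers of voters in favor, n holds the sample sizes.
# correct = FALSE disables the continuity correction.
result <- prop.test(x = c(90, 140), n = c(150, 200),
                    conf.level = 0.95, correct = FALSE)

# Confidence interval for p1 - p2
result$conf.int
```

With the default correct = TRUE, prop.test() reports a slightly wider interval, so the two answers would not agree exactly.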
It is required that \(n_1 \hat p_1 \ge 10\), \(n_1 (1 - \hat p_1) \ge 10\), \(n_2 \hat p_2 \ge 10\) and \(n_2 (1 - \hat p_2) \ge 10\). Using R to check these requirements, we find that all four of these quantities are at least ten.
x1 <- 90
n1 <- 150
phat1 <- x1/n1
x2 <- 140
n2 <- 200
phat2 <- x2/n2
n1*phat1
## [1] 90
n1*(1-phat1)
## [1] 60
n2*phat2
## [1] 140
n2*(1-phat2)
## [1] 60