The Difference between Two Proportions

Two National Household Surveys

In 1985, the National Household Survey found that 36.9% of 18-25 year olds smoked cigarettes. In 1992, the same survey found that 31.9% of 18-25 year olds smoked cigarettes. Both surveys were based on samples of 700 18-25 year olds.

Construct a 95% confidence interval for the decrease in smoking rate among 18-25 year olds between the two surveys.

Formulas

Standard error (standard deviation) in a Proportion

\[ \begin{equation} \label{E:SE in a proportion} \sigma_{\bar{x}} = \sqrt{ \frac{\hat{p}(1-\hat{p})}{n} } \end{equation} \]

Standard Deviation in a Difference

\[ \begin{equation} \label{E:Standard Deviation in a Difference} \sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{ \sigma_{\bar{x}_1}^2+\sigma_{\bar{x}_2}^2 } \end{equation} \]
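
The rule for a difference relies on the two samples being independent, so that their variances add. A quick simulation can make this plausible. This is only a sketch: the seed, the number of replications (`reps`), and the use of the observed rates as if they were the true rates are assumptions made for illustration.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
reps <- 100000                    # number of simulated survey pairs (assumed)
n <- 700                          # sample size of each survey

# Simulate sample proportions for two independent surveys
phat1 <- rbinom(reps, size = n, prob = 0.369) / n
phat2 <- rbinom(reps, size = n, prob = 0.319) / n

# Standard deviation of the simulated differences
sd_sim <- sd(phat1 - phat2)

# Standard deviation predicted by adding the two variances
sd_formula <- sqrt(0.369 * (1 - 0.369) / n + 0.319 * (1 - 0.319) / n)

c(simulated = sd_sim, formula = sd_formula)
```

The two numbers should agree closely, up to simulation noise.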

Now with R... The Variables

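p0 is the pooled proportion: the common smoking rate assumed under the null hypothesis that the 1985 and 1992 rates are equal.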

p1 <- .369; n1 <- 700
p2 <- .319; n2 <- 700

(p0 <- (p1*n1+p2*n2)/(n1+n2))

[1] 0.344
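
One way to read this number is as total smokers divided by total respondents across both surveys. The sketch below spells that out. The names `x1`, `x2`, and `p0_avg` are introduced here for illustration, and the implied counts are not whole numbers because the published rates are rounded.

```r
p1 <- 0.369; n1 <- 700
p2 <- 0.319; n2 <- 700

# Implied number of smokers in each sample (not whole numbers,
# because the reported percentages are rounded)
x1 <- p1 * n1
x2 <- p2 * n2

# Pooled proportion: all smokers over all respondents
p0 <- (x1 + x2) / (n1 + n2)

# With equal sample sizes this is just the average of the two rates
p0_avg <- mean(c(p1, p2))

c(pooled = p0, average = p0_avg)
```

With unequal sample sizes the pooled value would instead be weighted toward the larger survey.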

Now with R... A Confidence Interval

sigma1 <- sqrt(p1*(1-p1)/n1)
sigma2 <- sqrt(p2*(1-p2)/n2)
(sigma_diff <- sqrt(sigma1^2+sigma2^2))

[1] 0.02536

p1-p2 + qnorm(c(.025,.975))*sigma_diff

[1] 0.0003015 0.0996985
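
The same steps can be wrapped in a small function so that the confidence level is easy to change. This is a minimal sketch; `diff_prop_ci` and its `conf` argument are hypothetical names, not part of base R.

```r
# Normal-approximation confidence interval for p1 - p2 (hypothetical helper)
diff_prop_ci <- function(p1, n1, p2, n2, conf = 0.95) {
  # Unpooled standard error of the difference
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
  alpha <- 1 - conf
  # Lower and upper limits from the matching normal quantiles
  (p1 - p2) + qnorm(c(alpha / 2, 1 - alpha / 2)) * se
}

ci95 <- diff_prop_ci(0.369, 700, 0.319, 700)
ci99 <- diff_prop_ci(0.369, 700, 0.319, 700, conf = 0.99)
ci95
ci99
```

The 95% interval only just excludes zero. The 99% interval is wider and includes zero.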

Now with R... A p-value

This is a test of the null hypothesis that the two rates are equal, so use the pooled proportion p0 in both standard errors.

sigma1 <- sqrt(p0*(1-p0)/n1)
sigma2 <- sqrt(p0*(1-p0)/n2)
(sigma_diff <- sqrt(sigma1^2+sigma2^2))

[1] 0.02539

z <- (p1-p2)/sigma_diff
2*(1-pnorm(z))

[1] 0.04894
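
The expression `2*(1-pnorm(z))` gives a sensible answer here because z is positive. If the groups were entered in the other order, z would be negative and the result would exceed 1. A sketch of a version that does not depend on the order follows; `two_prop_z` is a hypothetical helper name.

```r
# Two-sided z test for equality of two proportions (hypothetical helper)
two_prop_z <- function(p1, n1, p2, n2) {
  # Pooled proportion under the null hypothesis p1 == p2
  p0 <- (p1 * n1 + p2 * n2) / (n1 + n2)
  # Standard error of the difference under the null
  se0 <- sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se0
  # Use -abs(z) so the sign of z does not matter
  c(z = z, p.value = 2 * pnorm(-abs(z)))
}

res     <- two_prop_z(0.369, 700, 0.319, 700)
res_rev <- two_prop_z(0.319, 700, 0.369, 700)  # groups swapped
res
res_rev
```

Swapping the groups flips the sign of z but leaves the p-value unchanged.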

The Easy Way

prop.test(x=c(n1*p1, n2*p2), n=c(n1,n2), correct=FALSE)


    2-sample test for equality of proportions without continuity
    correction

data:  c(n1 * p1, n2 * p2) out of c(n1, n2)
X-squared = 3.877, df = 1, p-value = 0.04894
alternative hypothesis: two.sided
95 percent confidence interval:
 0.0003015 0.0996985
sample estimates:
prop 1 prop 2 
 0.369  0.319
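
The X-squared statistic reported by `prop.test` without the continuity correction is the square of the z statistic from the pooled test, which is why the p-values match. A short check, assuming the same inputs as above (`chisq` is just a local name for the extracted statistic):

```r
p1 <- 0.369; n1 <- 700
p2 <- 0.319; n2 <- 700

# z statistic using the pooled proportion
p0 <- (p1 * n1 + p2 * n2) / (n1 + n2)
z  <- (p1 - p2) / sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))

# Chi-squared statistic from prop.test, without continuity correction
chisq <- unname(prop.test(x = c(n1 * p1, n2 * p2), n = c(n1, n2),
                          correct = FALSE)$statistic)

c(z_squared = z^2, X_squared = chisq)
```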

With Yates' Continuity Correction

The correction is applied because the counts follow binomial distributions that we're merely approximating as normal.

prop.test(x=c(n1*p1, n2*p2), n=c(n1,n2), correct=TRUE)


    2-sample test for equality of proportions with continuity
    correction

data:  c(n1 * p1, n2 * p2) out of c(n1, n2)
X-squared = 3.659, df = 1, p-value = 0.05577
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.001127  0.101127
sample estimates:
prop 1 prop 2 
 0.369  0.319
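
The corrected results can also be worked out by hand. The sketch below assumes the correction amounts to shrinking the observed difference by 0.5*(1/n1 + 1/n2) in the test and widening the interval by the same amount. That assumption appears to reproduce the output above, but it is not taken from the original notes. The names `cc`, `z_c`, `p_c`, and `ci_c` are introduced here for illustration.

```r
p1 <- 0.369; n1 <- 700
p2 <- 0.319; n2 <- 700
d  <- p1 - p2

# Continuity correction on the proportion scale (assumed form)
cc <- 0.5 * (1 / n1 + 1 / n2)

# Test: pooled standard error, difference shrunk toward zero by cc
p0  <- (p1 * n1 + p2 * n2) / (n1 + n2)
se0 <- sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
z_c <- (abs(d) - cc) / se0
p_c <- 2 * pnorm(-z_c)

# Interval: unpooled standard error, half-width widened by cc
se   <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci_c <- d + c(-1, 1) * (qnorm(0.975) * se + cc)

c(X_squared = z_c^2, p.value = p_c)
ci_c
```

With the correction the interval now includes zero and the p-value rises above 0.05, so the conclusion at the 5% level changes.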