Two Proportion Z Test

What is the two-proportion z-test?

The two-proportions z-test is used to compare two observed proportions.

For example, we have two groups of individuals:

Group A with lung cancer: n = 500
Group B, healthy individuals: n = 500
The number of smokers in each group is as follows:
Group A with lung cancer: n = 500, 490 smokers
Group B, healthy individuals: n = 500, 400 smokers

We want to know whether the proportion of smokers is the same in the two groups of individuals.

pA is the proportion observed in group A with size nA
pB is the proportion observed in group B with size nB
p and q are the overall proportions: p is the proportion of smokers in both groups combined, and q = 1 - p
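
With these definitions, the usual pooled form of the z-statistic can be written as below. The symbols x_A and x_B (the number of smokers in each group) are notation introduced here for illustration and do not appear in the original text.

```latex
z = \frac{p_A - p_B}{\sqrt{\dfrac{pq}{n_A} + \dfrac{pq}{n_B}}},
\qquad
p = \frac{x_A + x_B}{n_A + n_B},
\qquad
q = 1 - p
```

For the example above, p_A = 490/500 = 0.98, p_B = 400/500 = 0.80 and p = 890/1000 = 0.89, which gives z of roughly 9.1.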

Note that the formula for the z-statistic is valid only when the sample size (n) is large enough: nA × p, nA × q, nB × p and nB × q should each be ≥ 5.
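
The sample-size condition can be checked directly from the counts. The snippet below is a minimal sketch for the lung-cancer example; the variable names (x, n, expected) are chosen here for illustration.

```r
# Counts taken from the lung-cancer example in this section
x <- c(A = 490, B = 400)   # smokers in each group
n <- c(A = 500, B = 500)   # group sizes

p <- sum(x) / sum(n)       # overall (pooled) proportion of smokers
q <- 1 - p                 # overall proportion of non-smokers

# The four quantities that should each be at least 5
expected <- c(nAp = n[["A"]] * p, nAq = n[["A"]] * q,
              nBp = n[["B"]] * p, nBq = n[["B"]] * q)
expected

# TRUE when the large-sample condition holds for every cell
all(expected >= 5)
```

Here the four values are about 445, 55, 445 and 55, so the condition is comfortably satisfied.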

R Function: prop.test()

prop.test(x, n, p = NULL, alternative = "two.sided", correct = TRUE)

x: a vector of counts of successes
n: a vector of counts of trials
alternative: a character string specifying the alternative hypothesis
correct: a logical indicating whether Yates’ continuity correction should be applied where possible

Note that, by default, prop.test() applies the Yates continuity correction, which matters most when the expected number of successes or failures is small (< 5). If you don’t want the correction, pass the additional argument correct = FALSE to prop.test(); the default value is TRUE. The correction must be turned off to make the test mathematically equivalent to the uncorrected z-test of proportions.
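
The equivalence with the uncorrected z-test can be checked numerically. The following is a sketch rather than a definitive recipe: it computes the pooled z-statistic by hand (the variable names are illustrative) and compares its square with the X-squared value reported by prop.test() when correct = FALSE.

```r
x <- c(490, 400)   # smokers in groups A and B
n <- c(500, 500)   # group sizes

p_hat  <- x / n              # observed proportions pA and pB
p_pool <- sum(x) / sum(n)    # overall proportion p
q_pool <- 1 - p_pool         # overall proportion q

# Pooled two-proportion z-statistic, computed by hand
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * q_pool * (1 / n[1] + 1 / n[2]))

# Same test via prop.test() without the continuity correction
res_uncorrected <- prop.test(x = x, n = n, correct = FALSE)

z^2
unname(res_uncorrected$statistic)

# The chi-squared statistic is the square of z, and the two-sided p-values agree
all.equal(z^2, unname(res_uncorrected$statistic))
all.equal(2 * pnorm(-abs(z)), res_uncorrected$p.value)
```

With the default correct = TRUE, the reported X-squared is slightly smaller than z squared because the correction shrinks the difference in proportions before it is squared.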

Compute the two-proportion z-test

We want to know whether the proportion of smokers is the same in the two groups of individuals.

res <- prop.test(x = c(490, 400), n = c(500, 500))
# Printing the results
res

## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(490, 400) out of c(500, 500)
## X-squared = 80.909, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1408536 0.2191464
## sample estimates:
## prop 1 prop 2 
##   0.98   0.80

The function returns:

the value of Pearson’s chi-squared test statistic
a p-value
a 95% confidence interval for the difference between the two proportions
the estimated probability of success in each group (here, the proportion of smokers in each of the two groups)

Note that:

If you want to test whether the proportion of smokers in group A (pA) is less than the proportion of smokers in group B (pB), type this:
prop.test(x = c(490, 400), n = c(500, 500), alternative = "less")
Or, if you want to test whether the proportion of smokers in group A (pA) is greater than the proportion of smokers in group B (pB), type this:
prop.test(x = c(490, 400), n = c(500, 500), alternative = "greater")

Interpretation of the Result

The p-value of the test is 2.363e-19, which is less than the significance level alpha = 0.05. We can conclude that the proportion of smokers differs significantly between the two groups (p-value = 2.363e-19).

Accessing the values returned by the prop.test() function

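The result of the prop.test() function is a list containing the following components:

statistic: the value of Pearson’s chi-squared test statistic
parameter: the degrees of freedom of the approximate chi-squared distribution of the test statistic
p.value: the p-value of the test
conf.int: a confidence interval for the difference in proportions (with a single group, for the probability of success)
estimate: the estimated probability of success in each group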

# printing the p-value
res$p.value

## [1] 2.363439e-19

# printing the estimated proportions
res$estimate

## prop 1 prop 2 
##   0.98   0.80

# printing the confidence interval
res$conf.int

## [1] 0.1408536 0.2191464
## attr(,"conf.level")
## [1] 0.95
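
The remaining components can be read in the same way. The short sketch below re-runs the test so that it is self-contained, then shows the test statistic and its degrees of freedom, and checks that the confidence interval brackets the observed difference prop 1 - prop 2 (diff_hat is a name introduced here for illustration).

```r
res <- prop.test(x = c(490, 400), n = c(500, 500))

# Pearson's chi-squared statistic (continuity-corrected by default)
res$statistic

# degrees of freedom of the chi-squared approximation
res$parameter

# conf.int refers to the difference prop 1 - prop 2, not to a single proportion
diff_hat <- unname(res$estimate[1] - res$estimate[2])
diff_hat

# TRUE when the observed difference lies inside the interval
res$conf.int[1] < diff_hat && diff_hat < res$conf.int[2]
```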