Sampling Distribution (Practical): List 1

One-proportion z-test

The one-proportion z-test is used to compare an observed proportion to a theoretical one when there are only two categories. This article describes the basics of the one-proportion z-test and provides practical examples using R.

Data: For example, we have a population of mice containing half males and half females (p = 0.5 = 50%). Some of these mice (n = 160) have developed a spontaneous cancer, including 95 males and 65 females.

In this setting:

The number of successes (males with cancer) is 95

The observed proportion (po) of males is 95/160

The observed proportion (q) of females is 1 − po

The expected proportion (pe) of males is 0.5 (50%)

The number of observations (n) is 160

Statistical hypotheses

Typical research questions are:

1) whether the observed proportion of males (po) is equal to the expected proportion (pe)?

2) whether the observed proportion of males (po) is less than the expected proportion (pe)?

3) whether the observed proportion of males (po) is greater than the expected proportion (pe)?

Note that:

Hypothesis 1) is called a two-tailed test

Hypotheses 2) and 3) are called one-tailed tests
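
In symbols, the three research questions correspond to the following pairs of null and alternative hypotheses. The labels $H_0$ and $H_a$ are conventional notation assumed here; they do not appear in the text above:

$H_0: p_o = p_e$ versus $H_a: p_o \neq p_e$ (two-tailed)

$H_0: p_o \geq p_e$ versus $H_a: p_o < p_e$ (one-tailed)

$H_0: p_o \leq p_e$ versus $H_a: p_o > p_e$ (one-tailed)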

The test statistic (also known as z-test) can be calculated as follows:

$z = \frac{p_o-p_e}{\sqrt{p_oq/n}}$

where,

po is the observed proportion

pe is the expected proportion

n is the sample size

If |z| < 1.96, then the difference is not significant at the 5% level (two-tailed test)

If |z| ≥ 1.96, then the difference is significant at the 5% level (two-tailed test)

The significance level (p-value) corresponding to the z-statistic can be read in the z-table. We’ll see how to compute it in R.
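
As a minimal sketch, the z statistic and its p-values for the mice data can be computed by hand in R, using the formula exactly as written above. The variable names (`x`, `n`, `pe`, `p_two_sided`, and so on) are illustrative choices, not part of any package.

```r
# Mice example from the text: 95 males among 160 cancer cases, expected 0.5
x  <- 95
n  <- 160
pe <- 0.5

po <- x / n   # observed proportion of males
q  <- 1 - po  # observed proportion of females

# z statistic as written in the formula above (po * q / n in the denominator)
z <- (po - pe) / sqrt(po * q / n)

# Two-tailed p-value from the standard normal distribution
p_two_sided <- 2 * pnorm(-abs(z))

# One-tailed p-values
p_less    <- pnorm(z)                      # alternative: po < pe
p_greater <- pnorm(z, lower.tail = FALSE)  # alternative: po > pe

round(c(z = z, two.sided = p_two_sided, less = p_less, greater = p_greater), 4)
```

Here `pnorm()` plays the role of the z-table: it returns the area under the standard normal curve to the left of `z`.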

The confidence interval of po at 95% is defined as follows:

$p_o \pm 1.96\sqrt{\frac{p_oq}{n}}$
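
The interval above can be sketched in R for the mice data as follows. The names `z_crit`, `half_width`, and `ci` are illustrative. Note that `prop.test()` reports a Wilson score interval for a single proportion, so its limits will differ slightly from this one.

```r
# Mice example: 95 males among 160 cancer cases
x <- 95
n <- 160

po <- x / n   # observed proportion
q  <- 1 - po

z_crit     <- qnorm(0.975)               # approximately 1.96
half_width <- z_crit * sqrt(po * q / n)  # margin of error

ci <- c(lower = po - half_width, upper = po + half_width)
round(ci, 4)
```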

One-proportion z-test in R

R functions: binom.test() & prop.test()

The R functions binom.test() and prop.test() can be used to perform a one-proportion test:

binom.test(): computes the exact binomial test. Recommended when the sample size is small.

prop.test(): can be used when the sample size is large (n > 30). It uses a normal approximation to the binomial.

The syntax of the two functions is:

binom.test(x, n, p = 0.5, alternative = "two.sided")
prop.test(x, n, p = NULL, alternative = "two.sided", correct = TRUE)

x: the number of successes

n: the total number of trials

p: the probability to test against

correct: a logical indicating whether Yates' continuity correction should be applied where possible
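
For comparison, here is a short sketch of the exact test on the same mice data (95 males out of 160). The object name `exact` is arbitrary; the components accessed are standard fields of the object returned by `binom.test()`.

```r
# Exact binomial test: 95 males out of 160, null proportion 0.5
exact <- binom.test(x = 95, n = 160, p = 0.5, alternative = "two.sided")

exact$p.value   # exact two-sided p-value
exact$estimate  # observed proportion, 95/160
exact$conf.int  # exact (Clopper-Pearson) 95% confidence interval
```
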

res <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE)
# Printing the results
res

If you want to test whether the proportion of males with cancer is less than 0.5 (one-tailed test), type this:

prop.test(x = 95, n = 160, p = 0.5, correct = FALSE, alternative = "less")

Or, if you want to test whether the proportion of males with cancer is greater than 0.5 (one-tailed test), type this:

prop.test(x = 95, n = 160, p = 0.5, correct = FALSE, alternative = "greater")
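
The object returned by `prop.test()` is a list, so individual results can be extracted by name. The sketch below also relates the reported X-squared value to a z score. The names `z_score` and `z_check` are illustrative. One caveat: `prop.test()` uses the null proportion (0.5) in the standard error, so the z recovered here is close to, but not identical with, a z computed with po in the denominator.

```r
res <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE)

res$statistic  # X-squared value
res$p.value    # two-sided p-value
res$estimate   # observed proportion (95/160)
res$conf.int   # 95% confidence interval

# With one degree of freedom, the square root of X-squared is |z|
z_score <- sqrt(unname(res$statistic))

# Same z computed directly, with the null proportion in the standard error
z_check <- (95 / 160 - 0.5) / sqrt(0.5 * 0.5 / 160)

all.equal(z_score, z_check)
```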

Two-proportions z-test

The two-proportions z-test is used to compare two observed proportions. This article describes the basics of the two-proportions z-test and provides practical examples using R.

Case of small sample sizes

The Fisher exact probability test is a non-parametric technique for comparing proportions when the two independent samples are small in size.

For example, we have two groups of individuals:

Group A with lung cancer: n = 500

Group B, healthy individuals: n = 500

The number of smokers in each group is as follows:

Group A with lung cancer: n = 500, 490 smokers, pA = 490/500 = 98%

Group B, healthy individuals: n = 500, 400 smokers, pB = 400/500 = 80%

In this setting:

The overall proportion of smokers is $p=\frac{490+400}{500+500}=0.89$ (89%)

The overall proportion of non-smokers is q = 1 − p = 0.11 (11%)
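
The quantities in this setting can be reproduced in R with a few lines. The sketch also shows how Fisher's exact test would be called through `fisher.test()` on the corresponding 2 x 2 table. The samples here are large, so the Fisher call only illustrates the syntax; the variable names are illustrative.

```r
# Counts from the example: smokers and group sizes
smokers <- c(A = 490, B = 400)
n       <- c(A = 500, B = 500)

pA <- smokers[["A"]] / n[["A"]]  # proportion of smokers in group A
pB <- smokers[["B"]] / n[["B"]]  # proportion of smokers in group B
p  <- sum(smokers) / sum(n)      # overall proportion of smokers
q  <- 1 - p                      # overall proportion of non-smokers

c(pA = pA, pB = pB, p = p, q = q)

# 2 x 2 table: rows are smoking status, columns are groups
tab <- rbind(smoker = smokers, non_smoker = n - smokers)
tab

# Fisher's exact test on the table (intended for small samples)
fisher_p <- fisher.test(tab)$p.value
fisher_p
```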

Case of large sample sizes

The test statistic (also known as z-test) can be calculated as follows:

$z = \frac{p_{A} - p_{B}}{\sqrt{p q / n_{A} + p q / n_{B}}}$

pA is the proportion observed in group A with size nA

pB is the proportion observed in group B with size nB

p and q are the overall proportions
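
A minimal sketch of this formula applied to the smokers example is given below. It also checks that the squared z equals the X-squared statistic that `prop.test()` reports when the continuity correction is switched off (`correct = FALSE`). Variable names such as `xA`, `nA`, and `chi2` are illustrative.

```r
# Smokers example: group A (lung cancer) and group B (healthy)
xA <- 490; nA <- 500
xB <- 400; nB <- 500

pA <- xA / nA
pB <- xB / nB
p  <- (xA + xB) / (nA + nB)  # overall proportion of smokers
q  <- 1 - p                  # overall proportion of non-smokers

# z statistic from the formula above
z <- (pA - pB) / sqrt(p * q / nA + p * q / nB)

# Two-tailed p-value
p_value <- 2 * pnorm(-abs(z))

# X-squared from prop.test() without continuity correction equals z^2
chi2 <- unname(prop.test(x = c(xA, xB), n = c(nA, nB), correct = FALSE)$statistic)
all.equal(z^2, chi2)
```

With the default `correct = TRUE`, `prop.test()` applies Yates' continuity correction, so its X-squared is somewhat smaller than z squared.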

res <- prop.test(x = c(490, 400), n = c(500, 500))
# Printing the results
res

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(490, 400) out of c(500, 500)
## X-squared = 80.909, df = 1, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1408536 0.2191464
## sample estimates:
## prop 1 prop 2 
##   0.98   0.80

If you want to test whether the observed proportion of smokers in group A (pA) is less than the observed proportion of smokers in group B (pB), type this:

prop.test(x = c(490, 400), n = c(500, 500),
           alternative = "less")

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(490, 400) out of c(500, 500)
## X-squared = 80.909, df = 1, p-value = 1
## alternative hypothesis: less
## 95 percent confidence interval:
##  -1.0000000  0.2131742
## sample estimates:
## prop 1 prop 2 
##   0.98   0.80

If you want to test whether the observed proportion of smokers in group A (pA) is greater than the observed proportion of smokers in group B (pB), type this:

prop.test(x = c(490, 400), n = c(500, 500),
              alternative = "greater")

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(490, 400) out of c(500, 500)
## X-squared = 80.909, df = 1, p-value < 2.2e-16
## alternative hypothesis: greater
## 95 percent confidence interval:
##  0.1468258 1.0000000
## sample estimates:
## prop 1 prop 2 
##   0.98   0.80

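# Simulate the number of heads in 100 flips of a fair coin.
# No seed is set, so the count changes from run to run;
# the output below comes from a run that produced 35 heads.
heads <- rbinom(1, size = 100, prob = .5)
prop.test(heads, 100)          # continuity correction TRUE by default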

## 
##  1-sample proportions test with continuity correction
## 
## data:  heads out of 100, null probability 0.5
## X-squared = 8.41, df = 1, p-value = 0.003732
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.2591235 0.4525560
## sample estimates:
##    p 
## 0.35

prop.test(heads, 100, correct = FALSE)

## 
##  1-sample proportions test without continuity correction
## 
## data:  heads out of 100, null probability 0.5
## X-squared = 9, df = 1, p-value = 0.0027
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.2636425 0.4474556
## sample estimates:
##    p 
## 0.35
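
The two outputs above report p = 0.35 on 100 flips, so that run produced 35 heads. As a sketch, the effect of the continuity correction can be reproduced by hand for that count. The names `chi2_plain` and `chi2_yates` are illustrative.

```r
x  <- 35   # heads in the run shown above
n  <- 100  # number of flips
p0 <- 0.5  # null probability of heads

# Pearson statistic without continuity correction
chi2_plain <- (x - n * p0)^2 / (n * p0 * (1 - p0))

# Yates' correction subtracts 0.5 from |observed - expected| before squaring
chi2_yates <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))

c(plain = chi2_plain, yates = chi2_yates)
```

The two values are 9 and 8.41, matching the X-squared lines in the outputs without and with continuity correction.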

Real Data Example
------------------

## Data from Fleiss (1981), p. 139.
## H0: The null hypothesis is that the four populations from which
##     the patients were drawn have the same true proportion of smokers.
## A:  The alternative is that this proportion is different in at
##     least one of the populations.

smokers  <- c( 83, 90, 129, 70 )
patients <- c( 86, 93, 136, 82 )
prop.test(smokers, patients)

## 
##  4-sample test for equality of proportions without continuity
##  correction
## 
## data:  smokers out of patients
## X-squared = 12.6, df = 3, p-value = 0.005585
## alternative hypothesis: two.sided
## sample estimates:
##    prop 1    prop 2    prop 3    prop 4 
## 0.9651163 0.9677419 0.9485294 0.8536585
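
For more than two groups, the test performed by `prop.test()` is a Pearson chi-squared test on the table of smokers and non-smokers. The sketch below builds that table and checks the agreement using `chisq.test()`. The object names `tab` and `chi` are illustrative.

```r
smokers  <- c(83, 90, 129, 70)
patients <- c(86, 93, 136, 82)

# 2 x 4 table: smokers and non-smokers in each of the four groups
tab <- rbind(smokers = smokers, non_smokers = patients - smokers)
tab

# Pearson chi-squared test on the same counts
chi <- chisq.test(tab)
chi$statistic  # X-squared
chi$parameter  # degrees of freedom
chi$p.value

# Same statistic as the 4-sample prop.test() above
all.equal(unname(chi$statistic),
          unname(prop.test(smokers, patients)$statistic))
```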



Real Datasets in R & Proportion-Test (Hypothesis Test of Significance and Confidence Intervals for a Single Proportion and the Difference of Two Proportions in R)

BSc 3rd Sem

8 dec 2022
