The one-proportion z-test is used to compare an observed proportion to a theoretical one when there are only two categories. This article describes the basics of the one-proportion z-test and provides practical examples using R.
For example, suppose we have a population of mice that is half male and half female (p = 0.5 = 50%). Some of these mice (n = 160) have developed a spontaneous cancer, including 95 males and 65 females.
In this setting:
The number of successes (males with cancer) is 95
The observed proportion (po) of males is 95/160
The observed proportion (q) of females is 1 - po
The expected proportion (pe) of males is 0.5 (50%)
The number of observations (n) is 160
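The quantities above are enough to compute the test statistic by hand. The following is a minimal sketch in base R, assuming the usual one-proportion z statistic, z = (po - pe) / sqrt(pe * (1 - pe) / n). The variable names are chosen here for illustration.

```r
# Values from the mice example
x  <- 95     # number of successes (males with cancer)
n  <- 160    # number of observations
pe <- 0.5    # expected proportion of males

po <- x / n                                  # observed proportion of males
z  <- (po - pe) / sqrt(pe * (1 - pe) / n)    # z statistic under the null
p_value <- 2 * pnorm(-abs(z))                # two-sided p-value

z
z^2
p_value
```

The squared statistic z^2 corresponds to the X-squared value that prop.test() reports when called with correct = FALSE.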
R functions: binom.test() & prop.test()
The R functions binom.test() and prop.test() can be used to perform a one-proportion test:
binom.test(): computes an exact binomial test. Recommended when the sample size is small
prop.test(): can be used when the sample size is large (n > 30). It uses a normal approximation to the binomial distribution
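As a sketch of the exact alternative, binom.test() can be applied to the mice data with the same x, n and p arguments. The object name res_exact is illustrative.

```r
# Exact binomial test on the mice example: 95 males out of 160, null p = 0.5
res_exact <- binom.test(x = 95, n = 160, p = 0.5)

res_exact$p.value    # exact two-sided p-value
res_exact$conf.int   # exact (Clopper-Pearson) 95% confidence interval
res_exact$estimate   # observed proportion, 95/160
```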
prop.test(x = 95, n = 160, p = 0.5, correct = FALSE)
1-sample proportions test without continuity correction
data: 95 out of 160, null probability 0.5
X-squared = 5.625, df = 1, p-value = 0.01771
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5163169 0.6667870
sample estimates:
p
0.59375
prop.test(x = 95, n = 160, p = 0.5, correct = FALSE, alternative = "less")
1-sample proportions test without continuity correction
data: 95 out of 160, null probability 0.5
X-squared = 5.625, df = 1, p-value = 0.9911
alternative hypothesis: true p is less than 0.5
95 percent confidence interval:
0.0000000 0.6555425
sample estimates:
p
0.59375
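The p-value of 0.9911 for alternative = "less" reflects that the observed proportion (0.59375) lies above 0.5. To test whether the true proportion of males is greater than 0.5, the complementary one-sided alternative can be requested. A short sketch, with illustrative object names:

```r
res_less    <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
                         alternative = "less")
res_greater <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE,
                         alternative = "greater")

res_greater$p.value
# Without continuity correction the two one-sided p-values sum to 1
res_less$p.value + res_greater$p.value
```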
res <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE)
res
1-sample proportions test without continuity correction
data: 95 out of 160, null probability 0.5
X-squared = 5.625, df = 1, p-value = 0.01771
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
0.5163169 0.6667870
sample estimates:
p
0.59375
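The object returned by prop.test() is a list of class "htest", so individual results can be extracted by name instead of being read off the printed output. A brief sketch that repeats the call so it runs on its own:

```r
res <- prop.test(x = 95, n = 160, p = 0.5, correct = FALSE)

res$statistic   # X-squared value
res$p.value     # p-value of the test
res$estimate    # observed proportion
res$conf.int    # 95% confidence interval
```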
The two-proportions z-test is used to compare two observed proportions. This article describes the basics of the two-proportions z-test and provides practical examples using R.
res <- prop.test(x = c(490, 400), n = c(500, 500))
# Printing the results
res
2-sample test for equality of proportions with continuity correction
data: c(490, 400) out of c(500, 500)
X-squared = 80.909, df = 1, p-value < 2.2e-16
alternative hypothesis: two.sided
95 percent confidence interval:
0.1408536 0.2191464
sample estimates:
prop 1 prop 2
0.98 0.80
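The output above applies a continuity correction, which is the default in prop.test(). As a sketch of the underlying z statistic, assuming the standard pooled-proportion form z = (p1 - p2) / sqrt(p(1 - p)(1/n1 + 1/n2)), the calculation can be written by hand and compared with prop.test() called with correct = FALSE. Variable names are illustrative.

```r
x <- c(490, 400)   # successes in each group
n <- c(500, 500)   # group sizes

p_hat  <- x / n                 # observed proportions: 0.98 and 0.80
p_pool <- sum(x) / sum(n)       # pooled proportion under the null

z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
p_value <- 2 * pnorm(-abs(z))   # two-sided p-value

# prop.test() without continuity correction reports z^2 as X-squared
res_nc <- prop.test(x = x, n = n, correct = FALSE)
c(z_squared = z^2, X_squared = unname(res_nc$statistic))
```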
What is chi-square goodness of fit test?
The chi-square goodness of fit test is used to compare an observed distribution to an expected distribution when the data are discrete and fall into two or more categories. In other words, it compares multiple observed proportions to expected probabilities.
tulip <- c(81, 50, 27)
res <- chisq.test(tulip, p = c(1/3, 1/3, 1/3))
res
Chi-squared test for given probabilities
data: tulip
X-squared = 27.886, df = 2, p-value = 8.803e-07
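The statistic reported above can be reproduced by hand. The sketch below assumes the usual goodness of fit formula, the sum over categories of (observed - expected)^2 / expected, with k - 1 degrees of freedom for k categories. Variable names other than tulip are illustrative.

```r
tulip    <- c(81, 50, 27)
expected <- sum(tulip) * c(1/3, 1/3, 1/3)   # expected counts under the null

chi_sq  <- sum((tulip - expected)^2 / expected)
df      <- length(tulip) - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

c(chi_sq = chi_sq, df = df, p_value = p_value)
```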
Data format: Contingency tables
# Import the data
housetasks <- read.table("~/Penguin/housetasks.txt", header = TRUE)
`summarise()` has grouped output by 'class'. You can override using the
`.groups` argument.
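| class      | 4        | 5  | 6        | 8        |
|------------|----------|----|----------|----------|
| 2seater    | NA       | NA | NA       | 15.40000 |
| compact    | 21.37500 | 21 | 16.92308 | NA       |
| midsize    | 20.50000 | NA | 17.78261 | 16.00000 |
| minivan    | 18.00000 | NA | 15.60000 | NA       |
| pickup     | 16.00000 | NA | 14.50000 | 11.80000 |
| subcompact | 22.85714 | 20 | 17.00000 | 14.80000 |
| suv        | 18.00000 | NA | 14.50000 | 12.13158 |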
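library(ggplot2)  # provides the mpg dataset
library(dplyr)    # provides %>%, group_by(), summarize(), mutate()
library(knitr)    # provides kable()

mpg %>%
  group_by(class) %>%
  summarize(n = n()) %>%
  mutate(prop = n / sum(n)) %>%  # our new proportion variable
  kable()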
| class      | n  | prop      |
|------------|----|-----------|
| 2seater    | 5  | 0.0213675 |
| compact    | 47 | 0.2008547 |
| midsize    | 41 | 0.1752137 |
| minivan    | 11 | 0.0470085 |
| pickup     | 33 | 0.1410256 |
| subcompact | 35 | 0.1495726 |
| suv        | 62 | 0.2649573 |
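The same proportions can be obtained in base R with prop.table(). The sketch below re-enters the class counts from the table above as a named vector so that it runs on its own; with the mpg data loaded, table(mpg$class) would supply these counts directly. The names class_counts and class_props are illustrative.

```r
# Class counts copied from the table above
class_counts <- c("2seater" = 5, compact = 47, midsize = 41, minivan = 11,
                  pickup = 33, subcompact = 35, suv = 62)

class_props <- prop.table(class_counts)   # each count divided by the total
round(class_props, 7)
```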
library(tidyr)  # provides spread()

mpg %>%
  group_by(class, cyl) %>%
  summarize(n = n()) %>%
  mutate(prop = n / sum(n)) %>%  # proportion within each class
  subset(select = c("class", "cyl", "prop")) %>%  # drop the frequency value
  spread(class, prop) %>%
  kable()
`summarise()` has grouped output by 'class'. You can override using the
`.groups` argument.
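| cyl | 2seater | compact   | midsize   | minivan   | pickup    | subcompact | suv       |
|-----|---------|-----------|-----------|-----------|-----------|------------|-----------|
| 4   | NA      | 0.6808511 | 0.3902439 | 0.0909091 | 0.0909091 | 0.6000000  | 0.1290323 |
| 5   | NA      | 0.0425532 | NA        | NA        | NA        | 0.0571429  | NA        |
| 6   | NA      | 0.2765957 | 0.5609756 | 0.9090909 | 0.3030303 | 0.2000000  | 0.2580645 |
| 8   | 1       | NA        | 0.0487805 | NA        | 0.6060606 | 0.1428571  | 0.6129032 |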
library(gplots)
mpg_table <- table(mpg$manufacturer, mpg$class)
balloonplot(mpg_table, main = "Balloon Plot of Manufacturer vs Car Class",
            xlab = "Car Class", ylab = "Manufacturer",
            label = TRUE, show.margins = FALSE)
# 1. Contingency table
mpg_counts <- mpg %>%
  group_by(class, cyl) %>%
  summarise(n = n(), .groups = "drop") %>%
  spread(cyl, n, fill = 0)
# Convert to matrix format for the chi-square test
mpg_matrix <- as.matrix(mpg_counts[, -1])
rownames(mpg_matrix) <- mpg_counts$class
# 2. Chi-square test of the data
chisq <- chisq.test(mpg_matrix)
Warning in chisq.test(mpg_matrix): Chi-squared approximation may be incorrect
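This warning is issued when some expected cell counts fall below 5, which makes the chi-squared approximation unreliable. One common response is to inspect the expected counts and, if needed, compute a Monte Carlo p-value with simulate.p.value = TRUE. The sketch below is self-contained: instead of relying on mpg_matrix, it re-enters the class-by-cylinder counts, which are reconstructed here from the class totals and within-class proportions shown earlier and should be treated as illustrative.

```r
# Class-by-cylinder counts (rows: class, columns: cyl), reconstructed from
# the class totals and proportions reported earlier; treat as illustrative
counts <- matrix(
  c( 0, 0,  0,  5,
    32, 2, 13,  0,
    16, 0, 23,  2,
     1, 0, 10,  0,
     3, 0, 10, 20,
    21, 2,  7,  5,
     8, 0, 16, 38),
  nrow = 7, byrow = TRUE,
  dimnames = list(
    class = c("2seater", "compact", "midsize", "minivan",
              "pickup", "subcompact", "suv"),
    cyl   = c("4", "5", "6", "8")
  )
)

# Expected counts under independence; cells below 5 trigger the warning
expected <- suppressWarnings(chisq.test(counts))$expected
sum(expected < 5)

# Monte Carlo p-value, which does not rely on the chi-squared approximation
set.seed(123)
chisq_sim <- chisq.test(counts, simulate.p.value = TRUE, B = 2000)
chisq_sim$p.value
```

With a simulated p-value, chisq.test() reports no degrees of freedom, since the reference distribution is built by resampling tables with the same margins.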