Tests of Independence

Chi-square test of independence

The chi-square test of independence is used to determine whether there is a significant relationship between two nominal (categorical) variables. It compares the observed frequencies of one variable across the levels of the other with the frequencies that would be expected if the two variables were unrelated.

Applying chisq.test() to a two-way table produces a chi-square test of independence of the row and column variables.

Treatment vs Improved

# Treatment vs Improved
library(vcd)
mytable <- xtabs(~Treatment+Improved, data=Arthritis)
chisq.test(mytable)
## 
##  Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 13.055, df = 2, p-value = 0.001463

Treatment and Improved are not independent.

Here the p-value is small (p<0.01), so we reject the hypothesis that treatment type and outcome are independent: there is a relationship between treatment received and level of improvement.

The p-value is the probability of obtaining results at least as extreme as those observed in the sample, assuming the row and column variables are independent in the population.
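To make the reported statistic less of a black box, the following minimal sketch recomputes it by hand in base R. The counts are typed in directly so the example needs no packages; they are assumed to match the table that xtabs(~Treatment+Improved, data=Arthritis) produces, and the variable names (observed, expected, x2, p_value) are illustrative only.

```r
# Observed counts, typed in to match xtabs(~Treatment+Improved, data=Arthritis)
observed <- matrix(c(29, 7,  7,
                     13, 7, 21),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Treatment = c("Placebo", "Treated"),
                                   Improved  = c("None", "Some", "Marked")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over all cells of (observed - expected)^2 / expected
x2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper tail of the chi-squared distribution with df degrees of freedom
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

c(X.squared = x2, df = df, p.value = p_value)
```

If the counts are right, the three values should agree with the X-squared, df and p-value lines printed by chisq.test(mytable).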

Improved vs Sex

# Improved vs Sex
mytable <- xtabs(~Improved+Sex, data=Arthritis)
chisq.test(mytable)
## Warning in chisq.test(mytable): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  mytable
## X-squared = 4.8407, df = 2, p-value = 0.08889

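Here the p-value exceeds 0.05, so we cannot reject the hypothesis that improvement and patient sex are independent; the data show no evidence of a relationship between the two. The warning indicates that at least one expected cell count is small, so the chi-squared approximation may be unreliable for this table.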
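The warning printed with this test can be investigated by looking at the expected counts, which chisq.test() returns in the $expected component. The sketch below is one way to do that, again with counts typed in by hand on the assumption that they match xtabs(~Improved+Sex, data=Arthritis). It also shows the simulate.p.value argument of chisq.test(), which replaces the chi-squared approximation with a Monte Carlo p-value; the seed and the number of replicates B are arbitrary choices.

```r
# Observed counts, typed in to match xtabs(~Improved+Sex, data=Arthritis)
observed <- matrix(c(25, 17,
                     12,  2,
                     22,  6),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(Improved = c("None", "Some", "Marked"),
                                   Sex      = c("Female", "Male")))

# Run the test quietly and inspect the expected counts under independence
result <- suppressWarnings(chisq.test(observed))
result$expected

# Cells with an expected count below 5 are what trigger the warning
which(result$expected < 5, arr.ind = TRUE)

# Alternative: Monte Carlo p-value, which does not rely on the approximation
set.seed(1)
sim <- chisq.test(observed, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

Fisher's exact test is another option for tables with small expected counts.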

Fisher’s exact test

Fisher’s exact test is used when you have two nominal variables. A data set like this is often called an R \(\times\) C table, where R is the number of rows and C is the number of columns. Fisher’s exact test is more accurate than the chi-squared test or G-test (Likelihood Ratio Test) of independence when the expected numbers are small.

Fisher’s exact test evaluates the null hypothesis of independence of rows and columns in a contingency table with fixed marginals. To run it, apply fisher.test() to a two-way table:

mytable <- xtabs(~Treatment+Improved, data=Arthritis)
fisher.test(mytable)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  mytable
## p-value = 0.001393
## alternative hypothesis: two.sided

The fisher.test() function can be applied to any two-way table with two or more rows and columns, not just a 2 \(\times\) 2 table.
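The difference between the two cases shows up in what fisher.test() returns. As a rough illustration, the sketch below runs the test on the full 2 \(\times\) 3 table and on a 2 \(\times\) 2 version obtained by merging the "Some" and "Marked" columns. The merge is purely a device for this example, not part of the analysis above, and the counts are typed in on the assumption that they match xtabs(~Treatment+Improved, data=Arthritis).

```r
# Observed counts, typed in to match xtabs(~Treatment+Improved, data=Arthritis)
full <- matrix(c(29, 7,  7,
                 13, 7, 21),
               nrow = 2, byrow = TRUE,
               dimnames = list(Treatment = c("Placebo", "Treated"),
                               Improved  = c("None", "Some", "Marked")))

# 2 x 3 table: the result carries a p-value but no odds ratio
rc <- fisher.test(full)
rc$p.value
is.null(rc$estimate)

# Hypothetical 2 x 2 version: merge "Some" and "Marked" into one column
collapsed <- cbind(None     = full[, "None"],
                   Improved = full[, "Some"] + full[, "Marked"])

# 2 x 2 table: the result also carries an odds ratio estimate and its
# confidence interval
two_by_two <- fisher.test(collapsed)
two_by_two$p.value
two_by_two$estimate
two_by_two$conf.int
```

For 2 \(\times\) 2 tables the estimate is the conditional maximum likelihood estimate of the odds ratio, so it can differ slightly from the simple cross-product ratio of the cell counts.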