Chi-square and F-test

Sameer Mathur

Tests of Independence

Chi-square test of independence

The Chi-Square test of Independence is used to determine if there is a significant relationship between two nominal (categorical) variables. The frequency of one nominal variable is compared with different values of the second nominal variable.

To produce a chisq.test() to a two-way table, we use chi-square test of independence of the row and column variables.

Tests of Independence

Chi-square test of independence

# Treatment vs Improved
library(vcd)
mytable <- xtabs(~Treatment+Improved, data=Arthritis)
chisq.test(mytable)

    Pearson's Chi-squared test

data:  mytable
X-squared = 13.055, df = 2, p-value = 0.001463

Treatment and Improved are not independent.

Tests of Independence

Chi-square test of independence

Here we found a relationship between treatment received and level of improvement because the probability is small (p<0.01) hence we reject the hypothesis that treatment type and outcome are independent.

The p values are the probability of obtaining the sampled results, assuming independence of the row and column variables in the population.

Tests of Independence

Chi-square test of independence

# Treatment vs Gender
mytable <- xtabs(~Improved+Sex, data=Arthritis)
chisq.test(mytable)

    Pearson's Chi-squared test

data:  mytable
X-squared = 4.8407, df = 2, p-value = 0.08889

Similarly, Treatment and Sex are independent i.e. there is no relationship between patient sex and improvement (p>0.05).

Tests of Independence

Fisher's exact test

Fisher's exact test is used when you have two nominal variables. A data set like this is often called an R \( \times \) C table, where R is the number of rows and C is the number of columns. Fisher's exact test is more accurate than the chi-squared test or G-test (Likelihood Ratio Test) of independence when the expected numbers are small.

Tests of Independence

Fisher's exact test

Fisher's exact test evaluates the null hypothesis of independence of rows and columns in a contingency table with fixed marginas. To produce a fisher.test()

mytable <- xtabs(~Treatment+Improved, data=Arthritis)
fisher.test(mytable)

    Fisher's Exact Test for Count Data

data:  mytable
p-value = 0.001393
alternative hypothesis: two.sided

The fisher.test() function can be applied to any two-way table with two or more rows and columns, not a 2 \( \times \) 2 table.