In general, there are no assumptions about the distribution of data for these tests.
http://rcompanion.org/rcompanion/b_07.html.
These tests for nominal variables are used to determine if two nominal variables are associated. Sometimes the term “independent” is used to mean that there is no association.
Note that these tests of association should not be used with paired values. For example, the data are paired if the experimental units (the things you are counting) are “students before” and “students after”, or “left hands” and “right hands”.
Interpretation
Significant results can be reported as “There was a significant association between variable A and variable B.”
Fisher’s Tea Drinker (Agresti 1990, p. 61f; 2002, p. 91)
A British woman claimed to be able to distinguish whether milk or tea was added to the cup first. To test this, she was given 8 cups of tea, in four of which milk was added first. The null hypothesis is that there is no association between the true order of pouring and the woman’s guess; the alternative is that there is a positive association (that the odds ratio is greater than 1).
Of the 4 milk-first cups, the woman guessed 3 correctly; of the 4 tea-first cups, she also guessed 3 correctly. Is she just guessing?
library(DescTools)
library(multcompView)
library(rcompanion)
TeaTasting <-
matrix(c(3, 1, 1, 3),
nrow = 2,
dimnames = list(Guess = c("Milk", "Tea"),
Truth = c("Milk", "Tea")))
TeaTasting
## Truth
## Guess Milk Tea
## Milk 3 1
## Tea 1 3
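# One-sided test: the alternative is a positive association (odds ratio > 1)
out <- fisher.test(TeaTasting, alternative = "greater")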
if (out$p.value > 0.05) print("NO! There is no association between the two variables") else print("YES! There is an association between the two variables")
## [1] "NO! There is no association between the two variables"
In the above test, p = 0.2429, so an association could not be established; the result is consistent with the woman simply guessing.
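The value p = 0.2429 is the one-sided p-value, and it can be checked by hand. The following is a minimal sketch, assuming all margins of the table are fixed at 4, so that the number of milk-first cups guessed correctly follows a hypergeometric distribution. The variable names are illustrative only.

```r
# One-sided Fisher p-value for the tea-tasting table, computed by hand.
n_milk   <- 4   # cups with milk poured first
n_tea    <- 4   # cups with tea poured first
n_guess  <- 4   # cups the woman labelled "milk first"
observed <- 3   # milk-first cups she labelled correctly

# P(X >= 3) = P(X > 2): upper tail of the hypergeometric distribution
p_by_hand <- phyper(observed - 1, n_milk, n_tea, n_guess, lower.tail = FALSE)

# The same probability from binomial coefficients:
# (C(4,3) * C(4,1) + C(4,4) * C(4,0)) / C(8,4) = 17 / 70
p_by_count <- (choose(4, 3) * choose(4, 1) +
               choose(4, 4) * choose(4, 0)) / choose(8, 4)

round(c(p_by_hand, p_by_count), 4)
## [1] 0.2429 0.2429
```

Both calculations give 17/70, so getting 3 or more of the milk-first cups right would happen by chance about 24% of the time.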
Alexander Anderson runs the pesticide safety training course in four counties. Students must pass in order to obtain their pesticide applicator’s license. He wishes to see if there is an association between the county in which the course was held and the rate of passing the test. The following are his data.

County               Pass  Fail
Bloom County           21     5
Cobblestone County      6    11
Dougal County           7     8
Heimlich County        27     5
# Reading the data as a matrix
Input =("
County Pass Fail
Bloom 21 5
Cobblestone 6 11
Dougal 7 8
Heimlich 27 5
")
Matrix = as.matrix(read.table(textConnection(Input),
header=TRUE,
row.names=1))
Matrix
## Pass Fail
## Bloom 21 5
## Cobblestone 6 11
## Dougal 7 8
## Heimlich 27 5
Hypotheses
. Null hypothesis: There is no association between the two variables. It is not rejected when p > 0.05.
. Alternative hypothesis (two-sided): There is an association between the two variables. The null is rejected in its favor when p < 0.05.
out <- fisher.test(Matrix)
if (out$p.value > 0.05) print("NO! There is no association between the two variables") else print("YES! There is an association between the two variables")
## [1] "YES! There is an association between the two variables"
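A significant result says only that the pass rate differs somewhere among the counties. To see where, it can help to look at the row proportions. The following is a small sketch using base R; the object names `Counts` and `PassRate` are illustrative and not part of the original analysis.

```r
# Pass/fail counts copied from the data table above
Counts <- matrix(c(21,  5,
                    6, 11,
                    7,  8,
                   27,  5),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(County = c("Bloom", "Cobblestone",
                                            "Dougal", "Heimlich"),
                                 Result = c("Pass", "Fail")))

# Row proportions: share of students passing and failing in each county
PassRate <- prop.table(Counts, margin = 1)
round(PassRate, 3)
```

Bloom and Heimlich pass a little over 80% of their students, while Cobblestone and Dougal pass fewer than half, which suggests where the association comes from.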
Post-hoc analysis can be conducted with pairwise Fisher’s exact tests. The function pairwiseNominalIndependence in the rcompanion package can be used to conduct this analysis.
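To make the idea of pairwise tests concrete, the sketch below runs Fisher’s exact test on each 2 x 2 sub-table by hand with base R. It illustrates the idea only and is not the internal code of pairwiseNominalIndependence; the names `Counts`, `pairs` and `p_raw` are assumptions made for this example.

```r
# Pass/fail counts copied from the data table above
Counts <- matrix(c(21,  5,
                    6, 11,
                    7,  8,
                   27,  5),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(County = c("Bloom", "Cobblestone",
                                            "Dougal", "Heimlich"),
                                 Result = c("Pass", "Fail")))

# All 6 unordered pairs of counties, one pair per column
pairs <- combn(rownames(Counts), 2)

# Fisher's exact test on the 2 x 2 sub-table for each pair
p_raw <- apply(pairs, 2, function(pr) fisher.test(Counts[pr, ])$p.value)
names(p_raw) <- apply(pairs, 2, paste, collapse = " : ")

# Unadjusted p-values; these still need a multiple-comparison correction
signif(p_raw, 3)
```

The pairs come out in alphabetical order here, so the rows are arranged differently from the rcompanion output.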
### Order matrix
Matrix = Matrix[(c("Heimlich", "Bloom", "Dougal", "Cobblestone")),]
Matrix
## Pass Fail
## Heimlich 27 5
## Bloom 21 5
## Dougal 7 8
## Cobblestone 6 11
### Pairwise tests of association
PT = pairwiseNominalIndependence(Matrix,
compare = "row",
fisher = TRUE,
gtest = FALSE,
chisq = FALSE,
method = "fdr", # see ?p.adjust for options
digits = 3)
PT
## Comparison p.Fisher p.adj.Fisher
## 1 Heimlich : Bloom 0.740000 0.74000
## 2 Heimlich : Dougal 0.013100 0.02620
## 3 Heimlich : Cobblestone 0.000994 0.00596
## 4 Bloom : Dougal 0.037600 0.05640
## 5 Bloom : Cobblestone 0.003960 0.01190
## 6 Dougal : Cobblestone 0.720000 0.74000
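The p.adj.Fisher column can be reproduced from the p.Fisher column with base R’s p.adjust. The sketch below copies the rounded unadjusted p-values from the output above, so the results agree only to the precision shown. The "fdr" method is the Benjamini-Hochberg procedure: each p-value is multiplied by (number of tests / rank), and the results are then forced to be non-decreasing in rank order.

```r
# Unadjusted p-values copied (rounded) from the p.Fisher column above
p_raw <- c("Heimlich : Bloom"       = 0.74,
           "Heimlich : Dougal"      = 0.0131,
           "Heimlich : Cobblestone" = 0.000994,
           "Bloom : Dougal"         = 0.0376,
           "Bloom : Cobblestone"    = 0.00396,
           "Dougal : Cobblestone"   = 0.72)

# Benjamini-Hochberg false discovery rate adjustment
p_fdr <- p.adjust(p_raw, method = "fdr")
signif(p_fdr, 3)
```

For example, the smallest p-value (rank 1 of 6) becomes 0.000994 * 6 / 1 = 0.00596, and Dougal : Cobblestone would be 0.72 * 6 / 5 = 0.864 but is capped at 0.74, the adjusted value of the next larger p-value.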
### Compact letter display
cldList(p.adj.Fisher ~ Comparison,
data = PT,
threshold = 0.05)
## Group Letter MonoLetter
## 1 Heimlich a a
## 2 Bloom ab ab
## 3 Dougal bc bc
## 4 Cobblestone c c
### This table of letters can also be found using the pairwiseNominalMatrix function along with the multcompLetters function in the multcompView package.
### Order matrix
Matrix = Matrix[(c("Heimlich", "Bloom", "Dougal", "Cobblestone")),]
Matrix
## Pass Fail
## Heimlich 27 5
## Bloom 21 5
## Dougal 7 8
## Cobblestone 6 11
Counties sharing a letter are not significantly different by Fisher’s exact test.
PM = pairwiseNominalMatrix(Matrix,
compare = "row",
fisher = TRUE,
gtest = FALSE,
chisq = FALSE,
method = "fdr", # see ?p.adjust for options
digits = 3)
PM
## $Test
## [1] "Fisher exact test"
##
## $Unadjusted
## Heimlich Bloom Dougal Cobblestone
## Heimlich NA 0.74 0.0131 0.000994
## Bloom NA NA 0.0376 0.003960
## Dougal NA NA NA 0.720000
## Cobblestone NA NA NA NA
##
## $Method
## [1] "fdr"
##
## $Adjusted
## Heimlich Bloom Dougal Cobblestone
## Heimlich 1.00000 0.7400 0.0262 0.00596
## Bloom 0.74000 1.0000 0.0564 0.01190
## Dougal 0.02620 0.0564 1.0000 0.74000
## Cobblestone 0.00596 0.0119 0.7400 1.00000
multcompLetters(PM$Adjusted,
compare="<",
threshold=0.05, ### p-value to use as significance threshold
Letters=letters,
reversed = FALSE)
## Heimlich Bloom Dougal Cobblestone
## "a" "ab" "bc" "c"
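The letters can be read back against the adjusted p-values as a consistency check: two counties should share a letter exactly when their adjusted p-value is not below the 0.05 threshold. The sketch below hard-codes the letters and adjusted p-values from the output above. The helper `shares_letter` is hypothetical and written only for this illustration; it is not part of rcompanion or multcompView.

```r
# Letters and adjusted p-values copied from the output above
cld <- c(Heimlich = "a", Bloom = "ab", Dougal = "bc", Cobblestone = "c")
p_adj <- c("Heimlich : Bloom"       = 0.74,
           "Heimlich : Dougal"      = 0.0262,
           "Heimlich : Cobblestone" = 0.00596,
           "Bloom : Dougal"         = 0.0564,
           "Bloom : Cobblestone"    = 0.0119,
           "Dougal : Cobblestone"   = 0.74)

# Hypothetical helper: TRUE if two groups have at least one letter in common
shares_letter <- function(a, b) {
  length(intersect(strsplit(cld[[a]], "")[[1]],
                   strsplit(cld[[b]], "")[[1]])) > 0
}

# Split each comparison label into its two county names and check the letters
pairs  <- strsplit(names(p_adj), " : ", fixed = TRUE)
shared <- vapply(pairs, function(pr) shares_letter(pr[1], pr[2]), logical(1))

data.frame(Comparison   = names(p_adj),
           p.adj        = p_adj,
           SharesLetter = shared,
           row.names    = NULL)
```

Every pair with an adjusted p-value of at least 0.05 shares a letter, and every pair below 0.05 does not, which matches how the display is meant to be read.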