Association test for Nominal Variables

Fisher’s Tea Drinker Example
Fisher Exact test of association
Post-hoc analysis
Pairwise tests of association

Fisher’s Tea Drinker Example

In general, there are no assumptions about the distribution of data for these tests.

See http://rcompanion.org/rcompanion/b_07.html.

These tests for nominal variables are used to determine if two nominal variables are associated. Sometimes the term “independent” is used to mean that there is no association.

Note that for these tests of association there shouldn’t be paired values. For example, the tests are not appropriate if the experimental units (the things you are counting) are “students before” and “students after”, or “left hands” and “right hands”.

Interpretation: Significant results can be reported as “There was a significant association between variable A and variable B.”

Fisher’s Tea Drinker (Agresti 1990, p. 61f; 2002, p. 91): A British woman claimed to be able to distinguish whether milk or tea was added to the cup first. To test her claim, she was given 8 cups of tea, in four of which milk was added first. The null hypothesis is that there is no association between the true order of pouring and the woman’s guess; the alternative is that there is a positive association (that the odds ratio is greater than 1).

Of the 4 milk-first cups, the woman identified 3 correctly, and of the 4 tea-first cups she also identified 3 correctly. Is she just guessing?

library(DescTools)
library(multcompView)
library(rcompanion)


TeaTasting <-
matrix(c(3, 1, 1, 3),
       nrow = 2,
       dimnames = list(Guess = c("Milk", "Tea"),
                       Truth = c("Milk", "Tea")))
TeaTasting

##       Truth
## Guess  Milk Tea
##   Milk    3   1
##   Tea     1   3

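# One-sided test: the alternative is a positive association (odds ratio > 1)
out <- fisher.test(TeaTasting, alternative = "greater")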

if (out$p.value > 0.05) print("NO! There is no association between the two variables") else print("YES! There is an association between the two variables")

## [1] "NO! There is no association between the two variables"

In the above test, the one-sided p-value is 0.2429, so an association could not be established; the result is consistent with the woman just guessing.
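The value 0.2429 can be checked by hand. With all margins of the table fixed at 4, the number of milk-first cups that the taster labels "Milk" follows a hypergeometric distribution under the null hypothesis, and the one-sided p-value is the probability of 3 or more such cups. The following is a minimal sketch using base R's dhyper; the variable names are illustrative only. It also shows the two-sided value, which is what fisher.test reports when alternative = "greater" is not supplied.

```r
# Under the null, X = number of milk-first cups guessed as "Milk" is hypergeometric:
# 4 milk-first cups (m), 4 tea-first cups (n), 4 cups labelled "Milk" by the taster (k).
probs <- dhyper(0:4, m = 4, n = 4, k = 4)   # P(X = 0), ..., P(X = 4)

# One-sided p-value: probability of 3 or 4 correct "Milk" guesses (17/70)
p_greater <- sum(probs[4:5])

# Two-sided p-value: total probability of all outcomes that are no more likely
# than the observed one (34/70); the small factor guards against rounding error
p_two_sided <- sum(probs[probs <= probs[4] * (1 + 1e-7)])

round(c(greater = p_greater, two.sided = p_two_sided), 4)
```

The one-sided value, 17/70, rounds to 0.2429; the two-sided value, 34/70, rounds to 0.4857. Neither is below 0.05.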

Alexander Anderson runs the pesticide safety training course in four counties. Students must pass in order to obtain their pesticide applicator’s license. He wishes to see if there is an association between the county in which the course was held and the rate of passing the test. The following are his data.

County               Pass   Fail
Bloom County           21      5
Cobblestone County      6     11
Dougal County           7      8
Heimlich County        27      5

# Reading the data as a matrix

Input =("
County               Pass   Fail
        Bloom                21      5
        Cobblestone           6     11
        Dougal                7      8
        Heimlich             27      5
        ")

Matrix = as.matrix(read.table(textConnection(Input),
                              header=TRUE, 
                              row.names=1))

Matrix

##             Pass Fail
## Bloom         21    5
## Cobblestone    6   11
## Dougal         7    8
## Heimlich      27    5

Fisher Exact test of association

Hypotheses
- Null hypothesis: There is no association between the two variables (not rejected when p > 0.05).
- Alternative hypothesis (two-sided): There is an association between the two variables (supported when p < 0.05).

out <- fisher.test(Matrix)

if (out$p.value > 0.05) print("NO! There is no association between the two variables") else print("YES! There is an association between the two variables")

## [1] "YES! There is an association between the two variables"
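To see what the association looks like, it can help to compare the observed pass rate in each county with the counts that would be expected if passing were unrelated to county. The sketch below uses base R only; the object names counts, pass_rate and expected are introduced here for illustration and are not part of the original analysis.

```r
# County data from the example above (Pass column first, then Fail)
counts <- matrix(c(21, 6, 7, 27,
                   5, 11, 8, 5),
                 ncol = 2,
                 dimnames = list(County = c("Bloom", "Cobblestone", "Dougal", "Heimlich"),
                                 Result = c("Pass", "Fail")))

# Observed proportion of students passing in each county
pass_rate <- prop.table(counts, margin = 1)[, "Pass"]
round(pass_rate, 3)

# Counts expected under no association:
# (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
round(expected, 1)
```

Bloom and Heimlich pass more than 80% of their students, while Cobblestone and Dougal pass fewer than half, which is the pattern the significant test result reflects.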

Post-hoc analysis

Post-hoc analysis can be conducted with pairwise Fisher’s exact tests. The function pairwiseNominalIndependence in the rcompanion package can be used to conduct this analysis.

### Order matrix

Matrix = Matrix[(c("Heimlich", "Bloom", "Dougal", "Cobblestone")),]

Matrix

##             Pass Fail
## Heimlich      27    5
## Bloom         21    5
## Dougal         7    8
## Cobblestone    6   11

### Pairwise tests of association 



PT = pairwiseNominalIndependence(Matrix,
                                 compare = "row",
                                 fisher  = TRUE,
                                 gtest   = FALSE,
                                 chisq   = FALSE,
                                 method  = "fdr",  # see ?p.adjust for options
                                 digits  = 3)

PT

##               Comparison p.Fisher p.adj.Fisher
## 1       Heimlich : Bloom 0.740000      0.74000
## 2      Heimlich : Dougal 0.013100      0.02620
## 3 Heimlich : Cobblestone 0.000994      0.00596
## 4         Bloom : Dougal 0.037600      0.05640
## 5    Bloom : Cobblestone 0.003960      0.01190
## 6   Dougal : Cobblestone 0.720000      0.74000
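The same table can be approximated with base R alone, which makes the two steps explicit: a Fisher's exact test on each pair of rows, followed by a multiple-comparison adjustment of the resulting p-values. This is a sketch under the assumption that the pairwise procedure amounts to exactly these two steps; it is not the rcompanion implementation, and the names counts, county_pairs, p_raw and p_adj are illustrative.

```r
# County data in the order used above
counts <- matrix(c(27, 21, 7, 6,
                   5, 5, 8, 11),
                 ncol = 2,
                 dimnames = list(c("Heimlich", "Bloom", "Dougal", "Cobblestone"),
                                 c("Pass", "Fail")))

# All pairs of counties, one pair per column
county_pairs <- combn(rownames(counts), 2)

# Fisher's exact test on the 2 x 2 sub-table for each pair
p_raw <- apply(county_pairs, 2,
               function(pr) fisher.test(counts[pr, ])$p.value)

# Benjamini-Hochberg adjustment ("fdr" is an alias for "BH" in p.adjust)
p_adj <- p.adjust(p_raw, method = "fdr")

data.frame(Comparison   = apply(county_pairs, 2, paste, collapse = " : "),
           p.Fisher     = signif(p_raw, 3),
           p.adj.Fisher = signif(p_adj, 3))
```

With six comparisons, the adjustment multiplies the k-th smallest p-value by 6/k and then caps each value at the next larger adjusted value, which is why the two largest adjusted p-values are both 0.74.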

### Compact letter display


cldList(p.adj.Fisher ~ Comparison,
        data       = PT,
        threshold  = 0.05)

##         Group Letter MonoLetter
## 1    Heimlich      a        a  
## 2       Bloom     ab        ab 
## 3      Dougal     bc         bc
## 4 Cobblestone      c          c

### This table of letters can also be found using the pairwiseNominalMatrix function along with the multcompLetters function in the multcompView package.

### Order matrix

Matrix = Matrix[(c("Heimlich", "Bloom", "Dougal", "Cobblestone")),]

Matrix

##             Pass Fail
## Heimlich      27    5
## Bloom         21    5
## Dougal         7    8
## Cobblestone    6   11

Pairwise tests of association

Counties sharing a letter are not significantly different by Fisher’s exact test.

PM = pairwiseNominalMatrix(Matrix,
                           compare = "row",
                           fisher  = TRUE,
                           gtest   = FALSE,
                           chisq   = FALSE, 
                           method  = "fdr",  # see ?p.adjust for options
                           digits  = 3)
PM

## $Test
## [1] "Fisher exact test"
## 
## $Unadjusted
##             Heimlich Bloom Dougal Cobblestone
## Heimlich          NA  0.74 0.0131    0.000994
## Bloom             NA    NA 0.0376    0.003960
## Dougal            NA    NA     NA    0.720000
## Cobblestone       NA    NA     NA          NA
## 
## $Method
## [1] "fdr"
## 
## $Adjusted
##             Heimlich  Bloom Dougal Cobblestone
## Heimlich     1.00000 0.7400 0.0262     0.00596
## Bloom        0.74000 1.0000 0.0564     0.01190
## Dougal       0.02620 0.0564 1.0000     0.74000
## Cobblestone  0.00596 0.0119 0.7400     1.00000

multcompLetters(PM$Adjusted,   
                compare="<",   
                threshold=0.05,  ### p-value to use as significance threshold   
                Letters=letters,   
                reversed = FALSE)

##    Heimlich       Bloom      Dougal Cobblestone 
##         "a"        "ab"        "bc"         "c"
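The letters can be read back against the adjusted p-values: two counties share a letter exactly when their adjusted p-value is not below the 0.05 threshold. The sketch below checks this using values copied from the output above; shares_letter is a hypothetical helper written for this illustration and is not part of multcompView.

```r
counties <- c("Heimlich", "Bloom", "Dougal", "Cobblestone")

# Adjusted p-values copied from PM$Adjusted above
adj <- matrix(c(1.00000, 0.7400, 0.0262, 0.00596,
                0.74000, 1.0000, 0.0564, 0.01190,
                0.02620, 0.0564, 1.0000, 0.74000,
                0.00596, 0.0119, 0.7400, 1.00000),
              nrow = 4, byrow = TRUE,
              dimnames = list(counties, counties))

# Letters copied from the multcompLetters output above
cld <- c(Heimlich = "a", Bloom = "ab", Dougal = "bc", Cobblestone = "c")

# TRUE when two letter strings have at least one letter in common
shares_letter <- function(x, y) {
  length(intersect(strsplit(x, "")[[1]], strsplit(y, "")[[1]])) > 0
}

county_pairs <- combn(counties, 2)
check <- data.frame(
  Comparison    = apply(county_pairs, 2, paste, collapse = " : "),
  p.adj         = apply(county_pairs, 2, function(pr) adj[pr[1], pr[2]]),
  shares.letter = apply(county_pairs, 2,
                        function(pr) shares_letter(cld[[pr[1]]], cld[[pr[2]]]))
)
check$significant <- check$p.adj < 0.05
check
```

Bloom and Dougal, for example, share the letter b because their adjusted p-value (0.0564) is just above the threshold, even though the unadjusted value (0.0376) is below it.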

Association test for Nominal Variables

Janpu Hou

August 17, 2017
