Contingency Analysis (Part 2)

M. Drew LaMar
October 19, 2018

Class Announcements

  • Lab #5 is due on Monday at 11:59 pm!
  • HW #5 is due on Monday at 11:59 pm!
  • Exam #2 is next Friday!
    • Chapters 5-8 (Whitlock & Schluter)
    • Chapter 2 (Ruxton & Colegrave)
  • HW 1 & 2
  • See you on Slack!

Contingency analysis

Our data in this chapter consists of two categorical variables.

We are interested in:

  • Estimating association in 2 x 2 tables (i.e., the special case of two categorical variables with only 2 levels each)
  • Testing if there is an association (or dependence) between two categorical variables [\( \chi^2 \! \) contingency test]

Flipside of the parasitic coin

Estimation of association for 2 x 2 contingency tables: odds ratios and relative risks

What about hypothesis testing?

  • \( H_{0} \): There is no association between two categorical variables
  • \( H_{A} \): There is an association between two categorical variables

Remember: A hypothesis test gives you a yes/no answer, but it does NOT give you the magnitude of the effect if there is one. Hence, always report confidence intervals as well, if possible.

\( \chi^2 \) contingency test

Example of contingency test

Example 9.4

Many parasites have more than one species of host, so the individual parasite must get from one host to another to complete its life cycle. Trematodes of the species Euhaplorchis californiensis use three hosts during their life cycle.

Example of contingency test

Example 9.4

Lafferty and Morris (1996) tested the hypothesis that infection influences risk of predation by birds. A large outdoor tank was stocked with three kinds of killifish: unparasitized, lightly infected, and heavily infected. This tank was left open to foraging by birds… The numbers of fish eaten according to their levels of parasitism are given as follows:

          Uninfected Lightly Highly Sum
Eaten              1      10     37  48
Not eaten         49      35      9  93
Sum               50      45     46 141

Example of contingency test

If we call our two categorical variables “fate” and “infection status”, then what we want to know is whether these two variables are independent.

How do we compute the frequency table we would expect if the variables were independent? If two variables are independent, we have

\[ \mathrm{Pr[A \ and \ B]} = \mathrm{Pr[A]}\times\mathrm{Pr[B]} \]

          Uninfected                Lightly Highly Sum
Eaten     Pr[Eaten and Uninfected]  *       *      Pr[Eaten]
Not eaten *                         *       *      *
Sum       Pr[Uninfected]            *       *      *

Example of contingency test

The observed table (in relative frequencies) is given by:

          Uninfected                Lightly Highly Sum
Eaten     Pr[Eaten and Uninfected]  *       *      Pr[Eaten]
Not eaten *                         *       *      *
Sum       Pr[Uninfected]            *       *      *

Assuming independence, the expected table (in relative frequencies) should have:

          Uninfected                           Lightly Highly Sum
Eaten     Pr[Eaten]\( \cdot \) Pr[Uninfected]  *       *      Pr[Eaten]
Not eaten *                                    *       *      *
Sum       Pr[Uninfected]                       *       *      *

Example of contingency test

          Uninfected                                       Lightly Highly Sum
Eaten     \( 141\cdot\frac{48}{141}\cdot\frac{50}{141} \)  *       *      48
Not eaten *                                                *       *      93
Sum       50                                               45      46     141

          Uninfected Lightly                                          Highly Sum
Eaten     17.0       \( 141\cdot\frac{48}{141}\cdot\frac{45}{141} \)  *      48
Not eaten *          *                                                *      93
Sum       50         45                                               46     141

          Uninfected Lightly Highly                                           Sum
Eaten     17.0       15.3    \( 141\cdot\frac{48}{141}\cdot\frac{46}{141} \)  48
Not eaten *          *       *                                                93
Sum       50         45      46                                               141
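
The cell-by-cell arithmetic above can be done for the whole table at once. The following is a minimal sketch in base R; the names obs, row_tot, col_tot, and expected are illustrative choices, not part of the original slides.

```r
# Observed counts from Example 9.4.
obs <- matrix(c(1, 10, 37,
                49, 35, 9),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Eaten", "Not eaten"),
                              c("Uninfected", "Lightly", "Highly")))

n       <- sum(obs)       # grand total
row_tot <- rowSums(obs)   # totals for Eaten / Not eaten
col_tot <- colSums(obs)   # totals for each infection level

# Under independence:
#   expected[i, j] = n * Pr[row i] * Pr[col j] = row_tot[i] * col_tot[j] / n
expected <- outer(row_tot, col_tot) / n
round(expected, 1)
```

The expected table keeps the same row and column totals as the observed table, which is a quick sanity check on the calculation.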

Example of contingency test

Expected frequency table

          Uninfected Lightly Highly Sum
Eaten           17.0    15.3   15.7  48
Not eaten       33.0    29.7   30.3  93
Sum               50      45     46 141

Observed frequency table

          Uninfected Lightly Highly Sum
Eaten              1      10     37  48
Not eaten         49      35      9  93
Sum               50      45     46 141

Example of contingency test

The \( \chi^2 \) contingency test is just a special case of the \( \chi^2 \) goodness-of-fit test, so the test statistic is the same.

\[ \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(Observed_{ij}-Expected_{ij})^2}{Expected_{ij}}, \]

where \( r \) and \( c \) are the numbers of rows and columns, respectively. The number of degrees of freedom is given by

\[ \begin{aligned} df & = rc - 1 - (r-1) - (c - 1) \\ & = rc - r - c + 1 \\ & = (r-1)\times(c-1) \end{aligned} \]
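
To make the formulas concrete, here is a sketch that computes the test statistic, the degrees of freedom, and the \( P \)-value by hand in base R for the fish data of Example 9.4. The variable names (obs, expected, chi2, df, p_value) are illustrative assumptions.

```r
# Observed counts from Example 9.4.
obs <- matrix(c(1, 10, 37,
                49, 35, 9),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Eaten", "Not eaten"),
                              c("Uninfected", "Lightly", "Highly")))

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Test statistic: sum over all cells of (Observed - Expected)^2 / Expected.
chi2 <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (r - 1) * (c - 1).
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# P-value: upper tail of the chi-squared distribution with df degrees of freedom.
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

c(chi2 = chi2, df = df, p_value = p_value)
```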

Example of contingency test

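Assumptions of the \( \chi^2 \) contingency test are the same as those of the \( \chi^2 \) goodness-of-fit test: no cell should have an expected frequency less than one, and no more than 20% of cells should have expected frequencies less than five.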

What do you do if the assumptions are violated?

  • If the table is larger than 2 x 2, you can combine rows and/or columns.
  • If the table is 2 x 2, you can use Fisher's exact test.
  • You can use a permutation test (see Chapter 13).
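
The options listed above might look as follows in base R. This is only a sketch using the fish data for illustration (its assumptions are not actually violated), and the names obs, collapsed, and sim are hypothetical. Note that simulate.p.value = TRUE runs a Monte Carlo test with fixed margins, which is a simulation-based relative of a permutation test rather than the exact procedure of Chapter 13.

```r
obs <- matrix(c(1, 10, 37,
                49, 35, 9),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Eaten", "Not eaten"),
                              c("Uninfected", "Lightly", "Highly")))

# Option 1: combine columns (here, pool the two infected groups).
collapsed <- cbind(Uninfected = obs[, "Uninfected"],
                   Infected   = obs[, "Lightly"] + obs[, "Highly"])
collapsed

# Option 2: the collapsed table is 2 x 2, so Fisher's exact test applies.
fisher.test(collapsed)

# Option 3: simulation-based P-value on the full table.
set.seed(1)  # arbitrary seed, only for reproducibility
sim <- chisq.test(obs, simulate.p.value = TRUE, B = 10000)
sim$p.value
```

Combining categories should make biological sense; pooling "Lightly" and "Highly" into "Infected" changes the question being asked.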

Example of contingency test

Expected frequency table

          Uninfected Lightly Highly Sum
Eaten           17.0    15.3   15.7  48
Not eaten       33.0    29.7   30.3  93
Sum               50      45     46 141

Assumptions seem to be met, so let's use R.
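
One way to check the assumptions programmatically, sketched in base R: chisq.test() returns the expected counts in its expected component, and these can be compared against the usual rule of thumb (no expected count below 1, at most 20% of cells below 5). The names obs and expected are illustrative.

```r
obs <- matrix(c(1, 10, 37,
                49, 35, 9),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Eaten", "Not eaten"),
                              c("Uninfected", "Lightly", "Highly")))

# Expected counts under independence, as computed by chisq.test().
expected <- chisq.test(obs, correct = FALSE)$expected
round(expected, 1)

# Rule of thumb 1: no expected count below 1.
all(expected >= 1)

# Rule of thumb 2: no more than 20% of cells with expected count below 5.
mean(expected < 5) <= 0.20
```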

Example of contingency test

First, let's see how the table was created in R:

(parTable <- matrix(c(1, 10, 37, 49, 35, 9), 
                    nrow = 2, 
                    byrow = TRUE, 
                    dimnames = list(c("Eaten", "Not eaten"), 
                                    c("Uninfected", "Lightly", "Highly"))))
          Uninfected Lightly Highly
Eaten              1      10     37
Not eaten         49      35      9
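
As a possible follow-up, base R can add the margin totals shown in the earlier tables and convert counts to proportions. This sketch reuses the parTable definition from above; the names withSums and propByColumn are hypothetical.

```r
parTable <- matrix(c(1, 10, 37, 49, 35, 9),
                   nrow = 2,
                   byrow = TRUE,
                   dimnames = list(c("Eaten", "Not eaten"),
                                   c("Uninfected", "Lightly", "Highly")))

# Append row and column totals (labelled "Sum").
withSums <- addmargins(parTable)
withSums

# Proportion eaten / not eaten within each infection level (columns sum to 1).
propByColumn <- prop.table(parTable, margin = 2)
round(propByColumn, 3)
```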

Example of contingency test

chisq.test(parTable, correct = FALSE)

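Note: correct = FALSE means no Yates continuity correction (see p. 251). R applies this correction only to 2 x 2 tables, so the argument has no effect on this 2 x 3 table.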

Example of contingency test


    Pearson's Chi-squared test

data:  parTable
X-squared = 69.756, df = 2, p-value = 7.124e-16

Conclusion?

Conclusion: Since the \( p \)-value is less than 0.05 (in fact, less than 0.001), we reject the null hypothesis of independence.
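
The pieces of the printed result can also be pulled out of the object returned by chisq.test(), which is handy when reporting the test. A short sketch follows; the names res and alpha are illustrative.

```r
parTable <- matrix(c(1, 10, 37, 49, 35, 9),
                   nrow = 2,
                   byrow = TRUE,
                   dimnames = list(c("Eaten", "Not eaten"),
                                   c("Uninfected", "Lightly", "Highly")))

res <- chisq.test(parTable, correct = FALSE)

res$statistic   # the chi-squared test statistic
res$parameter   # degrees of freedom
res$p.value     # P-value

# Decision at significance level alpha.
alpha <- 0.05
res$p.value < alpha
```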

Other types of contingency tests

Fisher’s exact test: (2 x 2 tables only) Examines the independence of two categorical variables, even with small expected values.

\( G \)-test: (any table) Derived from principles of likelihood.

     Pro: Great with complicated experimental designs with multiple explanatory variables.
     Con: Can be less accurate for small sample sizes.
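
For comparison, here is a sketch of the \( G \)-statistic computed by hand in base R, using the standard form \( G = 2\sum Observed \cdot \ln(Observed/Expected) \) and the same degrees of freedom as the \( \chi^2 \) contingency test. The formula is not given in the slides, and the names obs, expected, G, and p_value are illustrative. The sketch assumes no observed count is zero, which holds for the fish data.

```r
obs <- matrix(c(1, 10, 37,
                49, 35, 9),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Eaten", "Not eaten"),
                              c("Uninfected", "Lightly", "Highly")))

# Expected counts under independence.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# G statistic (log-likelihood ratio); assumes all observed counts are > 0.
G <- 2 * sum(obs * log(obs / expected))

# Compared against a chi-squared distribution with (r - 1)(c - 1) df.
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value <- pchisq(G, df = df, lower.tail = FALSE)

c(G = G, df = df, p_value = p_value)
```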

Magnitudes of association

Situation: You've found a statistically significant association between two categorical variables using the \( \chi^2 \) contingency test.

Question: Where is the association? In other words, in which levels of the categories is the association present and how large is the association?

We need to estimate the magnitude of the association, which the \( P \)-value does not give us!

We can estimate odds ratios or relative risks for 2 \( \times \) 2 sub-tables within the contingency table by either subsetting or collapsing.
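
As an illustration of subsetting, the sketch below extracts the 2 x 2 sub-table for uninfected versus highly infected fish and estimates the relative risk of being eaten. The names subTable, risk, and rr are hypothetical. Collapsing would instead add columns together, for example pooling both infected groups.

```r
parTable <- matrix(c(1, 10, 37, 49, 35, 9),
                   nrow = 2,
                   byrow = TRUE,
                   dimnames = list(c("Eaten", "Not eaten"),
                                   c("Uninfected", "Lightly", "Highly")))

# Subsetting: keep only the two columns being compared.
subTable <- parTable[, c("Uninfected", "Highly")]
subTable

# Risk of being eaten within each group = eaten / group total.
risk <- subTable["Eaten", ] / colSums(subTable)
risk

# Relative risk: highly infected relative to uninfected.
rr <- risk["Highly"] / risk["Uninfected"]
unname(rr)
```

This is a point estimate only; a confidence interval is still needed to convey its precision.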

Magnitudes of association

          Uninfected Lightly Highly
Eaten              1      10     37
Not eaten         49      35      9

Magnitudes of association

          Uninfected Highly
Eaten              1     37
Not eaten         49      9

Magnitudes of association

$data
          Uninfected Highly Total
Eaten              1     37    38
Not eaten         49      9    58
Total             50     46    96

$measure
                        NA
odds ratio with 95% C.I.    estimate        lower      upper
               Eaten     1.000000000           NA         NA
               Not eaten 0.004964148 0.0006020703 0.04093004

$p.value
           NA
two-sided     midp.exact fisher.exact   chi.square
  Eaten               NA           NA           NA
  Not eaten 1.110223e-16 6.861412e-17 4.140762e-15

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
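
The call that produced the output above is not shown. Its method label indicates a Wald (normal approximation) interval, and the layout resembles output from the epitools package, though that is an assumption. The odds ratio and its 95% confidence interval can be sketched by hand in base R as follows; the names subTable, or, se_log_or, and ci are illustrative.

```r
subTable <- matrix(c(1, 37,
                     49, 9),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Eaten", "Not eaten"),
                                   c("Uninfected", "Highly")))

# Odds ratio from the cross-product of the 2 x 2 table: (a * d) / (b * c).
or <- (subTable[1, 1] * subTable[2, 2]) / (subTable[1, 2] * subTable[2, 1])

# Standard error of log(odds ratio): sqrt(1/a + 1/b + 1/c + 1/d).
se_log_or <- sqrt(sum(1 / subTable))

# Wald 95% confidence interval, computed on the log scale and back-transformed.
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

c(estimate = or, lower = ci[1], upper = ci[2])
```

Read this way, the odds of being eaten for uninfected fish are a small fraction of the odds for highly infected fish; taking the reciprocal gives the odds ratio in the other direction (roughly 200).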