Contingency analysis: R code for Chapter 9 examples

Note: This document was converted to R-Markdown from this page by M. Drew LaMar. You can download the R-Markdown here.

Download the R code on this page as a single file here

New methods

Hover over a function argument for a short description of its meaning. The variable names are plucked from the examples further below.

Odds ratio using the epitools package:

oddsratio(cancerTable, method = “wald”)

$\chi^2$ contingency test:

chisq.test(worm$fate, worm$infection, correct = FALSE)

Fisher’s exact test:

fisher.test(vampire$bitten, vampire$estrous)

Other new methods:

Relative risk.
G-test.
Mosaic plot for a case-control study.

Figure 9.1-1. Sex and survival on the Titanic

Mosaic plot showing association between sex and survival on the shipwrecked Titanic.

Read and inspect the data.

titanic <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter09/chap09f1.1Titanic.csv"))
head(titanic)

##   sex survival
## 1 Men Survived
## 2 Men Survived
## 3 Men Survived
## 4 Men Survived
## 5 Men Survived
## 6 Men Survived

Contingency table of the association between sex and survival. The addmargins function includes the row and columns sums with the contingency table.

titanicTable <- table(titanic$survival, titanic$sex)
addmargins(titanicTable)

##           
##             Men Women  Sum
##   Died     1329   109 1438
##   Survived  338   316  654
##   Sum      1667   425 2092

Mosaic plot of the association.

mosaicplot( t(titanicTable), col = c("firebrick", "goldenrod1"), cex.axis = 1, sub = "Sex", ylab = "Relative frequency", main = "")

Example 9.2. Aspirin and cancer

Odds ratio and relative risk to estimate association between aspirin treatment and cancer incidence in a 2x2 contingency table.

Read and inspect the data.

cancer <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter09/chap09e2AspirinCancer.csv"))
head(cancer)

##   aspirinTreatment cancer
## 1          Aspirin Cancer
## 2          Aspirin Cancer
## 3          Aspirin Cancer
## 4          Aspirin Cancer
## 5          Aspirin Cancer
## 6          Aspirin Cancer

2 x 2 contingency table showing association between aspirin treatment and cancer incidence (Table 9.2-1).

cancerTable <- table(cancer$cancer, cancer$aspirinTreatment)
cancerTable

##            
##             Aspirin Placebo
##   Cancer       1438    1427
##   No cancer   18496   18515

Mosaic plot showing the association (Figure 9.2-1).

mosaicplot( t(cancerTable), col = c("firebrick", "goldenrod1"), cex.axis = 1, sub = "Aspirin treatment", ylab = "Relative frequency", main = "")

Calculate odds ratio using the epitools package. Install the package if you haven’t already done so (this needs to be done just once per computer, after you’ve installed R). To use, load the epitools package (this needs to be done just once per R session).

if (!require("epitools")) {install.packages("epitools", dependencies = TRUE, repos="http://cran.rstudio.com/")}
library(epitools)

The formulas for the odds ratio and relative risk in the book assume that the contingency table has a layout exactly like cancerTable, namely

#            treatment control
#  sick          a        b
#  healthy       c        d

This layout works for the oddsratio function.

oddsratio(cancerTable, method = "wald")

## $data
##            
##             Aspirin Placebo Total
##   Cancer       1438    1427  2865
##   No cancer   18496   18515 37011
##   Total       19934   19942 39876
## 
## $measure
##            odds ratio with 95% C.I.
##             estimate     lower    upper
##   Cancer    1.000000        NA       NA
##   No cancer 1.008744 0.9349043 1.088415
## 
## $p.value
##            two-sided
##             midp.exact fisher.exact chi.square
##   Cancer            NA           NA         NA
##   No cancer  0.8224348    0.8310911  0.8223986
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"

The resulting output of oddsratio includes a lot of incidental computations. To get clean output only for the odds ratio, including the 95% confidence interval, use the following instead.

oddsratio(cancerTable, method = "wald")$measure[-1,]

##  estimate     lower     upper 
## 1.0087436 0.9349043 1.0884148

For step-by-step calculations of the odds ratio and confidence interval, we do the following:

# First assign the labels a,b,c, and d to the table, as on page 240.  
# Then plug the values into the formula for odds ratio
x = as.vector(cancerTable)
names(x) = c("a","c","b","d")
orHat = unname(x["a"] * x["d"]) / (x["b"] * x["c"])
unname(orHat)

## [1] 1.008744

#Standard error of log odds ratio
seLnOR = sqrt(1/x["a"] + 1/x["b"]+ 1/x["c"] + 1/x["d"])
unname(seLnOR)

## [1] 0.03878475

# 95% confidence interval
Z = qnorm(1 - 0.05/2) # when rounded, is 1.96
ciLnOR = c(log(orHat) - Z * seLnOR, log(orHat) + Z * seLnOR)
ciOR = exp(ciLnOR) # transform back to ratio scale
names(ciOR) = c("lower","upper")
ciOR

##     lower     upper 
## 0.9349043 1.0884148

Calculate relative risk using the epitools package. The layout expected by the riskratio function is the complete opposite of the layout used in the book. To use the command with a contingency table in book style (such as cancerTable), we need to flip (transpose) the table and reverse the order of the rows. We can do this all at once with the following arguments to the riskratio function.

library(epitools)
riskratio(t(cancerTable), rev = "both", method = "wald")

## $data
##          
##           No cancer Cancer Total
##   Placebo     18515   1427 19942
##   Aspirin     18496   1438 19934
##   Total       37011   2865 39876
## 
## $measure
##          risk ratio with 95% C.I.
##           estimate     lower   upper
##   Placebo 1.000000        NA      NA
##   Aspirin 1.008113 0.9394365 1.08181
## 
## $p.value
##          two-sided
##           midp.exact fisher.exact chi.square
##   Placebo         NA           NA         NA
##   Aspirin  0.8224348    0.8310911  0.8223986
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"

Notice that the result differs slightly from the relative risk value given in the book for these same data (1.007) because rounding error here is reduced.

For clean output only for the relative risk, including the 95% confidence interval, use the following command:

riskratio(t(cancerTable), method = "wald", rev = "both")$measure[-1,]

##  estimate     lower     upper 
## 1.0081129 0.9394365 1.0818098

For step-by-step calculations of the relative risk, we do the following.

x = as.vector(cancerTable)
names(x) = c("a","c","b","d")
phat1 = x["a"]/(x["a"] + x["c"])
phat2 = x["b"]/(x["b"] + x["d"])
rrHat = unname(phat1/phat2)
unname(rrHat)

## [1] 1.008113

# Standard error
seLnRR = sqrt( 1/x["a"] + 1/x["b"] - 1/(x["a"]+x["c"]) - 1/(x["b"]+x["d"]) )
unname(seLnRR)

## [1] 0.0359982

# 95% confidence interval
Z = qnorm(1 - 0.05/2) # when rounded, is 1.96
ciLnRR = c(log(rrHat) - Z * seLnRR, log(rrHat) + Z * seLnRR)
ciRR = exp(ciLnRR) # transform back to ratio scale
names(ciRR) = c("lower","upper")
ciRR

##     lower     upper 
## 0.9394365 1.0818098

Example 9.3. Toxoplasma and driving accidents

Odds ratio to estimate association between Toxoplasma infection and driving accidence in a 2 x 2 case-control study.

Read and inspect the data.

toxoplasma <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter09/chap09e3ToxoplasmaAndAccidents.csv"))
head(toxoplasma)

##   condition   accident
## 1  infected  accidents
## 2  infected  accidents
## 3  infected  accidents
## 4  infected  accidents
## 5  infected  accidents
## 6  infected  accidents

2 x 2 contingency table (Table 9.3-1).

toxTable <- table(toxoplasma$accident, toxoplasma$condition)
toxTable

##                
##                 infected uninfected
##    accidents          61        124
##    no accidents       16        169

Mosaic plot for a case-control study (Figure 9.3-1).

mosaicplot(toxTable, col=c("firebrick", "goldenrod1"), cex.axis = 1.2, sub = "Condition", dir = c("h","v"), ylab = "Relative frequency")

Odds ratio using epitools.

library(epitools)
oddsratio(toxTable, method = "wald")

## $data
##                
##                 infected uninfected Total
##    accidents          61        124   185
##    no accidents       16        169   185
##   Total               77        293   370
## 
## $measure
##                odds ratio with 95% C.I.
##                 estimate    lower    upper
##    accidents    1.000000       NA       NA
##    no accidents 5.196069 2.859352 9.442394
## 
## $p.value
##                two-sided
##                   midp.exact fisher.exact   chi.square
##    accidents              NA           NA           NA
##    no accidents 4.652168e-09 7.854937e-09 8.272554e-09
## 
## $correction
## [1] FALSE
## 
## attr(,"method")
## [1] "Unconditional MLE & normal approximation (Wald) CI"

oddsratio(toxTable, method = "wald")$measure[-1,] # clean output

## estimate    lower    upper 
## 5.196069 2.859352 9.442394

Example 9.4. Worm gets bird

$\chi^2$ contingency test to test association between trematode infection status of killifish and their fate (eaten or not eaten) in the presence of predatory birds.

Read and inspect the data.

worm <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter09/chap09e4WormGetsBird.csv"))
head(worm)

##    infection  fate
## 1 uninfected eaten
## 2    lightly eaten
## 3    lightly eaten
## 4    lightly eaten
## 5    lightly eaten
## 6    lightly eaten

Set the preferred order of infection categories in tables and graphs.

worm$infection <- factor(worm$infection, levels = c("uninfected", "lightly", "highly"))

Contingency table (Table 9.4-1 ). The addmargins function adds the row and column sums to the display of the table.

wormTable <- table(worm$fate, worm$infection)
addmargins(wormTable)

##            
##             uninfected lightly highly Sum
##   eaten              1      10     37  48
##   not eaten         49      35      9  93
##   Sum               50      45     46 141

Mosaic plot (Figure 9.4-1).

mosaicplot( t(wormTable), col = c("firebrick", "goldenrod1"), cex.axis = 1, sub = "Infection status", ylab = "Relative frequency")

$\chi^2$ contingency test. We include the argument correct = FALSE to avoid Yates’ correction. This has no effect except in 2 x 2 tables, but we keep it here for demonstration purposes.

saveChiTest <- chisq.test(worm$fate, worm$infection, correct = FALSE)
saveChiTest

## 
##  Pearson's Chi-squared test
## 
## data:  worm$fate and worm$infection
## X-squared = 69.756, df = 2, p-value = 7.124e-16

The expected frequencies under null hypothesis are included with the results, but aren’t normally shown. Type saveChiTest$expected to extract them. Include the addmargins function if you want to see the row and column sums too.

addmargins(saveChiTest$expected)

##            worm$infection
## worm$fate   uninfected  lightly   highly Sum
##   eaten       17.02128 15.31915 15.65957  48
##   not eaten   32.97872 29.68085 30.34043  93
##   Sum         50.00000 45.00000 46.00000 141

$G$-test applied to the worm-gets-bird data (Section 9.6). R has no simple, built-in function to carry out the $G$-test with goodness-of-fit data. Code for a command g.test by Brent Larget is available here. Below, we source this code, which then allows you to use his function g.test.

source("http://www.stat.wisc.edu/~st571-1/gtest.R")
g.test(wormTable)

## G-Test for Contingency Tables
## 
## Data:
##            
##             uninfected lightly highly
##   eaten              1      10     37
##   not eaten         49      35      9
## 
## The test statistic is  77.897 .
## There are  2  degrees of freedom.
## The p-value is  0 .

Example 9.5. Vampire bat feeding

Fisher’s exact test of association between estrous status of cows and whether cows were bitten by vampire bats.

Read and inspect the data.

vampire <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter09/chap09e5VampireBites.csv"))

Contingency table (Table 9.5-1).

vampireTable <- table(vampire$bitten, vampire$estrous)
vampireTable

##             
##              estrous no estrous
##   bitten          15          6
##   not bitten       7        322

Expected frequencies under null hypothesis of independence. R complains because of the violation of assumptions. Just in case you hadn’t noticed.

saveTest <- chisq.test(vampire$bitten, vampire$estrous, correct = FALSE)

## Warning in chisq.test(vampire$bitten, vampire$estrous, correct = FALSE):
## Chi-squared approximation may be incorrect

saveTest$expected

##               vampire$estrous
## vampire$bitten estrous no estrous
##     bitten        1.32      19.68
##     not bitten   20.68     308.32

Fisher’s exact test. The output includes an estimate of the odds ratio, but it uses a different method than is used in our book.

fisher.test(vampire$bitten, vampire$estrous)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  vampire$bitten and vampire$estrous
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##   29.94742 457.26860
## sample estimates:
## odds ratio 
##   108.3894

Contingency analysis: R code for Chapter 9 examples

Michael Whitlock and Dolph Schluter

New methods

Figure 9.1-1. Sex and survival on the Titanic

Example 9.2. Aspirin and cancer

Example 9.3. Toxoplasma and driving accidents

Example 9.4. Worm gets bird

Example 9.5. Vampire bat feeding