library(vcd)
library(vcdExtra)
library(MASS)

1 Exercise 5.2

The data set AirCrash in vcdExtra gives a database of all crashes of commercial airplanes between 1993 and 2015, classified by Phase of the flight and Cause of the crash. How can you best show the nature of the association between these variables in a mosaic plot?

1.1 Part A.

Start by making a frequency table, aircrash.tab.

data("AirCrash")
aircrash.tab <- xtabs(~ Phase + Cause, data = AirCrash)
aircrash.tab
##           Cause
## Phase      criminal human error mechanical unknown weather
##   en route       16          63         29      25      24
##   landing         4         114         19      18      55
##   standing        2           0          2       0       0
##   take-off        1          29         24       8       3
##   unknown         0           1          0       1       1

1.2 Part B.

Make a default mosaic display of the data with shade=TRUE and interpret the pattern of the high-frequency cells.

mosaic(aircrash.tab, shade=TRUE)

Relative to what independence of Phase and Cause would predict, crashes with criminal causes are over-represented in the en-route and standing phases, although the standing row contains only four crashes. Among the high-frequency cells, mechanical causes are over-represented during take-off and weather causes during landing.
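The pattern described above can be checked numerically. The following is a minimal sketch in base R, assuming the counts printed in Part A; it rebuilds the table by hand (the object name `crash` is illustrative, not part of the exercise) and computes the Pearson residuals on which the default shading of mosaic() is based.

```r
# Counts copied from the aircrash.tab printout in Part A
# (the object name `crash` is illustrative only).
crash <- matrix(
  c(16,  63, 29, 25, 24,
     4, 114, 19, 18, 55,
     2,   0,  2,  0,  0,
     1,  29, 24,  8,  3,
     0,   1,  0,  1,  1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Phase = c("en route", "landing", "standing", "take-off", "unknown"),
    Cause = c("criminal", "human error", "mechanical", "unknown", "weather")
  )
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(crash), colSums(crash)) / sum(crash)

# Pearson residuals: positive values mark cells with more crashes than expected
pearson <- (crash - expected) / sqrt(expected)
round(pearson, 2)
```

Cells whose residual exceeds about 2 in absolute value are the ones the default shading colors. Residuals in the sparse standing and unknown rows rest on very few crashes and should be read with caution.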

1.3 Part C.

The default plot has overlapping labels due to the uneven marginal frequencies relative to the lengths of the category labels. Use some arguments to make it more readable.

mosaic(aircrash.tab, shade=TRUE, labeling = labeling_residuals, gp = shading_Friendly,
       labeling_args = list(rot_labels=c(60, 60, 60, 60)))
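Another option is to shorten the labels rather than rotate them. The sketch below is one possible alternative, not the required answer; it assumes the vcd and vcdExtra packages are installed and uses the `abbreviate_labs` labeling argument of vcd, with the value 6 chosen arbitrarily.

```r
# Sketch only: runs the plot when vcd and vcdExtra are available.
if (requireNamespace("vcd", quietly = TRUE) &&
    requireNamespace("vcdExtra", quietly = TRUE)) {
  data("AirCrash", package = "vcdExtra")
  aircrash.tab <- xtabs(~ Phase + Cause, data = AirCrash)

  # Abbreviate the category labels of both variables to about 6 characters
  # (6 is an arbitrary choice; the value is recycled across dimensions)
  vcd::mosaic(aircrash.tab, shade = TRUE,
              labeling_args = list(abbreviate_labs = 6))
}
```

Abbreviation keeps the labels horizontal, at the cost of making names such as "human error" less self-explanatory.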

2 Exercise 5.7

The data set caith in MASS (Ripley, 2015) gives another classic 4×5 table tabulating hair color and eye color, this time for people in Caithness, Scotland, originally from Fisher (1940). The data are stored as a data frame of cell frequencies, whose rows are eye colors and whose columns are hair colors.

data("caith", package="MASS")
caith

2.1 Part A.

The loglm() and mosaic() functions don’t understand data in this format, so use Caith <- as.matrix(caith) to convert to array form. Examine the result, and use names(dimnames(Caith))<-c() to assign appropriate names to the row and column dimensions.

Caith <- as.matrix(caith)
names(dimnames(Caith)) <- c("Eye_Color", "Hair_Color")
Caith
##          Hair_Color
## Eye_Color fair red medium dark black
##    blue    326  38    241  110     3
##    light   688 116    584  188     4
##    medium  343  84    909  412    26
##    dark     98  48    403  681    85

2.2 Part B. & Part C.

Fit the model of independence to the resulting matrix using loglm(). Calculate and display the residuals for this model.

model <- loglm(Freq ~ Eye_Color + Hair_Color, data = Caith)
anova(model)
## Call:
## loglm(formula = Freq ~ Eye_Color + Hair_Color, data = Caith)
## 
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 1218.314 12        0
## Pearson          1240.039 12        0
LRstats(model)
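The exercise also asks for the residuals of the independence model. The following is a minimal base-R sketch, assuming the counts shown in Part A; the object name `hair_eye` is illustrative only. With MASS loaded, `residuals(model, type = "pearson")` should give the same values.

```r
# Counts copied from the Caith printout in Part A
# (the object name `hair_eye` is illustrative only).
hair_eye <- matrix(
  c(326,  38, 241, 110,  3,
    688, 116, 584, 188,  4,
    343,  84, 909, 412, 26,
     98,  48, 403, 681, 85),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Eye_Color  = c("blue", "light", "medium", "dark"),
    Hair_Color = c("fair", "red", "medium", "dark", "black")
  )
)

# Fitted counts under independence and the resulting Pearson residuals
expected <- outer(rowSums(hair_eye), colSums(hair_eye)) / sum(hair_eye)
pearson <- (hair_eye - expected) / sqrt(expected)
round(pearson, 2)

# The squared Pearson residuals sum to the Pearson chi-square statistic
sum(pearson^2)
```

The final sum should agree with the Pearson statistic reported in the model output above, which is a quick check that the residuals were computed for the same model.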

2.3 Part D.

Create a mosaic display for this data and explain any possible association between variables and its strength.

mosaic(Caith, shade=TRUE, gp = shading_Friendly, labeling_args = list(rot_labels=c(60, 60, 60, 60)))
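To put a number on the strength of the association shown in the mosaic, one common summary is Cramér's V. The sketch below is an illustration in base R, assuming the counts shown in Part A; the names `hair_eye`, `x2` and `cramers_v` are illustrative only. The assocstats() function in vcd reports the same measure.

```r
# Counts copied from the Caith printout in Part A
# (the object name `hair_eye` is illustrative only).
hair_eye <- matrix(
  c(326,  38, 241, 110,  3,
    688, 116, 584, 188,  4,
    343,  84, 909, 412, 26,
     98,  48, 403, 681, 85),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Eye_Color  = c("blue", "light", "medium", "dark"),
    Hair_Color = c("fair", "red", "medium", "dark", "black")
  )
)

# Pearson chi-square statistic for the test of independence
x2 <- unname(chisq.test(hair_eye)$statistic)

# Cramér's V = sqrt(X^2 / (n * (min(rows, columns) - 1)))
n <- sum(hair_eye)
k <- min(dim(hair_eye)) - 1
cramers_v <- sqrt(x2 / (n * k))
cramers_v
```

V ranges from 0 (no association) to 1 (perfect association), so it complements the chi-square tests, whose very small p-values partly reflect the large sample size.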