library(vcd)
library(vcdExtra)
library(MASS)

The data set AirCrash in vcdExtra gives a database of all crashes of commercial airplanes between 1993 and 2015, classified by Phase of the flight and Cause of the crash. How can you best show the nature of the association between these variables in a mosaic plot?
Start by making a frequency table, aircrash.tab
data("AirCrash")
aircrash.tab <- xtabs(~Phase + Cause, data= AirCrash)
aircrash.tab
##           Cause
## Phase      criminal human error mechanical unknown weather
##   en route       16          63         29      25      24
##   landing         4         114         19      18      55
##   standing        2           0          2       0       0
##   take-off        1          29         24       8       3
##   unknown         0           1          0       1       1
Make a default mosaic display of the data with shade=TRUE and interpret the pattern of the high-frequency cells.
mosaic(aircrash.tab, shade=TRUE)

Crashes with criminal causes are strongly associated with the en route and standing phases. Crashes due to mechanical causes occur more often than expected during take-off, while those due to weather occur more often than expected during landing.
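The shading in the mosaic plot is driven by the Pearson residuals from the independence model, so one way to check this reading is to compute them directly. The following is a minimal sketch in base R, not part of the original solution: it re-enters the counts from the aircrash.tab printout so that it runs without vcd or vcdExtra, and the names `tab`, `expected` and `pearson` are illustrative.

```r
# Counts copied from the aircrash.tab printout (rows = Phase, columns = Cause)
tab <- matrix(
  c(16,  63, 29, 25, 24,
     4, 114, 19, 18, 55,
     2,   0,  2,  0,  0,
     1,  29, 24,  8,  3,
     0,   1,  0,  1,  1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Phase = c("en route", "landing", "standing", "take-off", "unknown"),
    Cause = c("criminal", "human error", "mechanical", "unknown", "weather")
  )
)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals: positive values mark cells with more crashes than expected
pearson <- (tab - expected) / sqrt(expected)
round(pearson, 2)
```

Cells whose residual exceeds about 2 in absolute value are the ones the shading highlights. Several expected counts in the standing and unknown rows are very small, so residuals in those rows should be read with caution.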
The default plot has overlapping labels due to the uneven marginal frequencies relative to the lengths of the category labels. Use some arguments to make it more readable.
mosaic(aircrash.tab, shade = TRUE, labeling = labeling_residuals, gp = shading_Friendly,
       labeling_args = list(rot_labels = c(60, 60, 60, 60)))

The data set caith in MASS (Ripley, 2015) gives another classic 4×5 table of hair color by eye color, this one for people in Caithness, Scotland, originally from Fisher (1940). The data are stored as a data frame of cell frequencies, whose rows are eye colors and whose columns are hair colors.
data("caith", package="MASS")
caith

The loglm() and mosaic() functions don’t understand data in this format, so use Caith <- as.matrix(caith) to convert to array form. Examine the result, and use names(dimnames(Caith)) <- c() to assign appropriate names to the row and column dimensions.
Caith <- as.matrix(caith)
names(dimnames(Caith))<-c("Eye_Color", "Hair_Color")
Caith
##          Hair_Color
## Eye_Color fair red medium dark black
## blue       326  38    241  110     3
## light      688 116    584  188     4
## medium     343  84    909  412    26
## dark        98  48    403  681    85
Fit the model of independence to the resulting matrix using loglm(). Calculate and display the residuals for this model.
model <- loglm(Freq ~ Eye_Color + Hair_Color, data = Caith)
anova(model)
## Call:
## loglm(formula = Freq ~ Eye_Color + Hair_Color, data = Caith)
##
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 1218.314 12        0
## Pearson          1240.039 12        0
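The exercise also asks for the residuals of the independence model. With the fitted object, residuals(model, type = "pearson") should return them. As a self-contained cross-check, the sketch below computes the same Pearson residuals by hand in base R. It is an illustration rather than the original solution: the counts are re-entered from the Caith printout, and the names `caith_counts`, `expected` and `pearson_resid` are illustrative.

```r
# Counts copied from the Caith printout (rows = eye color, columns = hair color)
caith_counts <- matrix(
  c(326,  38, 241, 110,  3,
    688, 116, 584, 188,  4,
    343,  84, 909, 412, 26,
     98,  48, 403, 681, 85),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Eye_Color  = c("blue", "light", "medium", "dark"),
    Hair_Color = c("fair", "red", "medium", "dark", "black")
  )
)

# Fitted values under independence: row total * column total / grand total
expected <- outer(rowSums(caith_counts), colSums(caith_counts)) / sum(caith_counts)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_resid <- (caith_counts - expected) / sqrt(expected)
round(pearson_resid, 2)

# The squared residuals sum to the Pearson X^2 reported in the model summary
sum(pearson_resid^2)
```

The final sum should reproduce the Pearson statistic of 1240.039 shown in the output, which confirms that the hand computation and the fitted model agree.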
LRstats(model)

Create a mosaic display for this data and explain any possible association between variables and its strength.
mosaic(Caith, shade=TRUE, gp = shading_Friendly, labeling_args = list(rot_labels=c(60, 60, 60, 60)))
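The plot shows the direction of the association, but not a single number for its strength. One common summary, which is an addition here and not part of the original solution, is Cramér's V, computed from the Pearson chi-square statistic. The sketch below is base R only and re-enters the counts from the Caith printout; the names `caith_counts`, `x2`, `n`, `k` and `cramers_v` are illustrative.

```r
# Counts copied from the Caith printout (rows = eye color, columns = hair color)
caith_counts <- matrix(
  c(326,  38, 241, 110,  3,
    688, 116, 584, 188,  4,
    343,  84, 909, 412, 26,
     98,  48, 403, 681, 85),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    Eye_Color  = c("blue", "light", "medium", "dark"),
    Hair_Color = c("fair", "red", "medium", "dark", "black")
  )
)

# Pearson chi-square statistic for the independence model
x2 <- unname(chisq.test(caith_counts, correct = FALSE)$statistic)

n <- sum(caith_counts)       # total number of people
k <- min(dim(caith_counts))  # smaller of the number of rows and columns

# Cramer's V: sqrt(X^2 / (n * (k - 1))), ranging from 0 (none) to 1 (perfect)
cramers_v <- sqrt(x2 / (n * (k - 1)))
cramers_v
```

For these counts V comes out at roughly 0.28. The independence model is rejected decisively, but much of that significance reflects the sample of 5387 people. The counts also give the direction: fair hair occurs more often than expected with blue and light eyes, and dark and black hair more often than expected with dark eyes.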