Exercise 5.2 The data set AirCrash in vcdExtra gives a database of all crashes of commercial airplanes from 1993 to 2015, classified by Phase of the flight and Cause of the crash. How can you best show the nature of the association between these variables in a mosaic plot?

library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.4.4
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
data("AirCrash",package = "vcdExtra")
str(AirCrash)
## 'data.frame':    439 obs. of  5 variables:
##  $ Phase     : Factor w/ 5 levels "en route","landing",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Cause     : Factor w/ 5 levels "criminal","human error",..: 1 1 1 1 4 4 4 4 4 4 ...
##  $ date      : Date, format: "1993-09-21" "1993-09-22" ...
##  $ Fatalities: int  27 108 125 112 41 19 8 22 14 43 ...
##  $ Year      : int  1993 1993 1996 2002 1993 1993 1994 2000 2002 2004 ...

(a) Start by making a frequency table, aircrash.tab, of Phase by Cause.

aircrash.tab <- xtabs(~Phase + Cause, data= AirCrash)
aircrash.tab
##           Cause
## Phase      criminal human error mechanical unknown weather
##   en route       16          63         29      25      24
##   landing         4         114         19      18      55
##   standing        2           0          2       0       0
##   take-off        1          29         24       8       3
##   unknown         0           1          0       1       1

(b) Make a default mosaic display of the data with shade=TRUE and interpret the pattern of the high-frequency cells.

mosaic(aircrash.tab, shade=TRUE)
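
The shading in a mosaic display reflects the Pearson residuals from the independence model, so one way to interpret the plot is to look at those residuals directly. The following is a minimal base-R sketch, not part of the original solution: it re-enters the counts from the table printed above, and the object names `counts`, `fit`, and `res` are introduced here only for illustration.

```r
# Counts re-entered from the printed aircrash.tab (rows = Phase, columns = Cause)
counts <- matrix(
  c(16,  63, 29, 25, 24,
     4, 114, 19, 18, 55,
     2,   0,  2,  0,  0,
     1,  29, 24,  8,  3,
     0,   1,  0,  1,  1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Phase = c("en route", "landing", "standing", "take-off", "unknown"),
    Cause = c("criminal", "human error", "mechanical", "unknown", "weather")
  )
)

# chisq.test() warns about small expected counts in this sparse table;
# it is used here only to extract expected counts and Pearson residuals.
fit <- suppressWarnings(chisq.test(counts))
res <- fit$residuals   # (observed - expected) / sqrt(expected)

round(res, 2)

# Cells with |residual| > 2, the lower of the two default shading cutoffs (2 and 4)
which(abs(res) > 2, arr.ind = TRUE)
```

Positive residuals mark cells with more crashes than independence predicts, negative residuals mark cells with fewer.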

(c) The default plot has overlapping labels due to the uneven marginal frequencies relative to the lengths of the category labels. Use some arguments to make it more readable. (Hint: use rot_labels and alternate_labels.)

Fix the label overlap by rotating the labels:

mosaic(aircrash.tab, shade=TRUE,labeling_args=list(rot_labels=c(30, 30, 30, 30)))
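
The hint also mentions alternate_labels, which the call above does not use. The sketch below is one possible variant, assuming the `alternate_labels` argument of vcd's `labeling_border()` is given one logical per table dimension. The `counts` object is re-entered from the printed table for illustration, and whether the result reads better than rotation alone is a matter of taste.

```r
library(vcd)

# Counts re-entered from the printed aircrash.tab (rows = Phase, columns = Cause)
counts <- as.table(matrix(
  c(16,  63, 29, 25, 24,
     4, 114, 19, 18, 55,
     2,   0,  2,  0,  0,
     1,  29, 24,  8,  3,
     0,   1,  0,  1,  1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Phase = c("en route", "landing", "standing", "take-off", "unknown"),
    Cause = c("criminal", "human error", "mechanical", "unknown", "weather")
  )
))

# Rotate the labels and also stagger neighbouring labels for both variables
# (assumption: one logical per dimension, here Phase and Cause)
mosaic(counts, shade = TRUE,
       labeling_args = list(rot_labels = c(30, 30, 30, 30),
                            alternate_labels = c(TRUE, TRUE)))
```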

##### Reorder by Phase

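# Levels: 1 = en route, 2 = landing, 3 = standing, 4 = take-off, 5 = unknown.
# c(3,4,1,2,5) is the temporal order of a flight; rev() reverses it.
phase.ord <- rev(c(3,4,1,2,5))
mosaic(aircrash.tab[phase.ord,], shade=TRUE,
       labeling_args=list(rot_labels=c(30, 30, 30, 30)), offset_varnames=0.5)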

##### Reorder by frequency

phase.ord <- order(rowSums(aircrash.tab), decreasing=TRUE)
cause.ord <- order(colSums(aircrash.tab), decreasing=TRUE)
mosaic(aircrash.tab[phase.ord,cause.ord], shade=TRUE,
       labeling_args=list(rot_labels=c(30, 30, 30, 30)))
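
To see which ordering the frequency-based indices actually produce, the margins can be inspected without drawing anything. This is a small base-R sketch under the assumption that the counts match the table printed earlier. The object name `counts` is introduced here for illustration.

```r
# Counts re-entered from the printed aircrash.tab (rows = Phase, columns = Cause)
counts <- matrix(
  c(16,  63, 29, 25, 24,
     4, 114, 19, 18, 55,
     2,   0,  2,  0,  0,
     1,  29, 24,  8,  3,
     0,   1,  0,  1,  1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Phase = c("en route", "landing", "standing", "take-off", "unknown"),
    Cause = c("criminal", "human error", "mechanical", "unknown", "weather")
  )
)

# Indices that sort rows and columns by decreasing marginal frequency
phase.ord <- order(rowSums(counts), decreasing = TRUE)
cause.ord <- order(colSums(counts), decreasing = TRUE)

# Level names in the new order
rownames(counts)[phase.ord]
colnames(counts)[cause.ord]

# Reordered table with row and column totals appended
addmargins(counts[phase.ord, cause.ord])
```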

library(ca)
## Warning: package 'ca' was built under R version 3.4.4
aircrash.ca <- ca(aircrash.tab)
plot(aircrash.ca)
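
As a rough check on what the correspondence analysis plot summarizes, the principal inertias can be reproduced in base R from the singular value decomposition of the standardized residual matrix. The sketch below is illustrative only and is not how the original solution proceeds. The names `counts`, `P`, `S`, and `inertia` are introduced here, and the counts are re-entered from the printed table.

```r
# Counts re-entered from the printed aircrash.tab (rows = Phase, columns = Cause)
counts <- matrix(
  c(16,  63, 29, 25, 24,
     4, 114, 19, 18, 55,
     2,   0,  2,  0,  0,
     1,  29, 24,  8,  3,
     0,   1,  0,  1,  1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Phase = c("en route", "landing", "standing", "take-off", "unknown"),
    Cause = c("criminal", "human error", "mechanical", "unknown", "weather")
  )
)

n      <- sum(counts)
P      <- counts / n        # correspondence matrix
r.mass <- rowSums(P)        # row masses
c.mass <- colSums(P)        # column masses

# Standardized residuals: D_r^{-1/2} (P - r c') D_c^{-1/2}
S <- diag(1 / sqrt(r.mass)) %*% (P - r.mass %o% c.mass) %*% diag(1 / sqrt(c.mass))

# Principal inertias are the squared singular values
inertia <- svd(S)$d^2
round(100 * inertia / sum(inertia), 1)   # percent of inertia per dimension

# Total inertia equals the Pearson chi-square statistic divided by n
chisq.stat <- unname(suppressWarnings(chisq.test(counts))$statistic)
all.equal(sum(inertia), chisq.stat / n)
```

The share of inertia on the first two dimensions indicates how much of the association the two-dimensional plot represents.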

Exercise 5.7 The data set caith in MASS (Ripley, 2015) gives another classic 4×5 table tabulating hair color and eye color, this one for people in Caithness, Scotland, originally from Fisher (1940). The data are stored as a data frame of cell frequencies, whose rows are eye colors and whose columns are hair colors.

data("caith", package="MASS")
caith
##        fair red medium dark black
## blue    326  38    241  110     3
## light   688 116    584  188     4
## medium  343  84    909  412    26
## dark     98  48    403  681    85

(a) The loglm() and mosaic() functions don’t understand data in this format, so use Caith <- as.matrix(caith) to convert to array form. Examine the result, and use names(dimnames(Caith))<-c() to assign appropriate names to the row and column dimensions.

Caith <- as.matrix(caith)
# Rows are eye colors and columns are hair colors
names(dimnames(Caith)) <- c("Eye", "Hair")
Caith
##         Hair
## Eye      fair red medium dark black
##   blue    326  38    241  110     3
##   light   688 116    584  188     4
##   medium  343  84    909  412    26
##   dark     98  48    403  681    85
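
One way to "examine the result" of the conversion is to compare the data frame and the matrix with a few base-R queries. The sketch below is illustrative: the dimension names "Eye" and "Hair" are a choice made here, based on the description of rows as eye colors and columns as hair colors.

```r
data("caith", package = "MASS")

# caith is a data frame; as.matrix() turns it into a 4 x 5 matrix of counts
Caith <- as.matrix(caith)
class(caith)
class(Caith)
dim(Caith)

# Before naming, the dimnames list has no names
names(dimnames(Caith))

# Name the row and column dimensions (names chosen here for illustration)
names(dimnames(Caith)) <- c("Eye", "Hair")
str(Caith)

# Total number of people tabulated
sum(Caith)
```

With the dimensions named, functions such as loglm() and mosaic() can refer to the variables by name.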