pacman::p_load(tidyverse, magrittr, vcd, vcdExtra, MASS, logmult)
The data set criminal in the package logmult gives the 4 × 5 table below of the number of men aged 15–19 charged with a criminal case for whom charges were dropped in Denmark from 1955 to 1958.
data("criminal", package = "logmult")
criminal
##       Age
## Year    15  16  17  18  19
##   1955 141 285 320 441 427
##   1956 144 292 342 441 396
##   1957 196 380 424 462 427
##   1958 212 424 399 442 430
mod <- loglm(~ Year + Age, data = criminal, fitted = TRUE)
mod
## Call:
## loglm(formula = ~Year + Age, data = criminal, fitted = TRUE)
##
## Statistics:
##                       X^2 df     P(> X^2)
## Likelihood Ratio 38.24466 12 0.0001400372
## Pearson          38.41033 12 0.0001315495
res.p <- residuals(mod, type = "pearson")
res.p
##       Age
## Year            15          16          17          18          19
##   1955 -1.44374462 -1.81254382 -1.14666244  1.51381352  2.08783700
##   1956 -1.21343368 -1.43015516  0.03293544  1.50079803  0.49761409
##   1957  0.70724720  0.44905890  1.23547617 -0.83276354 -1.16430944
##   1958  1.74098413  2.53667766 -0.20503904 -1.94497339 -1.21989280
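Both the likelihood-ratio and Pearson tests give p-values of about 0.0001, well below 0.05, so we reject the null hypothesis that Year and Age are independent. The Pearson residuals show where the association comes from: for ages 15 and 16 they are negative in 1955 and 1956 and positive in 1957 and 1958, while for ages 18 and 19 the pattern is reversed. The largest residual, 2.54, is for 16-year-olds in 1958.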
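As a cross-check, the test statistics and residuals can be reproduced by hand from the printed table. The sketch below uses base R only. The object names (crim, expected, pearson_resid) are illustrative and not part of logmult or MASS, and the counts are retyped from the output above rather than loaded from the package.

```r
# Counts retyped from the printed criminal table (rows: Year, columns: Age)
crim <- matrix(
  c(141, 285, 320, 441, 427,
    144, 292, 342, 441, 396,
    196, 380, 424, 462, 427,
    212, 424, 399, 442, 430),
  nrow = 4, byrow = TRUE,
  dimnames = list(Year = as.character(1955:1958), Age = as.character(15:19))
)

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(crim), colSums(crim)) / sum(crim)

# Pearson residuals and the two goodness-of-fit statistics
pearson_resid <- (crim - expected) / sqrt(expected)
X2 <- sum(pearson_resid^2)                   # Pearson chi-square
G2 <- 2 * sum(crim * log(crim / expected))   # likelihood-ratio statistic
df <- (nrow(crim) - 1) * (ncol(crim) - 1)    # (4 - 1) * (5 - 1)

round(c(X2 = X2, G2 = G2, df = df), 4)
pchisq(c(X2 = X2, G2 = G2), df = df, lower.tail = FALSE)
```

If the table was retyped correctly, these values should agree with the loglm() statistics and with res.p up to rounding.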
mosaic(criminal, shade = TRUE, labeling = labeling_residuals)
mosaic(criminal, gp = shading_Friendly, labeling = labeling_residuals)
Bertin (1983, pp. 30–31) used a 4-way table of frequencies of traffic accident victims in France in 1958 to illustrate his scheme for classifying data sets by numerous variables, each of which could have various types and could be assigned to various visual attributes. His data are contained in Accident in vcdExtra, a frequency data frame representing his 5 × 2 × 4 × 2 table of the variables age, result (died or injured), mode of transportation, and gender.
data("Accident", package = "vcdExtra")
str(Accident, vec.len = 2)
## 'data.frame': 80 obs. of 5 variables:
## $ age : Ord.factor w/ 5 levels "0-9"<"10-19"<..: 5 5 5 5 5 ...
## $ result: Factor w/ 2 levels "Died","Injured": 1 1 1 1 1 ...
## $ mode : Factor w/ 4 levels "4-Wheeled","Bicycle",..: 4 4 2 2 3 ...
## $ gender: Factor w/ 2 levels "Female","Male": 2 1 2 1 2 ...
## $ Freq : int 704 378 396 56 742 ...
loglm(Freq ~ age + mode + gender + result, data = Accident)
## Call:
## loglm(formula = Freq ~ age + mode + gender + result, data = Accident)
##
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 60320.05 70        0
## Pearson          76865.31 70        0
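The 70 degrees of freedom for the mutual independence model can be checked by counting parameters. This is a small sketch in base R; the level counts are taken from the str() output above, and the variable names are illustrative.

```r
# Number of levels per factor, as reported by str(Accident)
levels_n <- c(age = 5, result = 2, mode = 4, gender = 2)

n_cells  <- prod(levels_n)          # 5 * 2 * 4 * 2 = 80 cells
n_params <- 1 + sum(levels_n - 1)   # intercept + main effects: 1 + 4 + 1 + 3 + 1
df_mutual <- n_cells - n_params     # residual degrees of freedom

df_mutual
```

With a likelihood-ratio statistic of 60320.05 on these 70 degrees of freedom, the mutual independence model fits very poorly.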
head(Accident)
mosaic(loglm(Freq ~ age + mode + gender + result, data = Accident),
       labeling_args = list(abbreviate_labs = c(mode = 1, gender = 1, result = 2)))
loglm(Freq ~ age * mode * gender + result, data = Accident)
## Call:
## loglm(formula = Freq ~ age * mode * gender + result, data = Accident)
##
## Statistics:
##                      X^2 df P(> X^2)
## Likelihood Ratio 2217.72 39        0
## Pearson          2347.60 39        0
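Because the mutual independence model is nested within this joint independence model, the two fits can be compared with a likelihood-ratio difference test. The sketch below works only from the statistics printed in the two outputs above and uses base R; the variable names are illustrative.

```r
# Degrees of freedom for the joint independence model [age mode gender][result]
levels_n <- c(age = 5, mode = 4, gender = 2, result = 2)
n_cells  <- prod(levels_n)                                   # 80 cells
# The saturated age * mode * gender part uses 5 * 4 * 2 = 40 parameters
# (including the intercept); result adds 1 more.
n_params <- prod(levels_n[c("age", "mode", "gender")]) + (levels_n[["result"]] - 1)
df_joint <- n_cells - n_params

# Likelihood-ratio statistics copied from the printed loglm() outputs
G2_mutual <- 60320.05; df_mutual <- 70
G2_joint  <- 2217.72

delta_G2 <- G2_mutual - G2_joint   # improvement in fit
delta_df <- df_mutual - df_joint   # number of added association parameters

c(df_joint = df_joint, delta_G2 = delta_G2, delta_df = delta_df)
pchisq(delta_G2, df = delta_df, lower.tail = FALSE)
```

Allowing all associations among age, mode, and gender improves the fit substantially, but a likelihood-ratio statistic of 2217.72 on 39 degrees of freedom still indicates a poor fit: result is not independent of the other three variables.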
mosaic(loglm(Freq ~ age * mode * gender + result, data = Accident),
       labeling_args = list(abbreviate_labs = c(mode = 1, gender = 1, result = 2)))
From the plot we can see that accidents are more likely to result in death when the victim is aged 50+, across all four modes of transportation, and when the victim is male.
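A reading of the mosaic plot like this one can be checked numerically by tabulating the proportion of victims who died within each group. The helper below is a hypothetical sketch, not a function from vcdExtra; it assumes a frequency data frame with columns named result (with a level "Died") and Freq, as in Accident.

```r
# Hypothetical helper: share of victims who died, within each combination
# of the grouping variables named in `vars`.
death_rate_by <- function(df, vars) {
  f <- reformulate(vars, response = "Freq")    # e.g. Freq ~ age + gender
  total <- xtabs(f, data = df)                 # all victims per group
  df$Freq <- df$Freq * (df$result == "Died")   # zero out the injured counts
  died <- xtabs(f, data = df)                  # deaths per group
  died / total
}

# Usage with the Accident data, if vcdExtra is installed
if (requireNamespace("vcdExtra", quietly = TRUE)) {
  data("Accident", package = "vcdExtra")
  print(round(death_rate_by(Accident, "age"), 3))
  print(round(death_rate_by(Accident, "gender"), 3))
  print(round(death_rate_by(Accident, c("age", "mode")), 3))
}
```

Larger proportions for the 50+ row, for males, and within each mode for older victims would support the interpretation above.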