Exercise 5.1 The data set criminal in the package logmult gives the 4 × 5 table below of the number of men aged 15-19 charged with a criminal case for whom charges were dropped in Denmark from 1955-1958.
library(ca)
library(vcd)
library(vcdExtra)
library(MASS)
library(logmult)
data("criminal", package="logmult")
criminal
## Age
## Year 15 16 17 18 19
## 1955 141 285 320 441 427
## 1956 144 292 342 441 396
## 1957 196 380 424 462 427
## 1958 212 424 399 442 430
c.lm<-loglm(Year ~ Age, data=criminal, fitted=TRUE)
summary(c.lm)
## Formula:
## Year ~ Age
## attr(,"variables")
## list(Year, Age)
## attr(,"factors")
## Age
## Year 0
## Age 1
## attr(,"term.labels")
## [1] "Age"
## attr(,"order")
## [1] 1
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 1
## attr(,".Environment")
## <environment: R_GlobalEnv>
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 84.14370 15 1.210687e-11
## Pearson 84.29411 15 1.135747e-11
a<-residuals(c.lm)
a
## Age
## Year 15 16 17 18 19
## 1955 -2.5327533 -3.3444912 -2.7248845 -0.2608238 0.3406228
## 1956 -2.2896215 -2.9446979 -1.5386917 -0.2608238 -1.1825075
## 1957 1.6925059 1.8400733 2.6764500 0.7293514 0.3406228
## 1958 2.8433554 4.0907254 1.4228176 -0.2133211 0.4860327
sum(a^2)
## [1] 84.1437
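The identity checked above (squared deviance residuals summing to the likelihood-ratio statistic) can be reproduced by hand. The sketch below is only an illustration: it retypes the counts from the printed table and assumes, based on the 15 df and the residual pattern, that the fitted value in each cell is the Age total divided evenly across the four years. The names `n`, `m`, and `d` are introduced here for illustration.

```r
# Counts copied from the printed `criminal` table (rows = Year, columns = Age)
n <- matrix(
  c(141, 285, 320, 441, 427,
    144, 292, 342, 441, 396,
    196, 380, 424, 462, 427,
    212, 424, 399, 442, 430),
  nrow = 4, byrow = TRUE,
  dimnames = list(Year = as.character(1955:1958), Age = as.character(15:19))
)

# Assumed fitted values: each Age total split evenly over the four years
m <- matrix(colSums(n) / nrow(n), nrow = nrow(n), ncol = ncol(n),
            byrow = TRUE, dimnames = dimnames(n))

# Deviance residual for a Poisson cell:
# sign(n - m) * sqrt(2 * (n * log(n / m) - n + m))
d <- sign(n - m) * sqrt(2 * (n * log(n / m) - n + m))

round(d, 4)   # compare with residuals(c.lm)
sum(d^2)      # compare with the Likelihood Ratio statistic from summary(c.lm)
```

Because the fitted values reproduce the Age totals, the `- n + m` terms cancel in the sum, leaving 2 * sum(n * log(n / m)), which is the usual G² statistic.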
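The model fits poorly (likelihood-ratio G² = 84.14 on 15 df, p < 0.001). Note that the formula Year ~ Age yields 15 df rather than the (4 - 1)(5 - 1) = 12 df of the independence model: the fitted values depend only on Age, so the lack of fit reflects both the growth in the yearly totals and any association between Year and Age. The residual pattern nonetheless points to an association. Ages 15 to 17 have large negative residuals in 1955 and 1956 and large positive residuals in 1957 and 1958, while the residuals for ages 18 and 19 stay small. Cases with dropped charges therefore shift toward the younger ages over the period, with the largest residual (4.09) at age 16 in 1958.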
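The 15 df reported by summary(c.lm) suggest that `Year ~ Age` did not fit the usual independence model, which would have (4 - 1)(5 - 1) = 12 df. A minimal sketch of the independence fit follows, using a one-sided formula with both margins. The table is retyped from the printed counts so the block runs without logmult; `criminal_tab` and `indep` are names introduced here for illustration.

```r
library(MASS)

# Counts copied from the printed `criminal` table (rows = Year, columns = Age)
criminal_tab <- as.table(matrix(
  c(141, 285, 320, 441, 427,
    144, 292, 342, 441, 396,
    196, 380, 424, 462, 427,
    212, 424, 399, 442, 430),
  nrow = 4, byrow = TRUE,
  dimnames = list(Year = as.character(1955:1958), Age = as.character(15:19))
))

# One-sided formula: both margins are fitted, which is the independence model
indep <- loglm(~ Year + Age, data = criminal_tab, fitted = TRUE)

indep$df        # residual degrees of freedom
indep$lrt       # likelihood-ratio statistic
indep$pearson   # Pearson chi-square statistic

# Pearson residuals show which cells depart from independence
round(residuals(indep, type = "pearson"), 2)
```

With both margins in the model, the yearly totals are reproduced exactly, so any remaining lack of fit is attributable to the Year-by-Age association alone.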
library(mosaic)
# Base-graphics mosaic plot with residual shading (returns NULL, so nothing to store)
mosaicplot(criminal, shade=TRUE)
# gp= is an argument of vcd's mosaic(); mosaicplot() disregards it
mosaic(criminal, gp=shading_Friendly)
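The mosaic plots show that Age and Year are associated. In 1955 most cases involve 18- and 19-year-olds, but by 1958 the counts for the younger ages have grown substantially (for example, age 16 rises from 285 to 424), while the counts for ages 18 and 19 are nearly unchanged.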
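The shift in the age distribution can also be read from row percentages, which is what the tile widths within each year of a mosaic plot represent. The sketch below retypes the printed counts and uses only base R; `criminal_tab` and `row_pct` are names introduced here for illustration.

```r
# Counts copied from the printed `criminal` table (rows = Year, columns = Age)
criminal_tab <- as.table(matrix(
  c(141, 285, 320, 441, 427,
    144, 292, 342, 441, 396,
    196, 380, 424, 462, 427,
    212, 424, 399, 442, 430),
  nrow = 4, byrow = TRUE,
  dimnames = list(Year = as.character(1955:1958), Age = as.character(15:19))
))

# Percentage of each year's cases falling in each age group (rows sum to 100)
row_pct <- 100 * prop.table(criminal_tab, margin = 1)
round(row_pct, 1)

# Change in each age group's share between the first and last year
round(row_pct["1958", ] - row_pct["1955", ], 1)
```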
Exercise 5.9 Bertin (1983, pp. 30-31) used a 4-way table of frequencies of traffic accident victims in France in 1958 to illustrate his scheme for classifying data sets by numerous variables, each of which could have various types and could be assigned to various visual attributes. His data are contained in Accident in vcdExtra, a frequency data frame representing his 5 × 2 × 4 × 2 table of the variables age, result (died or injured), mode of transportation, and gender.
data("Accident", package="vcdExtra")
str(Accident, vec.len=2)
## 'data.frame': 80 obs. of 5 variables:
## $ age : Ord.factor w/ 5 levels "0-9"<"10-19"<..: 5 5 5 5 5 ...
## $ result: Factor w/ 2 levels "Died","Injured": 1 1 1 1 1 ...
## $ mode : Factor w/ 4 levels "4-Wheeled","Bicycle",..: 4 4 2 2 3 ...
## $ gender: Factor w/ 2 levels "Female","Male": 2 1 2 1 2 ...
## $ Freq : int 704 378 396 56 742 ...
f.lm <- loglm(Freq ~ age + mode+ gender+result, data=Accident)
summary(f.lm)
## Formula:
## Freq ~ age + mode + gender + result
## attr(,"variables")
## list(Freq, age, mode, gender, result)
## attr(,"factors")
## age mode gender result
## Freq 0 0 0 0
## age 1 0 0 0
## mode 0 1 0 0
## gender 0 0 1 0
## result 0 0 0 1
## attr(,"term.labels")
## [1] "age" "mode" "gender" "result"
## attr(,"order")
## [1] 1 1 1 1
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 1
## attr(,".Environment")
## <environment: R_GlobalEnv>
## attr(,"predvars")
## list(Freq, age, mode, gender, result)
## attr(,"dataClasses")
## Freq age mode gender result
## "numeric" "ordered" "factor" "factor" "factor"
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 60320.05 70 0
## Pearson 76865.31 70 0
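The 70 degrees of freedom reported above can be checked by counting parameters. The sketch below does the arithmetic for the mutual independence model and, for comparison, for a model of the form age*mode*gender + result (result independent of the other three variables jointly). The level counts are taken from the str() output; the variable names are introduced here for illustration.

```r
# Number of levels per variable, as shown by str(Accident)
levels_per_var <- c(age = 5, result = 2, mode = 4, gender = 2)

# Total number of cells in the 5 x 2 x 4 x 2 table
n_cells <- prod(levels_per_var)

# Mutual independence: intercept plus (levels - 1) main-effect parameters per variable
n_params_mutual <- 1 + sum(levels_per_var - 1)
df_mutual <- n_cells - n_params_mutual

# Joint independence [age mode gender][result]: one parameter per
# age x mode x gender cell (intercept included) plus (2 - 1) for result
n_params_joint <- prod(levels_per_var[c("age", "mode", "gender")]) +
  (levels_per_var[["result"]] - 1)
df_joint <- n_cells - n_params_joint

c(cells = n_cells, df_mutual = df_mutual, df_joint = df_joint)
```

The order in which the variables appear in the formula does not change these counts, so refitting the same main-effects model with the terms permuted gives identical fit statistics; only the splitting order of the mosaic changes.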
f.m1<-mosaic(f.lm , abbreviate_labs=TRUE, clip=FALSE)
f.lm2 <- loglm(Freq ~ mode+gender+age+result, data=Accident)
f.m2 <- mosaic(f.lm2, abbreviate_labs=TRUE)
f.lm3 <- loglm(Freq ~ gender+result+age+mode, data=Accident)
f.m3 <- mosaic(f.lm3, abbreviate_labs=TRUE)
f.lm4 <- loglm(Freq ~ mode+result+gender+age, data=Accident)
f.m4 <- mosaic(f.lm4, abbreviate_labs=TRUE)
f.lm5 <- loglm(Freq ~ age*mode*gender+result, data=Accident)
summary(f.lm2)
## Formula:
## Freq ~ mode + gender + age + result
## attr(,"variables")
## list(Freq, mode, gender, age, result)
## attr(,"factors")
## mode gender age result
## Freq 0 0 0 0
## mode 1 0 0 0
## gender 0 1 0 0
## age 0 0 1 0
## result 0 0 0 1
## attr(,"term.labels")
## [1] "mode" "gender" "age" "result"
## attr(,"order")
## [1] 1 1 1 1
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 1
## attr(,".Environment")
## <environment: R_GlobalEnv>
## attr(,"predvars")
## list(Freq, mode, gender, age, result)
## attr(,"dataClasses")
## Freq mode gender age result
## "numeric" "factor" "factor" "ordered" "factor"
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 60320.05 70 0
## Pearson 76865.31 70 0
f.m5 <- mosaic(f.lm5, abbreviate_labs=TRUE)
From the mosaic plots we can see that males over 50 have a higher death rate, and that females aged 10-19 travelling by bicycle also have a high death rate.
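# Reorder the levels of mode (original positions 4, 2, 3, 1)
Accident$mode <- ordered(Accident$mode, levels=levels(Accident$mode)[c(4,2,3,1)])
# Refit explicitly so the model and the plot clearly use the reordered levels
f.lm <- loglm(Freq ~ age + mode + gender + result, data=Accident)
mosaic(f.lm, shade=TRUE, rot_labels = c(20, 90, 0, 90))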