Exercise 6.2 The data set criminal in the package logmult gives the 4 × 5 table below of the number of men aged 15–19 charged with a criminal case for whom charges were dropped in Denmark from 1955–1958.
- What percentages of the Pearson χ² for association are explained by the various dimensions?
library(ca)
library(logmult)
## Loading required package: gnm
##
## Attaching package: 'logmult'
## The following object is masked from 'package:gnm':
##
## se
data("criminal", package ="logmult")
str(criminal)
## table [1:4, 1:5] 141 144 196 212 285 292 380 424 320 342 ...
## - attr(*, "dimnames")=List of 2
## ..$ Year: chr [1:4] "1955" "1956" "1957" "1958"
## ..$ Age : chr [1:5] "15" "16" "17" "18" ...
# The first dimension accounts for 90.3% of the Pearson χ² for association, the second for 9.0%, and the third for 0.7%.
criminalca <- ca(criminal)
summary(criminalca)
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.004939 90.3 90.3 ***********************
## 2 0.000491 9.0 99.3 **
## 3 0.000038 0.7 100.0
## -------- -----
## Total: 0.005468 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | 1955 | 230 996 347 | 88 939 361 | -22 58 223 |
## 2 | 1956 | 230 978 157 | 58 908 157 | 16 71 124 |
## 3 | 1957 | 269 984 111 | -39 669 82 | 27 315 391 |
## 4 | 1958 | 271 999 385 | -85 938 399 | -22 61 262 |
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | 15 | 99 998 185 | -101 992 203 | -7 5 11 |
## 2 | 16 | 197 996 312 | -91 959 331 | -18 37 128 |
## 3 | 17 | 211 991 75 | -23 281 23 | 37 710 594 |
## 4 | 18 | 254 989 235 | 70 980 255 | 7 9 24 |
## 5 | 19 | 239 990 194 | 62 877 188 | -22 112 243 |
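The percentages in the "Principal inertias" block are each eigenvalue divided by the total inertia, and the total inertia is the Pearson χ² divided by the sample size. The sketch below first recomputes the percentages from the printed eigenvalues, then shows where such eigenvalues come from using base R only. The function name `ca_inertia` and the 3 × 3 table `toy` are hypothetical and are not the criminal data.

```r
# Percentages from the eigenvalues printed in the summary above.
# The third eigenvalue is taken as the printed total minus the first two.
total <- 0.005468
eig <- c(0.004939, 0.000491)
eig <- c(eig, total - sum(eig))
pct <- round(100 * eig / sum(eig), 1)
print(pct)  # 90.3, 9.0, 0.7, matching the summary

# Sketch of the underlying computation for any two-way table (base R only).
ca_inertia <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  P <- tab / n                         # correspondence matrix
  rm <- rowSums(P)                     # row masses
  cm <- colSums(P)                     # column masses
  E <- outer(rm, cm)                   # expected proportions under independence
  S <- (P - E) / sqrt(E)               # standardized residuals
  d <- svd(S)$d                        # singular values
  k <- min(nrow(tab), ncol(tab)) - 1   # number of non-trivial dimensions
  inertia <- d[seq_len(k)]^2           # principal inertias (eigenvalues)
  list(chisq = n * sum(inertia),
       inertia = inertia,
       percent = 100 * inertia / sum(inertia))
}

# Hypothetical table, for illustration only.
toy <- matrix(c(20, 10,  5,
                10, 20, 10,
                 5, 10, 20), nrow = 3, byrow = TRUE)
res <- ca_inertia(toy)
print(res$percent)
```

The value `res$chisq` should agree with the statistic reported by `chisq.test(toy, correct = FALSE)`, which is why the dimension percentages can be read as shares of the Pearson χ².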
- Plot the 2D correspondence analysis solution. Describe the pattern of association between year and age.
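# Dimension 1 (90.3% of the inertia) orders both years and ages, but in opposite directions:
# 1955 and 1956 lie on the same side as ages 18 and 19, while 1957 and 1958 lie with ages 15 to 17.
# Over the period, dropped charges shifted toward younger men, so year and age are negatively associated.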
plot(criminalca)
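One way to read the pattern off the numbers rather than the plot is to compare the signs of the dimension 1 coordinates, since that dimension carries 90.3% of the inertia. The sketch below copies the `k=1` values from the summary output above (printed multiplied by 1000). Treating a shared sign as "occurs together more often than independence predicts" is a one-dimensional approximation, not an exact statement.

```r
# Dimension 1 principal coordinates, copied from the k=1 column of the
# summary output above (the printed values are multiplied by 1000).
year_dim1 <- c("1955" = 88, "1956" = 58, "1957" = -39, "1958" = -85) / 1000
age_dim1  <- c("15" = -101, "16" = -91, "17" = -23, "18" = 70, "19" = 62) / 1000

# TRUE where a year and an age fall on the same side of the origin.
same_side <- outer(sign(year_dim1), sign(age_dim1)) > 0
dimnames(same_side) <- list(Year = names(year_dim1), Age = names(age_dim1))
print(same_side)

# Year coordinates decrease over time, while age coordinates tend to increase with age.
print(diff(year_dim1))
print(cor(15:19, age_dim1))
```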

Exercise 6.11 The data set Vietnam in vcdExtra gives a 2 × 5 × 4 contingency table in frequency form reflecting a survey of student opinion on the Vietnam War at the University of North Carolina in May 1967. The table variables are sex, year in school, and response, which has categories: (A) Defeat North Vietnam by widespread bombing and land invasion; (B) Maintain the present policy; (C) De-escalate military activity, stop bombing and begin negotiations; (D) Withdraw military forces immediately.
- Using the stacking approach, carry out a correspondence analysis corresponding to the loglinear model [R][YS], which asserts that the response is independent of the combinations of year and sex.
library(vcdExtra)
## Loading required package: vcd
## Loading required package: grid
##
## Attaching package: 'vcd'
## The following object is masked from 'package:logmult':
##
## assoc
data("Vietnam", package="vcdExtra")
str(Vietnam)
## 'data.frame': 40 obs. of 4 variables:
## $ sex : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1 1 1 1 2 2 2 2 3 3 ...
## $ response: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
## $ Freq : int 13 19 40 5 5 9 33 3 22 29 ...
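# Stack year and sex into one combined factor; its levels become the rows for model [R][YS]
vietnam.ys <- within(Vietnam, {year_sex <- paste(year, toupper(substr(sex,1,1)))})
# Two-way table: year-sex combinations (rows) by response (columns)
vietnam.tab <- xtabs(Freq ~ year_sex + response, data=vietnam.ys)
vietnamca <- ca(vietnam.tab)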
summary(vietnamca)
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.085680 73.6 73.6 ******************
## 2 0.027881 23.9 97.5 ******
## 3 0.002854 2.5 100.0 *
## -------- -----
## Total: 0.116415 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | 1F | 24 818 13 | -167 452 8 | -150 367 20 |
## 2 | 1M | 139 997 181 | 386 986 242 | -41 11 8 |
## 3 | 2F | 16 995 35 | -407 647 31 | -299 349 51 |
## 4 | 2M | 140 984 131 | 326 982 175 | -15 2 1 |
## 5 | 3F | 53 999 112 | -334 453 69 | -367 547 256 |
## 6 | 3M | 138 904 40 | 175 904 49 | -4 0 0 |
## 7 | 4F | 32 982 37 | -344 887 44 | -113 95 15 |
## 8 | 4M | 149 383 23 | 81 372 11 | 14 11 1 |
## 9 | 5F | 59 994 153 | -453 686 143 | -304 309 197 |
## 10 | 5M | 248 1000 276 | -281 608 228 | 225 391 451 |
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | A | 255 985 381 | 414 985 509 | -1 0 0 |
## 2 | B | 235 720 60 | 135 608 50 | 58 112 28 |
## 3 | C | 419 999 283 | -247 773 298 | -133 226 267 |
## 4 | D | 92 995 276 | -366 383 143 | 463 612 705 |
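The stacking approach works because the Pearson χ² of the stacked two-way table is the Pearson χ² for the loglinear model [R][YS], so the total inertia of the CA equals that statistic divided by the sample size. The sketch below checks this on a hypothetical 2 × 2 × 3 frequency table (the data frame `toy` and its counts are invented for illustration and are not the Vietnam data), using only base R.

```r
# Hypothetical frequency-form data: sex x year x response.
toy <- expand.grid(sex = c("Female", "Male"),
                   year = 1:2,
                   response = c("A", "B", "C"))
toy$Freq <- c(12, 30, 9, 25, 20, 22, 18, 24, 28, 8, 33, 11)

# Stack year and sex into a single factor with levels such as "1F" and "2M".
toy$year_sex <- interaction(toy$year, substr(toy$sex, 1, 1), sep = "")
stacked <- xtabs(Freq ~ year_sex + response, data = toy)
print(dim(stacked))

# Pearson chi-square of the stacked two-way table.
x2_stacked <- unname(chisq.test(stacked, correct = FALSE)$statistic)

# Pearson chi-square of the loglinear model [R][YS] fitted to the three-way table.
# Dimensions are 1 = sex, 2 = year, 3 = response.
three_way <- xtabs(Freq ~ sex + year + response, data = toy)
fit <- loglin(three_way, margin = list(3, c(1, 2)), print = FALSE)

print(c(stacked = x2_stacked, loglinear = fit$pearson))

# Total inertia that a CA of the stacked table would decompose.
total_inertia <- x2_stacked / sum(stacked)
print(total_inertia)
```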
- Construct an informative 2D plot of the solution, and interpret in terms of how the response varies with year for males and females.
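# Dimension 1 (73.6%) separates responses A and B from C and D; among males the position moves from A toward D as year increases.
# Year 5 males are more likely to give response D.
# Females in every year fall on the side of response C; years 1 and 4 lie closest to it.
# Year 3 and 4 males are more likely to give response B.
# Year 1 and 2 males are more likely to give response A.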
plot(vietnamca)

- Use mjca() to carry out an MCA on the three-way table. Make a useful plot of the solution and interpret in terms of the relationship of the response to year and sex.
# The result is similar to the analysis in the previous part.
# Year 5 males are more likely to give response D.
# Year 1 and 4 females are more likely to give response C.
# Year 3 and 4 males are more likely to give response B.
# Year 1 and 2 males are more likely to give response A.
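# mjca() is given the full sex x year x response table, as the exercise asks
vietnam.tab3 <- xtabs(Freq ~ sex + year + response, data=Vietnam)
vietnammjca <- mjca(vietnam.tab3)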
plot(vietnammjca)
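MCA treats each variable separately rather than as stacked combinations: every category of sex, year, and response gets its own point, and the analysis is based on the indicator matrix or, equivalently, the Burt matrix of all pairwise cross-tabulations. The sketch below builds both in base R for a hypothetical six-case data frame (`cases` and the helper `indicator` are invented names and are not part of the ca package), only to show what structure is being analyzed.

```r
# Hypothetical case-form data: one row per respondent.
cases <- data.frame(
  sex      = factor(c("F", "M", "M", "F", "M", "F")),
  year     = factor(c(1, 1, 2, 2, 3, 3)),
  response = factor(c("A", "B", "A", "C", "C", "B"))
)

# Indicator coding of one factor: a 0/1 column for each category.
indicator <- function(f, name) {
  Z <- outer(as.character(f), levels(f), "==") * 1
  colnames(Z) <- paste(name, levels(f), sep = ":")
  Z
}

# Indicator matrix Z: the per-variable blocks placed side by side.
Z <- do.call(cbind, Map(indicator, cases, names(cases)))

# Burt matrix: all two-way tables among the variables, with the
# category totals on the diagonal.
B <- crossprod(Z)
print(B)
```

Each row of `Z` sums to the number of variables (here 3), and the off-diagonal blocks of `B` are the ordinary two-way tables, for example sex by response.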
