Exercise 6.2 The data set criminal in the package logmult gives the 4 × 5 table below of the number of men aged 15–19 charged with a criminal case for whom charges were dropped in Denmark from 1955–1958.

  1. What percentages of the Pearson χ² for association are explained by the various dimensions?
library(ca)
library(logmult)
## Loading required package: gnm
## 
## Attaching package: 'logmult'
## The following object is masked from 'package:gnm':
## 
##     se
data("criminal", package ="logmult")
str(criminal)
##  table [1:4, 1:5] 141 144 196 212 285 292 380 424 320 342 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ Year: chr [1:4] "1955" "1956" "1957" "1958"
##   ..$ Age : chr [1:5] "15" "16" "17" "18" ...
# The first dimension accounts for 90.3% of the Pearson χ² for association, the second for 9.0%, and the third for 0.7%.
criminalca <- ca(criminal)
summary(criminalca)
## 
## Principal inertias (eigenvalues):
## 
##  dim    value      %   cum%   scree plot               
##  1      0.004939  90.3  90.3  ***********************  
##  2      0.000491   9.0  99.3  **                       
##  3      3.8e-05    0.7 100.0                           
##         -------- -----                                 
##  Total: 0.005468 100.0                                 
## 
## 
## Rows:
##     name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1 | 1955 |  230  996  347 |   88 939 361 |  -22  58 223 |
## 2 | 1956 |  230  978  157 |   58 908 157 |   16  71 124 |
## 3 | 1957 |  269  984  111 |  -39 669  82 |   27 315 391 |
## 4 | 1958 |  271  999  385 |  -85 938 399 |  -22  61 262 |
## 
## Columns:
##     name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1 |   15 |   99  998  185 | -101 992 203 |   -7   5  11 |
## 2 |   16 |  197  996  312 |  -91 959 331 |  -18  37 128 |
## 3 |   17 |  211  991   75 |  -23 281  23 |   37 710 594 |
## 4 |   18 |  254  989  235 |   70 980 255 |    7   9  24 |
## 5 |   19 |  239  990  194 |   62 877 188 |  -22 112 243 |
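
The percentages in the scree table can be reproduced by hand, which makes the link to the Pearson χ² explicit. The following is a minimal sketch in base R, assuming the `criminal` table loads from logmult as above; the names `N`, `P`, `E`, `S`, `inertia`, and `pct` are illustrative and not part of the original solution. The total inertia equals χ²/n, and each squared singular value of the standardized residuals is one dimension's share of it.

```r
library(logmult)
data("criminal", package = "logmult")

N  <- unclass(criminal)            # plain 4 x 5 matrix of counts
n  <- sum(N)
P  <- N / n                        # correspondence matrix
E  <- outer(rowSums(P), colSums(P))  # expected proportions under independence
S  <- (P - E) / sqrt(E)            # standardized residuals

# Squared singular values are the principal inertias;
# only min(4, 5) - 1 = 3 of them are nonzero.
inertia <- svd(S)$d[1:3]^2
pct     <- 100 * inertia / sum(inertia)

# Pearson chi-square for independence, for comparison with n * total inertia
chisq <- unname(chisq.test(N)$statistic)

round(pct, 1)
c(total_inertia = sum(inertia), chisq_over_n = chisq / n)
```

The two entries of the last vector should agree, and `pct` should match the `%` column of `summary(criminalca)`.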
  2. Plot the 2D correspondence analysis solution. Describe the pattern of association between year and age.
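# Dimension 1 (90.3% of the inertia) separates 1955 and 1956, which plot with ages 18 and 19, from 1957 and 1958, which plot with ages 15 and 16.
# Over the period, dropped charges shifted toward younger men, so year and age are negatively associated.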
plot(criminalca)
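
To check the reading of the plot against numbers, the first-dimension principal coordinates can be pulled out of the fitted object and sorted. This is a sketch, assuming the `rowcoord`, `colcoord`, `sv`, `rownames`, and `colnames` components of a `ca` object; `year_dim1` and `age_dim1` are illustrative names. The overall sign of an axis is arbitrary, so only the relative positions matter.

```r
library(ca)
library(logmult)
data("criminal", package = "logmult")

fit <- ca(criminal)

# Principal coordinates = standard coordinates scaled by the singular value
year_dim1 <- fit$rowcoord[, 1] * fit$sv[1]
age_dim1  <- fit$colcoord[, 1] * fit$sv[1]
names(year_dim1) <- fit$rownames
names(age_dim1)  <- fit$colnames

# Years and ages that fall at the same end of dimension 1 are associated
round(sort(year_dim1), 3)
round(sort(age_dim1), 3)
```

If the earliest years sort to the same end as the oldest ages, the association between year and age is negative.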

Exercise 6.11 The data set Vietnam in vcdExtra gives a 2 × 5 × 4 contingency table in frequency form reflecting a survey of student opinion on the Vietnam War at the University of North Carolina in May 1967. The table variables are sex, year in school, and response, which has categories: (A) Defeat North Vietnam by widespread bombing and land invasion; (B) Maintain the present policy; (C) De-escalate military activity, stop bombing and begin negotiations; (D) Withdraw military forces immediately.

  1. Using the stacking approach, carry out a correspondence analysis corresponding to the loglinear model [R][YS], which asserts that the response is independent of the combinations of year and sex.
library(vcdExtra)
## Loading required package: vcd
## Loading required package: grid
## 
## Attaching package: 'vcd'
## The following object is masked from 'package:logmult':
## 
##     assoc
data("Vietnam", package="vcdExtra")
str(Vietnam)
## 'data.frame':    40 obs. of  4 variables:
##  $ sex     : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
##  $ year    : int  1 1 1 1 2 2 2 2 3 3 ...
##  $ response: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
##  $ Freq    : int  13 19 40 5 5 9 33 3 22 29 ...
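# Stack year and sex into one factor with labels such as "1F" and "1M" (paste0 avoids an embedded space)
vietnam.stacked <- within(Vietnam, {year_sex <- paste0(year, toupper(substr(sex, 1, 1)))})
vietnam.tab <- xtabs(Freq ~ year_sex + response, data = vietnam.stacked)
vietnamca <- ca(vietnam.tab)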
summary(vietnamca)
## 
## Principal inertias (eigenvalues):
## 
##  dim    value      %   cum%   scree plot               
##  1      0.085680  73.6  73.6  ******************       
##  2      0.027881  23.9  97.5  ******                   
##  3      0.002854   2.5 100.0  *                        
##         -------- -----                                 
##  Total: 0.116415 100.0                                 
## 
## 
## Rows:
##      name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1  |   1F |   24  818   13 | -167 452   8 | -150 367  20 |
## 2  |   1M |  139  997  181 |  386 986 242 |  -41  11   8 |
## 3  |   2F |   16  995   35 | -407 647  31 | -299 349  51 |
## 4  |   2M |  140  984  131 |  326 982 175 |  -15   2   1 |
## 5  |   3F |   53  999  112 | -334 453  69 | -367 547 256 |
## 6  |   3M |  138  904   40 |  175 904  49 |   -4   0   0 |
## 7  |   4F |   32  982   37 | -344 887  44 | -113  95  15 |
## 8  |   4M |  149  383   23 |   81 372  11 |   14  11   1 |
## 9  |   5F |   59  994  153 | -453 686 143 | -304 309 197 |
## 10 |   5M |  248 1000  276 | -281 608 228 |  225 391 451 |
## 
## Columns:
##     name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1 |    A |  255  985  381 |  414 985 509 |   -1   0   0 |
## 2 |    B |  235  720   60 |  135 608  50 |   58 112  28 |
## 3 |    C |  419  999  283 | -247 773 298 | -133 226 267 |
## 4 |    D |   92  995  276 | -366 383 143 |  463 612 705 |
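
The link between the stacked analysis and the loglinear model [R][YS] can be checked directly: the Pearson χ² for that model on the three-way table should equal the χ² for independence in the stacked two-way table, and dividing it by n gives the total inertia. The sketch below assumes `MASS::loglm` for the loglinear fit; `viet3`, `mod`, `stacked`, and `tab` are illustrative names, not part of the original solution.

```r
library(MASS)
library(vcdExtra)
data("Vietnam", package = "vcdExtra")

# Three-way table and the loglinear model [R][YS]
viet3 <- xtabs(Freq ~ sex + year + response, data = Vietnam)
mod   <- loglm(~ response + sex * year, data = viet3)

# Stacked two-way table: year-sex combinations by response
stacked <- within(Vietnam, year_sex <- paste0(year, toupper(substr(sex, 1, 1))))
tab     <- xtabs(Freq ~ year_sex + response, data = stacked)
chisq   <- unname(suppressWarnings(chisq.test(tab))$statistic)

c(loglm_pearson = mod$pearson,
  stacked_chisq = chisq,
  total_inertia = chisq / sum(tab))
```

The last entry should match the `Total` line of the principal inertias reported by `summary(vietnamca)`.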
  2. Construct an informative 2D plot of the solution, and interpret in terms of how the response varies with year for males and females.
# Year 5 males are more likely to have response D.
# Year 1 and 4 females are more likely to have response C.
# Year 3 and 4 males are more likely to have response B.
# Year 1 and 2 males are more likely to have response A.

plot(vietnamca)
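
One way to make the plot more informative is to join the year points separately for each sex, so the trajectory across years is visible. This is a sketch under the assumption that the default `plot()` for a `ca` object draws a symmetric map in principal coordinates; the names `stacked`, `tab`, `fit`, and `pc`, the colours, and the legend position are illustrative choices.

```r
library(ca)
library(vcdExtra)
data("Vietnam", package = "vcdExtra")

stacked <- within(Vietnam, year_sex <- paste0(year, toupper(substr(sex, 1, 1))))
tab     <- xtabs(Freq ~ year_sex + response, data = stacked)
fit     <- ca(tab)

# Row principal coordinates on dimensions 1 and 2
pc <- fit$rowcoord[, 1:2] %*% diag(fit$sv[1:2])
rownames(pc) <- fit$rownames

females <- grep("F$", rownames(pc))   # rows 1F, 2F, ..., 5F in year order
males   <- grep("M$", rownames(pc))   # rows 1M, 2M, ..., 5M in year order

plot(fit)
lines(pc[females, ], col = "red", lty = 2)   # females, year 1 to year 5
lines(pc[males, ],   col = "blue")           # males, year 1 to year 5
legend("topleft", legend = c("Female", "Male"),
       col = c("red", "blue"), lty = c(2, 1), bty = "n")
```

Each line then traces how the response profile moves from year 1 to year 5 within one sex.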

  3. Use mjca() to carry out an MCA on the three-way table. Make a useful plot of the solution and interpret in terms of the relationship of the response to year and sex.
# The result is similar to the stacked correspondence analysis above.
# Year 5 males are more likely to have response D.
# Year 1 and 4 females are more likely to have response C.
# Year 3 and 4 males are more likely to have response B.
# Year 1 and 2 males are more likely to have response A.
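# MCA needs the full three-way table; the stacked two-way table has only two variables (year_sex and response)
vietnam3 <- xtabs(Freq ~ sex + year + response, data = Vietnam)
vietnammjca <- mjca(vietnam3)
plot(vietnammjca)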