Exercise 6.2 The data set criminal in the package logmult gives the 4 × 5 table below of the number of men aged 15-19 charged with a criminal case for whom charges were dropped in Denmark from 1955-1958.
library("logmult")
## Warning: package 'logmult' was built under R version 3.4.4
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
##
## Attaching package: 'logmult'
## The following object is masked from 'package:gnm':
##
## se
data("criminal",package = "logmult")
criminal
## Age
## Year 15 16 17 18 19
## 1955 141 285 320 441 427
## 1956 144 292 342 441 396
## 1957 196 380 424 462 427
## 1958 212 424 399 442 430
Carry out a simple correspondence analysis on this table.
library(ca)
## Warning: package 'ca' was built under R version 3.4.4
criminal_ca=ca(criminal)
summary(criminal_ca)
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.004939 90.3 90.3 ***********************
## 2 0.000491 9.0 99.3 **
## 3 3.8e-05 0.7 100.0
## -------- -----
## Total: 0.005468 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | 1955 | 230 996 347 | 88 939 361 | -22 58 223 |
## 2 | 1956 | 230 978 157 | 58 908 157 | 16 71 124 |
## 3 | 1957 | 269 984 111 | -39 669 82 | 27 315 391 |
## 4 | 1958 | 271 999 385 | -85 938 399 | -22 61 262 |
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | 15 | 99 998 185 | -101 992 203 | -7 5 11 |
## 2 | 16 | 197 996 312 | -91 959 331 | -18 37 128 |
## 3 | 17 | 211 991 75 | -23 281 23 | 37 710 594 |
## 4 | 18 | 254 989 235 | 70 980 255 | 7 9 24 |
## 5 | 19 | 239 990 194 | 62 877 188 | -22 112 243 |
(a) What percentages of the Pearson χ² for association are explained by the various dimensions?
The summary shows that Dimension 1 accounts for 90.3% of the total inertia, and hence of the Pearson χ² (the total inertia equals χ²/n). Dimension 2 accounts for a further 9.0%, and Dimension 3 for the remaining 0.7%. A two-dimensional solution therefore captures 99.3% of the association between year and age.
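The link between the principal inertias and the Pearson χ² can be checked by hand. The following is a minimal sketch in base R, not the `ca` package's own code: the counts are copied from the printed `criminal` table, and the names `criminal_tab`, `row_mass`, `col_mass`, `S`, and `pct` are introduced here for illustration only.

```r
# Counts copied from the printed `criminal` table (rows = Year, columns = Age)
criminal_tab <- matrix(
  c(141, 285, 320, 441, 427,
    144, 292, 342, 441, 396,
    196, 380, 424, 462, 427,
    212, 424, 399, 442, 430),
  nrow = 4, byrow = TRUE,
  dimnames = list(Year = as.character(1955:1958), Age = as.character(15:19))
)

n <- sum(criminal_tab)

# Pearson chi-square for independence; total inertia is chi-square / n
chisq <- unname(chisq.test(criminal_tab)$statistic)
total_inertia <- chisq / n

# Standardized residuals: (p_ij - r_i c_j) / sqrt(r_i c_j)
P <- criminal_tab / n
row_mass <- rowSums(P)
col_mass <- colSums(P)
S <- (P - outer(row_mass, col_mass)) / sqrt(outer(row_mass, col_mass))

# Squared singular values of S are the principal inertias
principal_inertias <- svd(S)$d^2

# Percentage of the chi-square carried by each dimension
pct <- 100 * principal_inertias / sum(principal_inertias)
round(pct, 1)
```

The first three percentages should agree with the `%` column of `summary(criminal_ca)`; the fourth singular value is numerically zero because a 4 × 5 table has at most three non-trivial dimensions.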
(b) Plot the 2D correspondence analysis solution. Describe the pattern of association between year and age.
plot(criminal_ca)
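The plot shows an association between year and age, almost all of it along Dimension 1. The earlier years 1955 and 1956 lie on the same side of that dimension as ages 18 and 19, while the later years 1957 and 1958 lie with ages 15 and 16, so the men whose charges were dropped tended to be younger in the later years. Looking at individual points, 1958 lies closest to ages 15-16, 1955 to age 19, 1956 to age 18, and 1957 to age 17; the last pairing is driven mainly by Dimension 2, which accounts for only 9.0% of the inertia.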
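The reading of the plot can be cross-checked against the plotted coordinates themselves. The sketch below recomputes principal coordinates in base R from the printed counts. It is an illustration under the usual CA definitions rather than the `ca` package's internals, and the names `dec`, `row_pc`, and `col_pc` are hypothetical. The overall sign of each dimension is arbitrary, so only relative positions matter.

```r
# Counts copied from the printed `criminal` table (rows = Year, columns = Age)
criminal_tab <- matrix(
  c(141, 285, 320, 441, 427,
    144, 292, 342, 441, 396,
    196, 380, 424, 462, 427,
    212, 424, 399, 442, 430),
  nrow = 4, byrow = TRUE,
  dimnames = list(Year = as.character(1955:1958), Age = as.character(15:19))
)

P <- criminal_tab / sum(criminal_tab)
row_mass <- rowSums(P)
col_mass <- colSums(P)

# SVD of the standardized residuals
S <- (P - outer(row_mass, col_mass)) / sqrt(outer(row_mass, col_mass))
dec <- svd(S)

# Principal coordinates on the first two dimensions:
# singular vectors rescaled by 1/sqrt(mass), then by the singular values
row_pc <- (dec$u[, 1:2] / sqrt(row_mass)) %*% diag(dec$d[1:2])
col_pc <- (dec$v[, 1:2] / sqrt(col_mass)) %*% diag(dec$d[1:2])
dimnames(row_pc) <- list(rownames(criminal_tab), c("Dim1", "Dim2"))
dimnames(col_pc) <- list(colnames(criminal_tab), c("Dim1", "Dim2"))

round(row_pc, 3)
round(col_pc, 3)
```

Up to a possible sign flip per dimension, these values correspond to the `k=1` and `k=2` columns of the summary, which are reported multiplied by 1000. Years and ages with the same sign on `Dim1` are the ones that appear on the same side of the plot.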
Exercise 6.11 The data set Vietnam in vcdExtra gives a 2 × 5 × 4 contingency table in frequency form reflecting a survey of student opinion on the Vietnam War at the University of North Carolina in May 1967. The table variables are sex, year in school, and response, which has categories: (A) Defeat North Vietnam by widespread bombing and land invasion; (B) Maintain the present policy; (C) De-escalate military activity, stop bombing and begin negotiations; (D) Withdraw military forces immediately.
library("vcdExtra")
## Warning: package 'vcdExtra' was built under R version 3.4.4
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
##
## Attaching package: 'vcd'
## The following object is masked from 'package:logmult':
##
## assoc
data("Vietnam", package="vcdExtra")
str(Vietnam)
## 'data.frame': 40 obs. of 4 variables:
## $ sex : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1 1 1 1 2 2 2 2 3 3 ...
## $ response: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
## $ Freq : int 13 19 40 5 5 9 33 3 22 29 ...
Vietnam=within(Vietnam, {year_sex=paste(year, toupper(substr(sex,1,1)))})
Vietnam_lm=xtabs(Freq ~ year_sex + response, data=Vietnam)
Vietnam_Ca=ca(Vietnam_lm)
summary(Vietnam_Ca)
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.085680 73.6 73.6 ******************
## 2 0.027881 23.9 97.5 ******
## 3 0.002854 2.5 100.0 *
## -------- -----
## Total: 0.116415 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | 1F | 24 818 13 | -167 452 8 | -150 367 20 |
## 2 | 1M | 139 997 181 | 386 986 242 | -41 11 8 |
## 3 | 2F | 16 995 35 | -407 647 31 | -299 349 51 |
## 4 | 2M | 140 984 131 | 326 982 175 | -15 2 1 |
## 5 | 3F | 53 999 112 | -334 453 69 | -367 547 256 |
## 6 | 3M | 138 904 40 | 175 904 49 | -4 0 0 |
## 7 | 4F | 32 982 37 | -344 887 44 | -113 95 15 |
## 8 | 4M | 149 383 23 | 81 372 11 | 14 11 1 |
## 9 | 5F | 59 994 153 | -453 686 143 | -304 309 197 |
## 10 | 5M | 248 1000 276 | -281 608 228 | 225 391 451 |
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | A | 255 985 381 | 414 985 509 | -1 0 0 |
## 2 | B | 235 720 60 | 135 608 50 | 58 112 28 |
## 3 | C | 419 999 283 | -247 773 298 | -133 226 267 |
## 4 | D | 92 995 276 | -366 383 143 | 463 612 705 |
plot(Vietnam_Ca)
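The plot shows an association between response and the combined year-sex categories. Dimension 1 (73.6% of the inertia) contrasts responses A and B with responses C and D: males in years 1 to 4 fall on the A/B side, while females in every year and males in year 5 fall on the C/D side. Specifically, response A is closest to males in years 1 and 2, response B to males in years 3 and 4, and response C to females, particularly those in years 1 and 4. Dimension 2 (23.9%) mainly separates response D, which is associated with males in year 5.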
Vietnam_MCA=mjca(Vietnam_lm)
plot(Vietnam_MCA)
The MCA plot of the same year_sex by response table shows associations similar to those in the correspondence analysis above: response A with males in years 1 and 2, response B with males in years 3 and 4, response C with females in years 1 and 4, and response D with males in year 5.
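Because `Vietnam_lm` stacks year and sex into a single factor, the MCA above is still an analysis of a two-way table. A minimal sketch of an MCA that keeps sex, year, and response as separate variables might look like the following. It assumes the `ca` and `vcdExtra` packages are installed, and the object names `Vietnam_tab3` and `Vietnam_mca3` are illustrative.

```r
library(ca)        # provides mjca()
library(vcdExtra)  # provides the Vietnam data

data("Vietnam", package = "vcdExtra")

# Three-way contingency table: sex x year x response (2 x 5 x 4)
Vietnam_tab3 <- xtabs(Freq ~ sex + year + response, data = Vietnam)

# Multiple correspondence analysis with each variable kept separate
Vietnam_mca3 <- mjca(Vietnam_tab3)
summary(Vietnam_mca3)
plot(Vietnam_mca3)
```

In this version each level of sex, year, and response gets its own point, so the positions of the year and sex categories can be compared separately with the four responses.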