pacman::p_load(tidyverse, magrittr, vcd, vcdExtra, MASS, logmult, ca)
The data set criminal in the package logmult gives the 4 × 5 table below of the number of men aged 15–19 charged with a criminal case for whom charges were dropped in Denmark from 1955–1958.
data("criminal", package = "logmult")
criminal
## Age
## Year 15 16 17 18 19
## 1955 141 285 320 441 427
## 1956 144 292 342 441 396
## 1957 196 380 424 462 427
## 1958 212 424 399 442 430
Carry out a simple correspondence analysis on this table.
criminal.ca <- ca(criminal)
criminal.ca
##
## Principal inertias (eigenvalues):
## 1 2 3
## Value 0.004939 0.000491 0.000038
## Percentage 90.33% 8.98% 0.69%
##
##
## Rows:
## 1955 1956 1957 1958
## Mass 0.229751 0.229893 0.268897 0.271459
## ChiDist 0.090897 0.061048 0.047585 0.088033
## Inertia 0.001898 0.000857 0.000609 0.002104
## Dim. 1 1.253085 0.827543 -0.553684 -1.212927
## Dim. 2 -0.984738 0.733468 1.206411 -0.982745
##
##
## Columns:
## 15 16 17 18 19
## Mass 0.098648 0.196584 0.211388 0.254235 0.239146
## ChiDist 0.101134 0.093089 0.044072 0.071068 0.066594
## Inertia 0.001009 0.001703 0.000411 0.001284 0.001061
## Dim. 1 -1.433374 -1.297270 -0.332608 1.000960 0.887539
## Dim. 2 -0.333181 -0.808352 1.676250 0.307874 -1.007063
The first dimension explained 90.33% of the association, and second dimension explained 8.98%. The diagram shows the scoring of row and columns.
plot(criminal.ca)
summary(criminal.ca)
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.004939 90.3 90.3 ***********************
## 2 0.000491 9.0 99.3 **
## 3 0.000038 0.7 100.0
## -------- -----
## Total: 0.005468 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | 1955 | 230 996 347 | 88 939 361 | -22 58 223 |
## 2 | 1956 | 230 978 157 | 58 908 157 | 16 71 124 |
## 3 | 1957 | 269 984 111 | -39 669 82 | 27 315 391 |
## 4 | 1958 | 271 999 385 | -85 938 399 | -22 61 262 |
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | 15 | 99 998 185 | -101 992 203 | -7 5 11 |
## 2 | 16 | 197 996 312 | -91 959 331 | -18 37 128 |
## 3 | 17 | 211 991 75 | -23 281 23 | 37 710 594 |
## 4 | 18 | 254 989 235 | 70 980 255 | 7 9 24 |
## 5 | 19 | 239 990 194 | 62 877 188 | -22 112 243 |
mosaic(criminal, shade=TRUE, labeling=labeling_residuals)
From the plot we can see that Age 15 and Age 18 are closer to the center (horizontal), meaning more independent in the table. When two variables are closer to each other ,it indicates that those variables have positive association. And the mosaic graph shows that Age 16 and Year 1958 has the highest association(negative) and Age 19 and Year 1955 has the second highest positive association. Age 19 and Year 1955 has the second highest positive association, their observed frequency are higher than expected frequency.
The data set Vietnam in vcdExtra gives a 2 × 5 × 4 contingency table in frequency form reflecting a survey of student opinion on the Vietnam War at the University of North Carolina in May 1967. The table variables are sex, year in school, and response, which has categories:
data("Vietnam", package = "vcdExtra")
str(Vietnam)
## 'data.frame': 40 obs. of 4 variables:
## $ sex : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1 1 1 1 2 2 2 2 3 3 ...
## $ response: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
## $ Freq : int 13 19 40 5 5 9 33 3 22 29 ...
Vietnam <- within(Vietnam, {year.sex <- paste(year, toupper(substr(sex,1,1)))})
Vietnam.t <- xtabs(Freq~ year.sex +response, data=Vietnam)
Vietnam.t
## response
## year.sex A B C D
## 1 F 13 19 40 5
## 1 M 175 116 131 17
## 2 F 5 9 33 3
## 2 M 160 126 135 21
## 3 F 22 29 110 6
## 3 M 132 120 154 29
## 4 F 12 21 58 10
## 4 M 145 95 185 44
## 5 F 19 27 128 13
## 5 M 118 176 345 141
Vietnam.ca <- ca(Vietnam.t)
summary(Vietnam.ca)
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.085680 73.6 73.6 ******************
## 2 0.027881 23.9 97.5 ******
## 3 0.002854 2.5 100.0 *
## -------- -----
## Total: 0.116415 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | 1F | 24 818 13 | -167 452 8 | -150 367 20 |
## 2 | 1M | 139 997 181 | 386 986 242 | -41 11 8 |
## 3 | 2F | 16 995 35 | -407 647 31 | -299 349 51 |
## 4 | 2M | 140 984 131 | 326 982 175 | -15 2 1 |
## 5 | 3F | 53 999 112 | -334 453 69 | -367 547 256 |
## 6 | 3M | 138 904 40 | 175 904 49 | -4 0 0 |
## [ reached getOption("max.print") -- omitted 4 rows ]
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | A | 255 985 381 | 414 985 509 | -1 0 0 |
## 2 | B | 235 720 60 | 135 608 50 | 58 112 28 |
## 3 | C | 419 999 283 | -247 773 298 | -133 226 267 |
## 4 | D | 92 995 276 | -366 383 143 | 463 612 705 |
plot(Vietnam.ca)
The first dimension account 73.6% of the second dimension account for 23.9% , they account 97.5% in total. And the graph shows that:
Vietnam.mca <- mjca(Vietnam.t)
summary(Vietnam.mca)
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.085680 73.6 73.6 ******************
## 2 0.027881 23.9 97.5 ******
## 3 0.002854 2.5 100.0 *
## 4 00000000 0.0 100.0
## -------- -----
## Total: 0.116415
##
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | year.sex:1 F | 12 818 80 | 167 452 4 | -150 367 10 |
## 2 | year.sex:1 M | 70 997 72 | -386 986 121 | -41 11 4 |
## 3 | year.sex:2 F | 8 995 81 | 407 647 15 | -299 349 25 |
## 4 | year.sex:2 M | 70 984 72 | -326 982 87 | -15 2 1 |
## 5 | year.sex:3 F | 27 999 78 | 334 453 34 | -367 547 128 |
## 6 | year.sex:3 M | 69 904 71 | -175 904 25 | -4 0 0 |
## [ reached getOption("max.print") -- omitted 8 rows ]
plot(Vietnam.mca)
It still shows that females are more likely to choose C, regardless of the year, and males have different choices depend on the year in school.