Exercise 6.1 The JobSat data in vcdExtra gives a 4 × 4 table recording job satisfaction in relation to income.

library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.4.4
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
data(JobSat)
JobSat
##         satisfaction
## income   VeryD LittleD ModerateS VeryS
##   < 15k      1       3        10     6
##   15-25k     2       3        10     7
##   25-40k     1       6        14    12
##   > 40k      0       1         9    11

(a) Carry out a simple correspondence analysis on this table. How much of the inertia is accounted for by a one-dimensional solution? How much by a two-dimensional solution?

data("JobSat", package="vcdExtra")
library(ca)
## Warning: package 'ca' was built under R version 3.4.4
JobSat.ca <- ca(JobSat)
summary(JobSat.ca)
## 
## Principal inertias (eigenvalues):
## 
##  dim    value      %   cum%   scree plot               
##  1      0.047496  76.4  76.4  *******************      
##  2      0.012248  19.7  96.1  *****                    
##  3      0.002397   3.9 100.0  *                        
##         -------- -----                                 
##  Total: 0.062141 100.0                                 
## 
## 
## Rows:
##     name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1 |  15k |  208  658   81 | -126 658  70 |    4   1   0 |
## 2 | 1525 |  229  977  247 | -208 647 209 | -148 329 412 |
## 3 | 2540 |  344  956  114 |  -35  59   9 |  136 897 519 |
## 4 |  40k |  219 1000  558 |  393 976 712 |  -62  24  69 |
## 
## Columns:
##     name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1 | VryD |   42  985  384 | -662 766 385 | -354 219 427 |
## 2 | LttD |  135  990  294 | -289 621 239 |  223 369 549 |
## 3 | MdrS |  448  334   31 |  -31 233   9 |  -21 101  16 |
## 4 | VryS |  375  967  292 |  216 961 367 |  -16   6   8 |

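A one-dimensional solution accounts for 76.4% of the total inertia (0.047496 of 0.062141), and a two-dimensional solution accounts for 96.1%. Because the total inertia equals the Pearson χ² divided by the sample size, these are also the shares of χ² captured by each solution. The scree plot shows that the first dimension dominates, so the association between job satisfaction and income is largely one-dimensional.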
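The percentages in the summary can be reproduced without the ca package. The following is a minimal base-R sketch, assuming the counts are typed in from the table printed above; the names `tab`, `r_mass`, `c_mass`, and `S` are illustrative and not part of the original solution. It computes the principal inertias as squared singular values of the standardized residuals and then recovers the Pearson χ² as n times the total inertia.

```r
# JobSat counts typed in from the table above (rows: income, columns: satisfaction).
tab <- matrix(c(1, 3, 10,  6,
                2, 3, 10,  7,
                1, 6, 14, 12,
                0, 1,  9, 11),
              nrow = 4, byrow = TRUE,
              dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"),
                              satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS")))

n      <- sum(tab)
P      <- tab / n          # correspondence matrix (cell proportions)
r_mass <- rowSums(P)       # row masses
c_mass <- colSums(P)       # column masses

# Standardized residuals from independence; their squared singular values
# are the principal inertias.
S       <- (P - outer(r_mass, c_mass)) / sqrt(outer(r_mass, c_mass))
inertia <- svd(S)$d^2
inertia <- inertia[seq_len(min(dim(tab)) - 1)]   # the last singular value is 0

pct <- 100 * inertia / sum(inertia)
round(cbind(inertia, pct, cum = cumsum(pct)), 4)

# Total inertia times n is the Pearson chi-square for independence.
chisq <- n * sum(inertia)
df    <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_val <- pchisq(chisq, df, lower.tail = FALSE)
c(chisq = chisq, df = df, p = p_val)
```

The first block of output should match the "Principal inertias" table from summary(JobSat.ca); the second relates the inertia to the usual test of independence.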

(b) Plot the 2D CA solution. Explain the first dimension of the plot? Are the points ordered? What does it show? Do you see any association? If yes, where? If no, what is the reason?

plot(JobSat.ca)

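Dimension 1 orders the satisfaction categories from VeryD on the left (-0.662) through LittleD (-0.289) and ModerateS (-0.031) to VeryS on the right (0.216), so it can be read as a satisfaction gradient. The income groups are only partly ordered along it: > 40k stands apart on the right (0.393), close to VeryS, while the three lower income groups lie on the left, with < 15k and 15-25k in reversed order. Dimension 2 mainly contrasts LittleD at the top with VeryD at the bottom; 25-40k sits toward LittleD and 15-25k toward VeryD. The plot therefore suggests a positive association between income and job satisfaction, driven mostly by the > 40k group. The association is weak, however: the total inertia of 0.062141 with n = 96 gives χ² ≈ 5.97 on 9 degrees of freedom, well below the 5% critical value of 16.92, and several cell counts are very small.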
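Whether the points are ordered along Dimension 1 can be checked numerically rather than by eye. The sketch below is a base-R illustration under the assumption that the counts are typed in from the JobSat table above; `is_monotone` is a hypothetical helper introduced here, and the axis is flipped so that VeryS is positive, since the sign of a CA axis is arbitrary.

```r
# JobSat counts typed in from the table above.
tab <- matrix(c(1, 3, 10,  6,
                2, 3, 10,  7,
                1, 6, 14, 12,
                0, 1,  9, 11),
              nrow = 4, byrow = TRUE,
              dimnames = list(income = c("< 15k", "15-25k", "25-40k", "> 40k"),
                              satisfaction = c("VeryD", "LittleD", "ModerateS", "VeryS")))

P      <- tab / sum(tab)
r_mass <- rowSums(P)
c_mass <- colSums(P)
S      <- (P - outer(r_mass, c_mass)) / sqrt(outer(r_mass, c_mass))
dec    <- svd(S)

# Principal coordinates on Dimension 1: singular vector scaled by the
# singular value and divided by the square root of the mass.
row_dim1 <- dec$u[, 1] * dec$d[1] / sqrt(r_mass)
col_dim1 <- dec$v[, 1] * dec$d[1] / sqrt(c_mass)
names(row_dim1) <- rownames(tab)
names(col_dim1) <- colnames(tab)

# Orient the axis so that VeryS is positive (the sign is arbitrary).
if (col_dim1["VeryS"] < 0) {
  row_dim1 <- -row_dim1
  col_dim1 <- -col_dim1
}

# Hypothetical helper: TRUE if the values strictly increase or strictly decrease.
is_monotone <- function(x) all(diff(x) > 0) || all(diff(x) < 0)

round(row_dim1, 3)
round(col_dim1, 3)
c(income = is_monotone(row_dim1), satisfaction = is_monotone(col_dim1))
```

The satisfaction categories come out in their natural order along the axis, while the income groups do not, because the two lowest groups are reversed.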

Exercise 6.11 Refer to Exercise 5.9 for a description of the Accident data set in vcdExtra. The data set is in the form of a frequency data frame, so first convert to table form.

data("Accident", package="vcdExtra")
Accident
##      age  result       mode gender  Freq
## 1    50+    Died Pedestrian   Male   704
## 2    50+    Died Pedestrian Female   378
## 3    50+    Died    Bicycle   Male   396
## 4    50+    Died    Bicycle Female    56
## 5    50+    Died Motorcycle   Male   742
## 6    50+    Died Motorcycle Female    78
## 7    50+    Died  4-Wheeled   Male   513
## 8    50+    Died  4-Wheeled Female   253
## 9    50+ Injured Pedestrian   Male  5206
## 10   50+ Injured Pedestrian Female  5449
## 11   50+ Injured    Bicycle   Male  3863
## 12   50+ Injured    Bicycle Female  1030
## 13   50+ Injured Motorcycle   Male  8597
## 14   50+ Injured Motorcycle Female  1387
## 15   50+ Injured  4-Wheeled   Male  7423
## 16   50+ Injured  4-Wheeled Female  5552
## 17 30-49    Died Pedestrian   Male   223
## 18 30-49    Died Pedestrian Female    49
## 19 30-49    Died    Bicycle   Male   146
## 20 30-49    Died    Bicycle Female    24
## 21 30-49    Died Motorcycle   Male   889
## 22 30-49    Died Motorcycle Female    98
## 23 30-49    Died  4-Wheeled   Male   720
## 24 30-49    Died  4-Wheeled Female   199
## 25 30-49 Injured Pedestrian   Male  3178
## 26 30-49 Injured Pedestrian Female  1814
## 27 30-49 Injured    Bicycle   Male  3024
## 28 30-49 Injured    Bicycle Female  1118
## 29 30-49 Injured Motorcycle   Male 18909
## 30 30-49 Injured Motorcycle Female  3664
## 31 30-49 Injured  4-Wheeled   Male 15086
## 32 30-49 Injured  4-Wheeled Female  7712
## 33 20-29    Died Pedestrian   Male    78
## 34 20-29    Died Pedestrian Female    24
## 35 20-29    Died    Bicycle   Male    55
## 36 20-29    Died    Bicycle Female    10
## 37 20-29    Died Motorcycle   Male   660
## 38 20-29    Died Motorcycle Female    82
## 39 20-29    Died  4-Wheeled   Male   353
## 40 20-29    Died  4-Wheeled Female   107
## 41 20-29 Injured Pedestrian   Male  1521
## 42 20-29 Injured Pedestrian Female   864
## 43 20-29 Injured    Bicycle   Male  1565
## 44 20-29 Injured    Bicycle Female   609
## 45 20-29 Injured Motorcycle   Male 18558
## 46 20-29 Injured Motorcycle Female  4010
## 47 20-29 Injured  4-Wheeled   Male  9084
## 48 20-29 Injured  4-Wheeled Female  4361
## 49 10-19    Died Pedestrian   Male    70
## 50 10-19    Died Pedestrian Female    28
## 51 10-19    Died    Bicycle   Male    76
## 52 10-19    Died    Bicycle Female    31
## 53 10-19    Died Motorcycle   Male   362
## 54 10-19    Died Motorcycle Female    54
## 55 10-19    Died  4-Wheeled   Male   150
## 56 10-19    Died  4-Wheeled Female    61
## 57 10-19 Injured Pedestrian   Male  1827
## 58 10-19 Injured Pedestrian Female  1495
## 59 10-19 Injured    Bicycle   Male  3407
## 60 10-19 Injured    Bicycle Female  7218
## 61 10-19 Injured Motorcycle   Male 12311
## 62 10-19 Injured Motorcycle Female  3587
## 63 10-19 Injured  4-Wheeled   Male  3543
## 64 10-19 Injured  4-Wheeled Female  2593
## 65   0-9    Died Pedestrian   Male   150
## 66   0-9    Died Pedestrian Female    89
## 67   0-9    Died    Bicycle   Male    26
## 68   0-9    Died    Bicycle Female     5
## 69   0-9    Died Motorcycle   Male     6
## 70   0-9    Died Motorcycle Female     6
## 71   0-9    Died  4-Wheeled   Male    70
## 72   0-9    Died  4-Wheeled Female    65
## 73   0-9 Injured Pedestrian   Male  3341
## 74   0-9 Injured Pedestrian Female  1967
## 75   0-9 Injured    Bicycle   Male   378
## 76   0-9 Injured    Bicycle Female   126
## 77   0-9 Injured Motorcycle   Male   181
## 78   0-9 Injured Motorcycle Female   131
## 79   0-9 Injured  4-Wheeled   Male  1593
## 80   0-9 Injured  4-Wheeled Female  1362
accident_tab <- xtabs(Freq ~ gender+mode+age+result, data=Accident)
accident_tab
## , , age = 0-9, result = Died
## 
##         mode
## gender   4-Wheeled Bicycle Motorcycle Pedestrian
##   Female        65       5          6         89
##   Male          70      26          6        150
## 
## , , age = 10-19, result = Died
## 
##         mode
## gender   4-Wheeled Bicycle Motorcycle Pedestrian
##   Female        61      31         54         28
##   Male         150      76        362         70
## 
## , , age = 20-29, result = Died
## 
##         mode
## gender   4-Wheeled Bicycle Motorcycle Pedestrian
##   Female       107      10         82         24
##   Male         353      55        660         78
## 
## , , age = 30-49, result = Died
## 
##         mode
## gender   4-Wheeled Bicycle Motorcycle Pedestrian
##   Female       199      24         98         49
##   Male         720     146        889        223
## 
## , , age = 50+, result = Died
## 
##         mode
## gender   4-Wheeled Bicycle Motorcycle Pedestrian
##   Female       253      56         78        378
##   Male         513     396        742        704
## 
## , , age = 0-9, result = Injured
## 
##         mode
## gender   4-Wheeled Bicycle Motorcycle Pedestrian
##   Female      1362     126        131       1967
##   Male        1593     378        181       3341
## 
## , , age = 10-19, result = Injured
## 
##         mode
## gender   4-Wheeled Bicycle Motorcycle Pedestrian
##   Female      2593    7218       3587       1495
##   Male        3543    3407      12311       1827
## 
## , , age = 20-29, result = Injured
## 
##         mode
## gender   4-Wheeled Bicycle Motorcycle Pedestrian
##   Female      4361     609       4010        864
##   Male        9084    1565      18558       1521
## 
## , , age = 30-49, result = Injured
## 
##         mode
## gender   4-Wheeled Bicycle Motorcycle Pedestrian
##   Female      7712    1118       3664       1814
##   Male       15086    3024      18909       3178
## 
## , , age = 50+, result = Injured
## 
##         mode
## gender   4-Wheeled Bicycle Motorcycle Pedestrian
##   Female      5552    1030       1387       5449
##   Male        7423    3863       8597       5206
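
The conversion from a frequency data frame to a table can be seen more easily on a small case. This is a sketch with a made-up data frame `toy` (its counts are hypothetical and unrelated to Accident); it only illustrates how xtabs() uses the count column on the left of the formula and the classifying factors on the right.

```r
# Hypothetical toy data: one row per combination of factor levels, plus a count.
toy <- data.frame(
  gender = c("Male", "Female", "Male", "Female"),
  result = c("Died", "Died", "Injured", "Injured"),
  Freq   = c(10, 4, 90, 60),
  stringsAsFactors = FALSE
)

# Left of ~ is the count column; the right side lists the table dimensions
# in the order they should appear.
toy_tab <- xtabs(Freq ~ gender + result, data = toy)
toy_tab

# No observations are lost: the cell counts sum to the original total.
sum(toy_tab) == sum(toy$Freq)

# as.data.frame() goes back from the table to frequency form.
as.data.frame(toy_tab)
```

The order of the terms in the formula fixes the order of the table dimensions, which is why gender and mode form the rows and columns of each slice printed above.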

(a) Use mjca() to carry out an MCA on the four-way table accident_tab. How do you explain the summary results?

accident.mca <- mjca(accident_tab)
summary(accident.mca)
## 
## Principal inertias (eigenvalues):
## 
##  dim    value      %   cum%   scree plot               
##  1      0.025429  46.5  46.5  ****************         
##  2      0.011848  21.7  68.1  *******                  
##  3      0.001889   3.5  71.6  *                        
##  4      0.000491   0.9  72.5                           
##         -------- -----                                 
##  Total: 0.054700                                       
## 
## 
## Columns:
##                 name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1  |   gender:Female |   77  788   77 | -203 686 126 |   78 101  40 |
## 2  |     gender:Male |  173  788   35 |   91 686  56 |  -35 101  18 |
## 3  |  mode:4-Wheeled |   81  230   73 |   -8   2   0 |  -80 228  43 |
## 4  |    mode:Bicycle |   31  762   98 | -156 127  30 |  349 635 320 |
## 5  | mode:Motorcycle |   99  686   70 |  209 684 170 |   11   2   1 |
## 6  | mode:Pedestrian |   38  677  100 | -401 600 241 | -144  77  66 |
## 7  |         age:0-9 |   13  672  107 | -551 561 152 | -246 111  65 |
## 8  |       age:10-19 |   49  678   91 |  -40  13   3 |  292 665 354 |
## 9  |       age:20-29 |   56  784   85 |  215 747 102 |  -48  37  11 |
## 10 |       age:30-49 |   76  546   75 |  103 396  32 |  -63 149  26 |
## 11 |         age:50+ |   56  687   85 | -196 616  84 |  -67  72  21 |
## 12 |     result:Died |   11  515  100 |  -90  92   3 | -192 422  34 |
## 13 |  result:Injured |  239  515    5 |    4  92   0 |    9 422   2 |

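The first two dimensions account for 68.1% of the total inertia: 46.5% for the first dimension (principal inertia 0.025429) and 21.7% for the second (0.011848). The third and fourth dimensions add only 3.5% and 0.9%. The cumulative percentage stops at 72.5% rather than 100% because mjca() reports adjusted inertias by default. A two-dimensional solution is therefore a reasonable summary of the associations among gender, mode, age, and result. Judging by the contributions (ctr), Dimension 1 is driven mainly by mode:Pedestrian (241), mode:Motorcycle (170), age:0-9 (152), and gender:Female (126), while Dimension 2 is driven mainly by age:10-19 (354) and mode:Bicycle (320). The two result categories contribute very little to either dimension.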
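The percentage columns can be recomputed from the printed inertias. This short sketch only redoes the arithmetic, with the values copied from the summary above; `inertia` and `total` are names chosen here for illustration.

```r
# Principal inertias and the reported total, copied from the summary above.
inertia <- c(0.025429, 0.011848, 0.001889, 0.000491)
total   <- 0.054700

pct <- 100 * inertia / total
round(rbind(pct = pct, cum = cumsum(pct)), 1)

# Share of the reported total that is not assigned to any listed dimension.
100 - sum(pct)
```

The cumulative percentage after two dimensions reproduces the 68.1% in the table, and the remainder of roughly a quarter of the total is left unassigned. The scaling can be changed through the lambda argument of mjca() (for example lambda = "Burt" or lambda = "indicator"), which gives percentages that sum to 100 but typically understate the quality of a low-dimensional solution.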

(b) Construct an informative 2D plot of the solution, and interpret in terms of how the variable result varies in relation to the other factors. Please explain everything in detail (all the relationships that you see)

plot(accident.mca)

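Dimension 1 separates males (right, 0.091) from females (left, -0.203) and contrasts Motorcycle and the 20-29 and 30-49 age groups on the right with Pedestrian and the 0-9 and 50+ age groups on the left. Motorcycle accidents are thus associated with young and middle-aged adult males, and pedestrian accidents with children, older people, and females. Dimension 2 is dominated by Bicycle and the 10-19 age group at the top, with Female on the same side, which reflects the large number of bicycle injuries among 10-19 year old females (7218). The result variable varies little in this plane. Injured lies almost at the origin because it makes up the great majority of cases (mass 239 against 11 for Died). Died lies in the lower left of the plot (-0.090, -0.192), on the same side as Pedestrian and the 50+ age group and opposite Bicycle and 10-19. This suggests that deaths are relatively more common among pedestrians and the oldest group, and relatively less common among teenage cyclists. Only about half of the inertia of result is captured in two dimensions (qlt 515), so these readings are tentative.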
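
A plot that colors the categories by factor and joins the levels of each factor with a line is often easier to read than the default display. The sketch below is one possible version in base graphics, assuming the principal coordinates are copied from the mjca() summary above (divided by 1000, since the summary prints them in thousandths); the data frame `coords` and the color choices are illustrative and not part of the original solution.

```r
# Principal coordinates copied from the mjca() summary above (thousandths -> units).
coords <- data.frame(
  factor = c("gender", "gender", "mode", "mode", "mode", "mode",
             "age", "age", "age", "age", "age", "result", "result"),
  level  = c("Female", "Male", "4-Wheeled", "Bicycle", "Motorcycle", "Pedestrian",
             "0-9", "10-19", "20-29", "30-49", "50+", "Died", "Injured"),
  dim1   = c(-203, 91, -8, -156, 209, -401, -551, -40, 215, 103, -196, -90, 4) / 1000,
  dim2   = c(78, -35, -80, 349, 11, -144, -246, 292, -48, -63, -67, -192, 9) / 1000,
  stringsAsFactors = FALSE
)
rownames(coords) <- paste(coords$factor, coords$level, sep = ":")

# One color per factor (arbitrary choice).
cols <- c(gender = "red", mode = "blue", age = "darkgreen", result = "black")

# Empty frame with equal axis scaling, so distances are comparable in both directions.
plot(coords$dim1, coords$dim2, type = "n", asp = 1,
     xlab = "Dimension 1 (46.5%)", ylab = "Dimension 2 (21.7%)")
abline(h = 0, v = 0, col = "grey")

# Join the levels of each factor and label them.
for (f in names(cols)) {
  pts <- coords[coords$factor == f, ]
  lines(pts$dim1, pts$dim2, col = cols[[f]])
  text(pts$dim1, pts$dim2, labels = pts$level, col = cols[[f]], font = 2)
}
legend("topright", legend = names(cols), col = cols, lty = 1, bty = "n")
```

The short black segment for result, compared with the long segments for age and mode, shows how little the outcome varies in this plane. Distances between categories of different factors should be read only loosely in an MCA plot.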