Exercises 6.2 & 6.11, “Discrete Data Analysis” by Michael Friendly

Exercise 6.2

The data set criminal in the package logmult gives a 4x5 table of the number of men aged 15-19 charged with a criminal case for whom charges were dropped in Denmark from 1955-1958.

data("criminal", package="logmult")
criminal
##       Age
## Year    15  16  17  18  19
##   1955 141 285 320 441 427
##   1956 144 292 342 441 396
##   1957 196 380 424 462 427
##   1958 212 424 399 442 430

a) What percentages of the Pearson χ² for association are explained by the various dimensions?

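The association between age and the year in which charges were dropped is almost entirely explained by the first dimension, which accounts for 90.3% of the Pearson χ² (total inertia). The second dimension adds 9.0% and the third only 0.7%.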

library(ca)
criminal.ca = ca(criminal)
summary(criminal.ca)
## 
## Principal inertias (eigenvalues):
## 
##  dim    value      %   cum%   scree plot               
##  1      0.004939  90.3  90.3  ***********************  
##  2      0.000491   9.0  99.3  **                       
##  3      3.8e-050   0.7 100.0                           
##         -------- -----                                 
##  Total: 0.005468 100.0                                 
## 
## 
## Rows:
##     name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1 | 1955 |  230  996  347 |   88 939 361 |  -22  58 223 |
## 2 | 1956 |  230  978  157 |   58 908 157 |   16  71 124 |
## 3 | 1957 |  269  984  111 |  -39 669  82 |   27 315 391 |
## 4 | 1958 |  271  999  385 |  -85 938 399 |  -22  61 262 |
## 
## Columns:
##     name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1 |   15 |   99  998  185 | -101 992 203 |   -7   5  11 |
## 2 |   16 |  197  996  312 |  -91 959 331 |  -18  37 128 |
## 3 |   17 |  211  991   75 |  -23 281  23 |   37 710 594 |
## 4 |   18 |  254  989  235 |   70 980 255 |    7   9  24 |
## 5 |   19 |  239  990  194 |   62 877 188 |  -22 112 243 |
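
The percentages in the inertia table can be checked by hand. The following is a minimal sketch in base R, assuming the counts are re-entered from the printed table above. The names crim, P, rmass, cmass, E and S are illustrative and not part of the original analysis. It relies on the standard relation that the total inertia equals the Pearson χ² divided by the sample size.

```r
# Counts re-entered by hand from the printed criminal table (assumption:
# the printout above shows the full 4 x 5 table).
crim <- matrix(c(141, 285, 320, 441, 427,
                 144, 292, 342, 441, 396,
                 196, 380, 424, 462, 427,
                 212, 424, 399, 442, 430),
               nrow = 4, byrow = TRUE,
               dimnames = list(Year = 1955:1958, Age = 15:19))

n     <- sum(crim)          # sample size
P     <- crim / n           # correspondence matrix
rmass <- rowSums(P)         # row masses
cmass <- colSums(P)         # column masses
E     <- outer(rmass, cmass)            # expected proportions under independence
S     <- (P - E) / sqrt(E)              # standardized residuals

# A 4 x 5 table has at most min(4, 5) - 1 = 3 non-trivial dimensions.
sv      <- svd(S)$d[1:3]
inertia <- sv^2                         # principal inertias (eigenvalues)
pct     <- 100 * inertia / sum(inertia) # share of inertia per dimension
chisq   <- n * sum(inertia)             # Pearson chi-square for independence

round(pct, 1)
```

The rounded percentages should agree with the `%` column printed by summary(criminal.ca), and chisq should agree with the statistic reported by chisq.test() on the same table.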

b) Plot the 2D correspondence analysis solution. Describe the pattern of association between year and age.

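Both years and ages are ordered along dimension 1. The years run from 1955 and 1956 on the positive side to 1957 and 1958 on the negative side, while the ages run from 15 and 16 on the negative side to 18 and 19 on the positive side. Later years therefore lie close to the younger ages, which indicates a negative association between age and year: in later years, the men whose charges were dropped were relatively younger.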

ca.plot = plot(criminal.ca, ylim=c(-.04,.04))
lines(ca.plot$rows, col="blue", lty=3)
lines(ca.plot$cols, col="red", lty=3)
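
The direction of the association can also be read from the row profiles. The sketch below uses base R and assumes the counts are re-entered from the printed table; the names crim, profiles and young are illustrative. It computes, for each year, the share of dropped charges that involved men aged 15 or 16.

```r
# Counts re-entered by hand from the printed criminal table (assumption).
crim <- matrix(c(141, 285, 320, 441, 427,
                 144, 292, 342, 441, 396,
                 196, 380, 424, 462, 427,
                 212, 424, 399, 442, 430),
               nrow = 4, byrow = TRUE,
               dimnames = list(Year = 1955:1958, Age = 15:19))

# Row profiles: distribution over ages within each year (rows sum to 1).
profiles <- prop.table(crim, margin = 1)

# Share of the two youngest age groups in each year.
young <- rowSums(profiles[, c("15", "16")])
round(100 * young, 1)
```

The share of 15- and 16-year-olds rises in every successive year, from roughly 26% in 1955 to roughly 33% in 1958, which is the same pattern the plot shows along dimension 1.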


Exercise 6.11

The data set Vietnam in vcdExtra gives a 2 x 5 x 4 contingency table in frequency form, reflecting a survey of student opinion on the Vietnam War at the University of North Carolina in May 1967. The table variables are sex, year in school, and response, which has four categories: (A) Defeat North Vietnam by widespread bombing and invasion; (B) Maintain the present policy; (C) De-escalate military activity and begin negotiations; (D) Withdraw military forces immediately.

data("Vietnam", package="vcdExtra")
str(Vietnam)
## 'data.frame':    40 obs. of  4 variables:
##  $ sex     : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
##  $ year    : int  1 1 1 1 2 2 2 2 3 3 ...
##  $ response: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
##  $ Freq    : int  13 19 40 5 5 9 33 3 22 29 ...

A) Using the stacking approach, carry out a correspondence analysis corresponding to the loglinear model [R][YS], which asserts that the response is independent of the combinations of year and sex.

The association between the joint variable (year.sex) and response is almost entirely two-dimensional: the first two dimensions account for 97.5% of the total inertia (73.6% and 23.9%, respectively).

Vietnam = within(Vietnam, year.sex <- interaction(year, sex))  # joint year-sex factor
Vietnam.tab = xtabs(Freq~year.sex+response, data = Vietnam)
Vietnam.ca = ca(Vietnam.tab)
summary(Vietnam.ca)
## 
## Principal inertias (eigenvalues):
## 
##  dim    value      %   cum%   scree plot               
##  1      0.085680  73.6  73.6  ******************       
##  2      0.027881  23.9  97.5  ******                   
##  3      0.002854   2.5 100.0  *                        
##         -------- -----                                 
##  Total: 0.116415 100.0                                 
## 
## 
## Rows:
##      name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1  | 1Fml |   24  818   13 | -167 452   8 | -150 367  20 |
## 2  | 2Fml |   16  995   35 | -407 647  31 | -299 349  51 |
## 3  | 3Fml |   53  999  112 | -334 453  69 | -367 547 256 |
## 4  | 4Fml |   32  982   37 | -344 887  44 | -113  95  15 |
## 5  | 5Fml |   59  994  153 | -453 686 143 | -304 309 197 |
## 6  | 1Mal |  139  997  181 |  386 986 242 |  -41  11   8 |
## 7  | 2Mal |  140  984  131 |  326 982 175 |  -15   2   1 |
## 8  | 3Mal |  138  904   40 |  175 904  49 |   -4   0   0 |
## 9  | 4Mal |  149  383   23 |   81 372  11 |   14  11   1 |
## 10 | 5Mal |  248 1000  276 | -281 608 228 |  225 391 451 |
## 
## Columns:
##     name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1 |    A |  255  985  381 |  414 985 509 |   -1   0   0 |
## 2 |    B |  235  720   60 |  135 608  50 |   58 112  28 |
## 3 |    C |  419  999  283 | -247 773 298 | -133 226 267 |
## 4 |    D |   92  995  276 | -366 383 143 |  463 612 705 |
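
To make the stacking step concrete, here is a small base R sketch on a toy frequency-form data frame with the same layout as Vietnam. The frame toy and its counts (all set to 1) are placeholders for illustration only, not survey data. It shows the order of the levels that interaction() creates and the shape of the resulting stacked table.

```r
# Toy frame with the same variables as Vietnam; Freq values are placeholders.
toy <- expand.grid(response = c("A", "B", "C", "D"),
                   year     = 1:5,
                   sex      = c("Female", "Male"))
toy$Freq <- 1

# Combine year and sex into one joint factor; the first argument (year)
# varies fastest, so all Female levels come before all Male levels.
toy$year.sex <- interaction(toy$year, toy$sex)
levels(toy$year.sex)

# Stacked two-way table: 10 year-sex combinations by 4 responses.
toy.tab <- xtabs(Freq ~ year.sex + response, data = toy)
dim(toy.tab)
```

This level order is why the first five rows of the summary above are the female groups (1Fml to 5Fml) and the last five are the male groups.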

B) Construct an informative 2D plot of the solution, and interpret in terms of how the response varies with year for males and females.

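Females lie in the lower left of the plot (negative on both dimensions), males in years 1-4 lie on the right along dimension 1 with coordinates near zero on dimension 2, and males in year 5 lie in the upper left. Dimension 1 therefore mainly separates females from males in years 1-4, while dimension 2 places females at the bottom and males at or above zero, with year-5 males at the top. The positions by year do not follow parallel paths for the two sexes, which suggests that year and sex interact in their association with response.

Male students in years 1-4 lie closest to response categories A and B, whereas females lie closest to category C. Males in year 5 lie between C and D and are the group closest to D.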

vietnam.plot = plot(Vietnam.ca)
lines(vietnam.plot$rows[1:5, ], col="blue", lty=3)   # females, years 1-5
lines(vietnam.plot$rows[6:10, ], col="blue", lty=3)  # males, years 1-5
lines(vietnam.plot$cols, col="red", lty=3)

C) Use mjca() to carry out an MCA on the three-way table. Make a useful plot of the solution and interpret in terms of the relationship of the response to year and sex.

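Because mjca() is applied here to the stacked two-way table Vietnam.tab, the categories are again the year-sex combinations, and the output is very similar to the initial analysis. The following associations stood out:

* Females in years 1 and 4 are associated with response C.

* Males in years 1 and 2 are strongly associated with response A.

* Males in years 3 and 4 are associated with response B.

* Males in year 5 are associated with response D.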

vietnam.mjca = mjca(Vietnam.tab)
plot(vietnam.mjca)
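
The call above passes the stacked two-way table to mjca(). As a hedged sketch of the alternative that the exercise wording suggests, the three-way table of sex, year and response could be built with xtabs() and passed to mjca() directly. The object names Vietnam3 and vietnam.mjca3 are illustrative, the ca and vcdExtra packages are assumed to be installed, and the resulting plot is not reproduced or interpreted here.

```r
library(ca)
data("Vietnam", package = "vcdExtra")

# Three-way table sex x year x response from the frequency-form data.
Vietnam3 <- xtabs(Freq ~ sex + year + response, data = Vietnam)
dim(Vietnam3)

# MCA on the three-way table: sex, year and response each contribute
# their own category points instead of year-sex combinations.
vietnam.mjca3 <- mjca(Vietnam3)
summary(vietnam.mjca3)
plot(vietnam.mjca3)
```

In this form the plot shows separate points for the levels of sex, year and response, so it describes the pairwise associations among the three variables. It does not show the year-by-sex combinations that the stacked analysis displays.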

Conclusion:

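Based on these results, female students tend to favor de-escalating military activity and beginning negotiations (C), while male students in years 1-4 lean toward defeating North Vietnam by bombing and invasion (A) or maintaining the present policy (B). Males in year 5 are the exception, leaning toward immediate withdrawal (D).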