Question 1

Exercise 6.2 The data set criminal in the package logmult gives the 4 × 5 table below of the number of men aged 15-19 charged with a criminal case for whom charges were dropped in Denmark from 1955-1958. Carry out a simple correspondence analysis on this table.

(a) What percentages of the Pearson χ² for association are explained by the various dimensions?

(b) Plot the 2D correspondence analysis solution. Describe the pattern of association between year and age.

library(ca)
data("criminal", package="logmult")
str(criminal)
##  table [1:4, 1:5] 141 144 196 212 285 292 380 424 320 342 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ Year: chr [1:4] "1955" "1956" "1957" "1958"
##   ..$ Age : chr [1:5] "15" "16" "17" "18" ...
# a: The first dimension accounts for 90.3% of the Pearson chi-square (total inertia) and the second for 9.0%, so the two-dimensional solution explains 99.3% of the association. The third dimension contributes only the remaining 0.7%.
criminal.ca <- ca(criminal)
summary(criminal.ca)
## 
## Principal inertias (eigenvalues):
## 
##  dim    value      %   cum%   scree plot               
##  1      0.004939  90.3  90.3  ***********************  
##  2      0.000491   9.0  99.3  **                       
##  3      0.000038   0.7 100.0                           
##         -------- -----                                 
##  Total: 0.005468 100.0                                 
## 
## 
## Rows:
##     name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1 | 1955 |  230  996  347 |   88 939 361 |  -22  58 223 |
## 2 | 1956 |  230  978  157 |   58 908 157 |   16  71 124 |
## 3 | 1957 |  269  984  111 |  -39 669  82 |   27 315 391 |
## 4 | 1958 |  271  999  385 |  -85 938 399 |  -22  61 262 |
## 
## Columns:
##     name   mass  qlt  inr    k=1 cor ctr    k=2 cor ctr  
## 1 |   15 |   99  998  185 | -101 992 203 |   -7   5  11 |
## 2 |   16 |  197  996  312 |  -91 959 331 |  -18  37 128 |
## 3 |   17 |  211  991   75 |  -23 281  23 |   37 710 594 |
## 4 |   18 |  254  989  235 |   70 980 255 |    7   9  24 |
## 5 |   19 |  239  990  194 |   62 877 188 |  -22 112 243 |
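
Each percentage in the scree table is a principal inertia divided by the total inertia, and the total inertia is the Pearson chi-square divided by the table total. The base-R sketch below checks both facts. The first part reuses the inertias printed above; the 3 × 3 table `tab` in the second part is hypothetical and serves only to illustrate the identity.

```r
# Part 1: percentages from the principal inertias printed by summary(criminal.ca)
inertia <- c(dim1 = 0.004939, dim2 = 0.000491, dim3 = 0.000038)
pct <- 100 * inertia / sum(inertia)
print(round(pct, 1))          # share of total inertia per dimension
print(round(cumsum(pct), 1))  # cumulative share

# Part 2: total inertia equals Pearson X^2 / n
# (hypothetical counts, NOT the criminal data)
tab <- matrix(c(20, 30, 25,
                15, 35, 40,
                30, 25, 45), nrow = 3, byrow = TRUE)
n      <- sum(tab)
P      <- tab / n                 # correspondence matrix
r_mass <- rowSums(P)              # row masses
c_mass <- colSums(P)              # column masses
S      <- (P - outer(r_mass, c_mass)) / sqrt(outer(r_mass, c_mass))  # standardized residuals
sv     <- svd(S)$d                # singular values; their squares are the principal inertias
total_inertia <- sum(sv^2)
X2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
print(all.equal(total_inertia, X2 / n))
```

Because the squared singular values sum to X²/n, the `%` column can be read directly as the share of the Pearson chi-square attributable to each dimension.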
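#b: The plot shows a clear association between year and age. Dimension 1 (90.3% of the inertia) separates 1955 and 1956 (positive coordinates) from 1957 and 1958 (negative coordinates), and ages 18 and 19 (positive) from ages 15 and 16 (negative). Dropped charges therefore involved relatively more 18-19 year olds in 1955-1956 and relatively more 15-16 year olds in 1957-1958. Dimension 2 (9.0%) mainly links age 17 with 1957.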
plot(criminal.ca)
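
To read the plot, it can help to line up the coordinates of the years and ages along dimension 1. The sketch below re-enters the `k=1` and `k=2` columns of the summary output by hand (they are principal coordinates multiplied by 1000). The data frame names `rows` and `cols` are illustrative only.

```r
# Principal coordinates copied from the summary table above (printed values / 1000)
rows <- data.frame(year = 1955:1958,
                   dim1 = c(88, 58, -39, -85) / 1000,
                   dim2 = c(-22, 16, 27, -22) / 1000)
cols <- data.frame(age = 15:19,
                   dim1 = c(-101, -91, -23, 70, 62) / 1000,
                   dim2 = c(-7, -18, 37, 7, -22) / 1000)

# Order both sets along dimension 1, which carries 90.3% of the inertia
print(rows[order(rows$dim1), ])
print(cols[order(cols$dim1), ])

# Roughly speaking, years and ages on the same side of the origin occur
# together more often than independence would predict
pos_years <- rows$year[rows$dim1 > 0]
pos_ages  <- cols$age[cols$dim1 > 0]
neg_years <- rows$year[rows$dim1 < 0]
neg_ages  <- cols$age[cols$dim1 < 0]
```

Here `pos_years` and `pos_ages` pick out 1955-1956 and ages 18-19, while `neg_years` and `neg_ages` pick out 1957-1958 and ages 15-17, with age 17 lying close to the origin on this dimension.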

Question 2

The data set Vietnam in vcdExtra gives a 2 × 5 × 4 contingency table in frequency form reflecting a survey of student opinion on the Vietnam War at the University of North Carolina in May 1967. The table variables are sex, year in school, and response, which has categories: (A) Defeat North Vietnam by widespread bombing and land invasion; (B) Maintain the present policy; (C) De-escalate military activity, stop bombing and begin negotiations; (D) Withdraw military forces immediately.

  1. Using the stacking approach, carry out a correspondence analysis corresponding to the loglinear model [R][YS], which asserts that the response is independent of the combinations of year and sex.

  2. Construct an informative 2D plot of the solution, and interpret in terms of how the response varies with year for males and females.

  3. Use mjca() to carry out an MCA on the three-way table. Make a useful plot of the solution and interpret in terms of the relationship of the response to year and sex.

data("Vietnam", package="vcdExtra")
str(Vietnam)
## 'data.frame':    40 obs. of  4 variables:
##  $ sex     : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
##  $ year    : int  1 1 1 1 2 2 2 2 3 3 ...
##  $ response: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
##  $ Freq    : int  13 19 40 5 5 9 33 3 22 29 ...
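#a.
# Stack year and sex into one factor (e.g. "1 F"), then cross it with response
Vietnam <- within(Vietnam, {year_sex <- paste(year, toupper(substr(sex, 1, 1)))})
Vietnam_year_sex <- xtabs(Freq ~ year_sex + response, data = Vietnam)
Vietnam.ca <- ca(Vietnam_year_sex)
# Chi-square test of independence for the stacked table
summary(Vietnam_year_sex)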
## Call: xtabs(formula = Freq ~ year_sex + response, data = Vietnam)
## Number of cases in table: 3147 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 366.4, df = 27, p-value = 3.387e-61
##  Chi-squared approximation may be incorrect
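
The stacking step amounts to crossing year and sex into a single factor and tabulating it against response. The sketch below shows the same idea on hypothetical counts, using `interaction()` as an alternative to `paste()`. The data frame `toy` is made up and is not the Vietnam data.

```r
# Hypothetical frequency-form data: 2 sexes x 2 years x 2 responses
toy <- expand.grid(sex = c("Female", "Male"),
                   year = 1:2,
                   response = c("A", "B"))
toy$Freq <- c(5, 8, 3, 6, 7, 2, 9, 4)

# Cross year and sex into one stacked factor, then tabulate against response
toy$year_sex <- interaction(toy$year, substr(toy$sex, 1, 1), sep = " ")
stacked <- xtabs(Freq ~ year_sex + response, data = toy)
print(stacked)

# Independence in this two-way table corresponds to the loglinear model [R][YS];
# its degrees of freedom are (rows - 1) * (columns - 1)
df_indep <- (nrow(stacked) - 1) * (ncol(stacked) - 1)
print(df_indep)
```

For the 10 × 4 stacked Vietnam table the same formula gives 9 × 3 = 27, which matches the degrees of freedom reported in the output above.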
#b.
plot(Vietnam.ca)

# The correspondence analysis suggests the following associations between response and the year/sex combinations:
#* Females in years 1 and 4 are more likely to give response C;
#* Males in years 3 and 4 are more likely to give response B;
#* Males in years 1 and 2 are more likely to give response A;
#* Males in year 5 are more likely to give response D.


#c.
Vietnam.mjca <-mjca(Vietnam_year_sex)
plot(Vietnam.mjca)

# The results are similar to those of the correspondence analysis in part b:
#* Males in years 1 and 2 are associated with response A;
#* Males in years 3 and 4 are associated with response B;
#* Males in year 5 are associated with response D;
#* Females in years 1 and 4 are associated with response C.
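
Part 3 asks for an MCA of the three-way table, whereas the `mjca()` call above is given the stacked two-way table, so sex and year enter as one combined factor. As a rough illustration of the difference, MCA with separate factors works from an indicator matrix (or its cross-product, the Burt matrix) with one block of columns per variable. The base-R sketch below builds both for hypothetical counts. The names `toy`, `indicator()`, `Z`, and `burt` are illustrative and are not part of the `ca` package.

```r
# Hypothetical frequency-form data: 2 sexes x 2 years x 2 responses
toy <- expand.grid(sex = c("Female", "Male"),
                   year = factor(1:2),
                   response = c("A", "B"))
toy$Freq <- c(5, 8, 3, 6, 7, 2, 9, 4)

# Expand to case form: one row per respondent
cases <- toy[rep(seq_len(nrow(toy)), toy$Freq), c("sex", "year", "response")]

# Indicator (dummy) coding: one 0/1 column per level of a factor
indicator <- function(f, prefix) {
  m <- sapply(levels(f), function(l) as.integer(f == l))
  colnames(m) <- paste(prefix, levels(f), sep = ":")
  m
}
Z <- cbind(indicator(cases$sex, "sex"),
           indicator(cases$year, "year"),
           indicator(cases$response, "response"))

# Burt matrix: all pairwise two-way tables, with marginal counts on the diagonal
burt <- crossprod(Z)
print(burt)
```

With the real data, one option would be to tabulate `xtabs(Freq ~ sex + year + response, data = Vietnam)` and pass that three-way table to `mjca()`, so that sex, year, and response appear as separate sets of category points. The interpretation would then need to be checked against the resulting plot.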
