Exercise 8.4 Refer to Homework #5, second question for a description of the Vietnam data set in vcdExtra. The goal here is to fit models for the polytomous response variable in relation to year and sex.

  1. Fit the proportional odds model to these data, allowing an interaction of year and sex.
data("Vietnam", package = "vcdExtra")
head(Vietnam)
##      sex year response Freq
## 1 Female    1        A   13
## 2 Female    1        B   19
## 3 Female    1        C   40
## 4 Female    1        D    5
## 5 Female    2        A    5
## 6 Female    2        B    9
library(car)
## Warning: package 'car' was built under R version 3.4.4
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.4.4
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
## 
## Attaching package: 'vcdExtra'
## The following object is masked from 'package:carData':
## 
##     Burt
library(MASS)
## Warning: package 'MASS' was built under R version 3.4.4
library(effects)
## Warning: package 'effects' was built under R version 3.4.4
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(nnet)
## Warning: package 'nnet' was built under R version 3.4.4
library(stats)
Vtn_tab = xtabs(Freq ~ ., data = Vietnam)
cotabplot( ~ year + response | sex, data = Vtn_tab, gp = shading_Marimekko(Vtn_tab[1,,]))

From these plots, we can conclude that there is linear relationship between the year and response. This relationship is very strong for men in the C&D category unlike for female students. For female population, there is no strong correlation from the graphs.

library(MASS)
Vietnam$response = ordered(Vietnam$response)
Vtn_polr = polr(response ~ year * sex, data = Vietnam, weights = Freq, Hess = TRUE)
summary(Vtn_polr)
## Call:
## polr(formula = response ~ year * sex, data = Vietnam, weights = Freq, 
##     Hess = TRUE)
## 
## Coefficients:
##                Value Std. Error t value
## year          0.1008    0.05481   1.840
## sexMale      -1.4367    0.22301  -6.442
## year:sexMale  0.2333    0.06025   3.873
## 
## Intercepts:
##     Value   Std. Error t value
## A|B -1.3533  0.2059    -6.5737
## B|C -0.2538  0.2041    -1.2437
## C|D  2.1879  0.2104    10.3968
## 
## Residual Deviance: 7757.059 
## AIC: 7769.059

t-value is positive for C|D indicates a strong correlation in the model. We cant tell the model is significant or not without any p-values. so we use Anova() function

Anova(Vtn_polr)
## Analysis of Deviance Table (Type II tests)
## 
## Response: response
##          LR Chisq Df Pr(>Chisq)    
## year      166.510  1  < 2.2e-16 ***
## sex        58.473  1  2.061e-14 ***
## year:sex   15.041  1  0.0001052 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the type II tests, we can conclude that all the co-efficients are not significant in the interaction of year with sex.

  1. Is there evidence that the proportional odds assumption does not hold for this data set? Use the methods described in “Ordinal response: Proportional odds model” to assess this.
library(VGAM)
## Warning: package 'VGAM' was built under R version 3.4.4
## Loading required package: stats4
## Loading required package: splines
## 
## Attaching package: 'VGAM'
## The following object is masked from 'package:car':
## 
##     logit
Vtn_PO = vglm(response ~ year * sex, data = Vietnam, weights = Freq, family = cumulative(parallel = TRUE))
Vtn_NPO = vglm(response ~ year * sex, data = Vietnam, weights = Freq, family = cumulative(parallel = FALSE))
lrtest(Vtn_NPO, Vtn_PO)
## Likelihood ratio test
## 
## Model 1: response ~ year * sex
## Model 2: response ~ year * sex
##   #Df  LogLik Df Chisq Pr(>Chisq)    
## 1 108 -3839.7                        
## 2 114 -3878.5  6 77.57  1.134e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the likelihood test, I am comparing the two models: proportional and non-proportional data. We can conclude that the unconstrained model fits better than the proportional odd model from the Chisq values.

  1. Fit the multinomial logistic model, also allowing an interaction. Use car::Anova () to assess the model terms.
library(nnet)
Vtn_mtn = multinom(response ~ year * sex, data = Vietnam, weights = Freq)
## # weights:  20 (12 variable)
## initial  value 4362.668354 
## iter  10 value 3927.047674
## final  value 3838.854903 
## converged
summary(Vtn_mtn)
## Call:
## multinom(formula = response ~ year * sex, data = Vietnam, weights = Freq)
## 
## Coefficients:
##   (Intercept)          year    sexMale year:sexMale
## B   0.3928759 -0.0005171856 -0.9719739    0.1469216
## C   1.1452464  0.1475132943 -1.8898266    0.1708179
## D  -1.3100155  0.1908925232 -2.0083782    0.4550651
## 
## Std. Errors:
##   (Intercept)       year   sexMale year:sexMale
## B   0.3914932 0.11020875 0.4105455   0.11627237
## C   0.3368612 0.09373079 0.3572585   0.09985147
## D   0.5677546 0.15097871 0.6163987   0.16217684
## 
## Residual Deviance: 7677.71 
## AIC: 7701.71

Now applying Anova() to find the best model fit

Anova(Vtn_mtn)
## Analysis of Deviance Table (Type II tests)
## 
## Response: response
##          LR Chisq Df Pr(>Chisq)    
## year      175.884  3    < 2e-16 ***
## sex       137.627  3    < 2e-16 ***
## year:sex    7.751  3    0.05145 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the above results, we can conclude that there is significant coefficient of 0.05 between the interaction of year with sex.

  1. Produce an effect plot for this model and describe the nature of the interaction.
library(carData)
library(effects)
plot(allEffects(Vtn_polr), style = "stacked")

From these plots we can conclude that female students are less prone to violent options compared to the men students. The option A has high preference among men compared to female population

  1. Fit the simpler multinomial model in which there is no effect of year for females and the effect of year is linear for males (on the logit scale). Test whether this model is significantly worse than the general multinomial model with interaction.
Vietnam = within(Vietnam, yearMale <- year * (sex == "Male"))
Vtn_mtn2 = multinom(response ~ sex + yearMale, data = Vietnam, weights = Freq)
## # weights:  16 (9 variable)
## initial  value 4362.668354 
## iter  10 value 3894.170452
## final  value 3841.568268 
## converged
Anova(Vtn_mtn2)
## Analysis of Deviance Table (Type II tests)
## 
## Response: response
##          LR Chisq Df Pr(>Chisq)    
## sex        242.80  3  < 2.2e-16 ***
## yearMale   178.21  3  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(Vtn_mtn, Vtn_mtn2)
## Likelihood ratio tests of Multinomial Models
## 
## Response: response
##            Model Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1 sex + yearMale       111   7683.137                                
## 2     year * sex       108   7677.710 1 vs 2     3  5.42673 0.1430872

Based on the anova () results, I would say this model is not significant worse than the general interaction model due to chi value of 0.14