Exercise 8.4 Refer to Homework #5, second question for a description of the Vietnam data set in vcdExtra. The goal here is to fit models for the polytomous response variable in relation to year and sex.
data("Vietnam", package = "vcdExtra")
head(Vietnam)
## sex year response Freq
## 1 Female 1 A 13
## 2 Female 1 B 19
## 3 Female 1 C 40
## 4 Female 1 D 5
## 5 Female 2 A 5
## 6 Female 2 B 9
library(car)
## Warning: package 'car' was built under R version 3.4.4
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.4.4
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
##
## Attaching package: 'vcdExtra'
## The following object is masked from 'package:carData':
##
## Burt
library(MASS)
## Warning: package 'MASS' was built under R version 3.4.4
library(effects)
## Warning: package 'effects' was built under R version 3.4.4
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(nnet)
## Warning: package 'nnet' was built under R version 3.4.4
library(stats)
Vtn_tab = xtabs(Freq ~ ., data = Vietnam)
cotabplot( ~ year + response | sex, data = Vtn_tab, gp = shading_Marimekko(Vtn_tab[1,,]))
From these plots, we can conclude that there is linear relationship between the year and response. This relationship is very strong for men in the C&D category unlike for female students. For female population, there is no strong correlation from the graphs.
library(MASS)
Vietnam$response = ordered(Vietnam$response)
Vtn_polr = polr(response ~ year * sex, data = Vietnam, weights = Freq, Hess = TRUE)
summary(Vtn_polr)
## Call:
## polr(formula = response ~ year * sex, data = Vietnam, weights = Freq,
## Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## year 0.1008 0.05481 1.840
## sexMale -1.4367 0.22301 -6.442
## year:sexMale 0.2333 0.06025 3.873
##
## Intercepts:
## Value Std. Error t value
## A|B -1.3533 0.2059 -6.5737
## B|C -0.2538 0.2041 -1.2437
## C|D 2.1879 0.2104 10.3968
##
## Residual Deviance: 7757.059
## AIC: 7769.059
t-value is positive for C|D indicates a strong correlation in the model. We cant tell the model is significant or not without any p-values. so we use Anova() function
Anova(Vtn_polr)
## Analysis of Deviance Table (Type II tests)
##
## Response: response
## LR Chisq Df Pr(>Chisq)
## year 166.510 1 < 2.2e-16 ***
## sex 58.473 1 2.061e-14 ***
## year:sex 15.041 1 0.0001052 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the type II tests, we can conclude that all the co-efficients are not significant in the interaction of year with sex.
library(VGAM)
## Warning: package 'VGAM' was built under R version 3.4.4
## Loading required package: stats4
## Loading required package: splines
##
## Attaching package: 'VGAM'
## The following object is masked from 'package:car':
##
## logit
Vtn_PO = vglm(response ~ year * sex, data = Vietnam, weights = Freq, family = cumulative(parallel = TRUE))
Vtn_NPO = vglm(response ~ year * sex, data = Vietnam, weights = Freq, family = cumulative(parallel = FALSE))
lrtest(Vtn_NPO, Vtn_PO)
## Likelihood ratio test
##
## Model 1: response ~ year * sex
## Model 2: response ~ year * sex
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 108 -3839.7
## 2 114 -3878.5 6 77.57 1.134e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the likelihood test, I am comparing the two models: proportional and non-proportional data. We can conclude that the unconstrained model fits better than the proportional odd model from the Chisq values.
library(nnet)
Vtn_mtn = multinom(response ~ year * sex, data = Vietnam, weights = Freq)
## # weights: 20 (12 variable)
## initial value 4362.668354
## iter 10 value 3927.047674
## final value 3838.854903
## converged
summary(Vtn_mtn)
## Call:
## multinom(formula = response ~ year * sex, data = Vietnam, weights = Freq)
##
## Coefficients:
## (Intercept) year sexMale year:sexMale
## B 0.3928759 -0.0005171856 -0.9719739 0.1469216
## C 1.1452464 0.1475132943 -1.8898266 0.1708179
## D -1.3100155 0.1908925232 -2.0083782 0.4550651
##
## Std. Errors:
## (Intercept) year sexMale year:sexMale
## B 0.3914932 0.11020875 0.4105455 0.11627237
## C 0.3368612 0.09373079 0.3572585 0.09985147
## D 0.5677546 0.15097871 0.6163987 0.16217684
##
## Residual Deviance: 7677.71
## AIC: 7701.71
Now applying Anova() to find the best model fit
Anova(Vtn_mtn)
## Analysis of Deviance Table (Type II tests)
##
## Response: response
## LR Chisq Df Pr(>Chisq)
## year 175.884 3 < 2e-16 ***
## sex 137.627 3 < 2e-16 ***
## year:sex 7.751 3 0.05145 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the above results, we can conclude that there is significant coefficient of 0.05 between the interaction of year with sex.
library(carData)
library(effects)
plot(allEffects(Vtn_polr), style = "stacked")
From these plots we can conclude that female students are less prone to violent options compared to the men students. The option A has high preference among men compared to female population
Vietnam = within(Vietnam, yearMale <- year * (sex == "Male"))
Vtn_mtn2 = multinom(response ~ sex + yearMale, data = Vietnam, weights = Freq)
## # weights: 16 (9 variable)
## initial value 4362.668354
## iter 10 value 3894.170452
## final value 3841.568268
## converged
Anova(Vtn_mtn2)
## Analysis of Deviance Table (Type II tests)
##
## Response: response
## LR Chisq Df Pr(>Chisq)
## sex 242.80 3 < 2.2e-16 ***
## yearMale 178.21 3 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(Vtn_mtn, Vtn_mtn2)
## Likelihood ratio tests of Multinomial Models
##
## Response: response
## Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 sex + yearMale 111 7683.137
## 2 year * sex 108 7677.710 1 vs 2 3 5.42673 0.1430872
Based on the anova () results, I would say this model is not significant worse than the general interaction model due to chi value of 0.14