Use the Vietnam dataset to fit a model for the polytomous response variable in relation to year and sex. The response variable is the answer type (A, B, C, D) that each person selected while filling in the survey regarding the vietnam war.
data("Vietnam",package="vcdExtra")
str(Vietnam)
## 'data.frame': 40 obs. of 4 variables:
## $ sex : Factor w/ 2 levels "Female","Male": 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 1 1 1 1 2 2 2 2 3 3 ...
## $ response: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
## $ Freq : int 13 19 40 5 5 9 33 3 22 29 ...
levels(Vietnam$response) #response variable has three levels
## [1] "A" "B" "C" "D"
head(Vietnam)
## sex year response Freq
## 1 Female 1 A 13
## 2 Female 1 B 19
## 3 Female 1 C 40
## 4 Female 1 D 5
## 5 Female 2 A 5
## 6 Female 2 B 9
A) Fit the proportional odds model to these data, with interaction of year and sex
library(MASS)
Viet.polr=polr(response~year+sex,data = Vietnam,Hess = T)
library(car)
## Warning: package 'car' was built under R version 3.4.4
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.4.4
Anova(Viet.polr)
## Analysis of Deviance Table (Type II tests)
##
## Response: response
## LR Chisq Df Pr(>Chisq)
## year 0 1 1
## sex 0 1 1
Using PO as the model to fit the data returns a lack of significance for both predictors on response type.
B)Is there evidence that POD assumption does not hold for this data set?
library(VGAM)
## Warning: package 'VGAM' was built under R version 3.4.4
## Loading required package: stats4
## Loading required package: splines
##
## Attaching package: 'VGAM'
## The following object is masked from 'package:car':
##
## logit
viet.po=vglm(response~year+sex,data=Vietnam, family = cumulative(parallel = T))
## Warning in eval(slot(family, "initialize")): response should be ordinal---
## see ordered()
viet.npo=vglm(response~year+sex,data=Vietnam, family = cumulative(parallel = F))
## Warning in eval(slot(family, "initialize")): response should be ordinal---
## see ordered()
coef(viet.po, matrix=T)
## logit(P[Y<=1]) logit(P[Y<=2]) logit(P[Y<=3])
## (Intercept) -1.098612e+00 -6.879800e-16 1.098612e+00
## year -8.881784e-17 -8.881784e-17 -8.881784e-17
## sexMale 1.256074e-16 1.256074e-16 1.256074e-16
coef(viet.npo,matrix=T)
## logit(P[Y<=1]) logit(P[Y<=2]) logit(P[Y<=3])
## (Intercept) -1.098612e+00 -6.879800e-16 1.098612e+00
## year 4.634791e-16 6.326856e-17 8.599751e-17
## sexMale -3.918388e-16 -1.133916e-15 -8.918715e-16
lrtest(viet.npo,viet.po)
## Likelihood ratio test
##
## Model 1: response ~ year + sex
## Model 2: response ~ year + sex
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 111 -55.452
## 2 115 -55.452 4 0 1
Since coefficients differ for both PO and NPO, wePOD model is not a good fit for the vietnam survey responses.
C)Fit the multinomial logistic Model allowing an intercation and using ANova to assess model
library(nnet)
viet.multinom= multinom(response ~ year + sex,
data = Vietnam, Hess = TRUE)
## # weights: 16 (9 variable)
## initial value 55.451774
## final value 55.451774
## converged
Anova(viet.multinom)
## Analysis of Deviance Table (Type II tests)
##
## Response: response
## LR Chisq Df Pr(>Chisq)
## year 0 3 1
## sex 0 3 1
D)Produce effect plots for this model
Effect of sex in Multinomial model shows that response type do not differ regarding sex.
library(effects)
## Warning: package 'effects' was built under R version 3.4.4
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(Effect("sex", viet.multinom))
plot(Effect("sex", viet.multinom), style = "stacked",
key.args = list(x = .55, y = .9))
Effect for the year and sex in multinom model show no difference in response type regarding sex and year.
plot(Effect(c("year","sex"), viet.multinom),
style = "stacked", key.arg = list(x = .8, y = .9))
The Proportional odds model assumtion did not hold true for the vietnam data set. The generic multinomial model did not show any significance at all and effect plots confirmed the lack of fit when year and sex interact.
The interaciton of year and sex did not produce a statistically significant relationship with the response variable.