library(vcd)
## Warning: package 'vcd' was built under R version 3.5.1
## Loading required package: grid
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.5.1
## Loading required package: gnm
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.1
library(logmult)
## Warning: package 'logmult' was built under R version 3.5.1
##
## Attaching package: 'logmult'
## The following object is masked from 'package:gnm':
##
## se
## The following object is masked from 'package:vcd':
##
## assoc
library(ca)
## Warning: package 'ca' was built under R version 3.5.1
library(MASS)
## Warning: package 'MASS' was built under R version 3.5.1
library(car)
## Warning: package 'car' was built under R version 3.5.1
## Loading required package: carData
##
## Attaching package: 'carData'
## The following object is masked from 'package:vcdExtra':
##
## Burt
library(VGAM)
## Warning: package 'VGAM' was built under R version 3.5.1
## Loading required package: stats4
## Loading required package: splines
##
## Attaching package: 'VGAM'
## The following object is masked from 'package:car':
##
## logit
library(rms)
## Warning: package 'rms' was built under R version 3.5.1
## Loading required package: Hmisc
## Loading required package: lattice
##
## Attaching package: 'lattice'
## The following object is masked from 'package:gnm':
##
## barley
## Loading required package: survival
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
## Loading required package: SparseM
##
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
##
## backsolve
##
## Attaching package: 'rms'
## The following objects are masked from 'package:VGAM':
##
## calibrate, lrtest
## The following objects are masked from 'package:car':
##
## Predict, vif
library(reshape2)
## Warning: package 'reshape2' was built under R version 3.5.1
library(directlabels)
## Warning: package 'directlabels' was built under R version 3.5.1
library(effects)
## Warning: package 'effects' was built under R version 3.5.1
## Use the command
## lattice::trellis.par.set(effectsTheme())
## to customize lattice options for effects plots.
## See ?efffectsTheme for details.
library(nnet)
## Warning: package 'nnet' was built under R version 3.5.1
data("Vietnam", package = "vcdExtra")
Vietnam$yearsex<- paste(Vietnam$year, Vietnam$sex, sep = ':')
VN <- xtabs(Freq ~ response + Vietnam$yearsex, data = Vietnam)
viet <- margin.table(VN, 1:2)
viet.ca <- ca(viet)
plot(viet.ca)

Homework #7
Model 1:
head(Vietnam$response, 8)
## [1] A B C D A B C D
## Levels: A B C D
viet.polr <- polr(response ~ year + sex, data = Vietnam)
summary(viet.polr)
##
## Re-fitting to get Hessian
## Call:
## polr(formula = response ~ year + sex, data = Vietnam)
##
## Coefficients:
## Value Std. Error t value
## year -4.786e-15 0.2000 -2.393e-14
## sexMale -1.244e-15 0.5657 -2.199e-15
##
## Intercepts:
## Value Std. Error t value
## A|B -1.0986 0.7572 -1.4509
## B|C 0.0000 0.7348 0.0000
## C|D 1.0986 0.7572 1.4509
##
## Residual Deviance: 110.9035
## AIC: 120.9035
Model 2:
vn.polr <- polr(response ~ Vietnam$year + Vietnam$sex, data = Vietnam)
summary(vn.polr)
##
## Re-fitting to get Hessian
## Call:
## polr(formula = response ~ Vietnam$year + Vietnam$sex, data = Vietnam)
##
## Coefficients:
## Value Std. Error t value
## Vietnam$year -4.786e-15 0.2000 -2.393e-14
## Vietnam$sexMale -1.244e-15 0.5657 -2.199e-15
##
## Intercepts:
## Value Std. Error t value
## A|B -1.0986 0.7572 -1.4509
## B|C 0.0000 0.7348 0.0000
## C|D 1.0986 0.7572 1.4509
##
## Residual Deviance: 110.9035
## AIC: 120.9035
Since models, 1 and 2, are the same, we will work through the exercise using only Model 1:
(a) Fit the proportional odds model to these data, allowing an interaction of year and sex:
Let’s do an analysis of the relationship between Year and Sex:
Anova(viet.polr)
## Analysis of Deviance Table (Type II tests)
##
## Response: response
## LR Chisq Df Pr(>Chisq)
## year 0 1 1
## sex 0 1 1
Since chisq is 0, we know year and sex are related.
(b) Is there evidence that the proportional odds assumption does not hold for this data set? Use the motheods described in “Ordinal response: Proportional odds model” to assess this.
The proportional odds assumption assumes the slope of the line is fixed and does not change for each new value. Hence, we will have to test the proportional odds assumption. If the line’s slope changes for each new value, it is the “Nested Proportional odd” (NPO). If the line’s slope doesn’t change it would be PO.
Let us assume that our model isn’t proportional, and build it. Then, we will build a model assuming that the two are proportional – or vice versa
(1) Let’s, first, create the model assuming the odds are proportional:
vietpo <- vglm(response ~ year + sex, data = Vietnam, family = cumulative(parallel = TRUE))
## Warning in eval(slot(family, "initialize")): response should be ordinal---
## see ordered()
vietpo
##
## Call:
## vglm(formula = response ~ year + sex, family = cumulative(parallel = TRUE),
## data = Vietnam)
##
##
## Coefficients:
## (Intercept):1 (Intercept):2 (Intercept):3 year sexMale
## -1.098612e+00 -6.879800e-16 1.098612e+00 -8.881784e-17 1.256074e-16
##
## Degrees of Freedom: 120 Total; 115 Residual
## Residual deviance: 110.9035
## Log-likelihood: -55.45177
Here, the log-likelihood ratio is not close the 0. This implies that the year and sex are not related. We will have to search further.
(2) Create a model assuming odds are not proportional:
vietnpo <- vglm(response ~ year + sex, data = Vietnam, family = cumulative(parallel = FALSE))
## Warning in eval(slot(family, "initialize")): response should be ordinal---
## see ordered()
vietnpo
##
## Call:
## vglm(formula = response ~ year + sex, family = cumulative(parallel = FALSE),
## data = Vietnam)
##
##
## Coefficients:
## (Intercept):1 (Intercept):2 (Intercept):3 year:1 year:2
## -1.098612e+00 -6.879800e-16 1.098612e+00 4.634791e-16 6.326856e-17
## year:3 sexMale:1 sexMale:2 sexMale:3
## 8.599751e-17 -3.918388e-16 -1.133916e-15 -8.918715e-16
##
## Degrees of Freedom: 120 Total; 111 Residual
## Residual deviance: 110.9035
## Log-likelihood: -55.45177
It seems the slope is changing here for each new value. So, we have an NPO and the models appear to be different.
Let’s look at an easier way to compare the values:
coef(vietpo, matrix = TRUE)
## logit(P[Y<=1]) logit(P[Y<=2]) logit(P[Y<=3])
## (Intercept) -1.098612e+00 -6.879800e-16 1.098612e+00
## year -8.881784e-17 -8.881784e-17 -8.881784e-17
## sexMale 1.256074e-16 1.256074e-16 1.256074e-16
coef(vietnpo, matrix = TRUE)
## logit(P[Y<=1]) logit(P[Y<=2]) logit(P[Y<=3])
## (Intercept) -1.098612e+00 -6.879800e-16 1.098612e+00
## year 4.634791e-16 6.326856e-17 8.599751e-17
## sexMale -3.918388e-16 -1.133916e-15 -8.918715e-16
It seems the slope is changing for each ew value. Hence, the models are not the same.
Graphical assessment:
op <- par(mfrow = c(1,3))
plot.xmean.ordinaly(response ~ year + sex, data = Vietnam, lwd = 2, pch = 16, subn = FALSE)

(c) Fit the multinomial logistic model (generalized logit model), also allowing an interaction. Use car::Anova() to assess the model terms. Here, we append the fitted response probabilities (or logits) to the data frame and plotting these in relation to the predictors
Visualization: Full-model plot
viet.fitp <- cbind(Vietnam, predict(viet.polr, type = "probs"))
head(viet.fitp)
## sex year response Freq yearsex A B C D
## 1 Female 1 A 13 1:Female 0.25 0.25 0.25 0.25
## 2 Female 1 B 19 1:Female 0.25 0.25 0.25 0.25
## 3 Female 1 C 40 1:Female 0.25 0.25 0.25 0.25
## 4 Female 1 D 5 1:Female 0.25 0.25 0.25 0.25
## 5 Female 2 A 5 2:Female 0.25 0.25 0.25 0.25
## 6 Female 2 B 9 2:Female 0.25 0.25 0.25 0.25
plotdat <- melt(viet.fitp, id.vars= c("year","sex", "response"), measure.vars= c("A", "B", "C", "D"), variable.name= "Level", value.name= "Probability")
head(plotdat)
## year sex response Level Probability
## 1 1 Female A A 0.25
## 2 1 Female B A 0.25
## 3 1 Female C A 0.25
## 4 1 Female D A 0.25
## 5 2 Female A A 0.25
## 6 2 Female B A 0.25
Let’s use car::Anova() to assess the model terms. Here, we measure the association with these data.
car::Anova(viet.polr)
## Analysis of Deviance Table (Type II tests)
##
## Response: response
## LR Chisq Df Pr(>Chisq)
## year 0 1 1
## sex 0 1 1
We see above that since chisq equals zero, there seems to be an association between the models.
gg <- ggplot(plotdat, aes (year, y= Probability, color = Level))+ ylim(0.0, 1.0) + geom_line(size= 5.0) + theme_bw()+ xlim(10, 100)+ geom_point(color= "black", size= 3.0)+ facet_grid(year ~ sex)
direct.label(gg)
## Warning: Removed 160 rows containing missing values (geom_path).
## Warning: Removed 160 rows containing missing values (geom_point).

(d) Produce an effect plot for this model and describe the nature of the ingteraction
Visualization: Effects Plot
plot(Effect("year", viet.polr))
##
## Re-fitting to get Hessian

plot(Effect("sex", viet.polr), style = "stacked", key.args = list(x = .45, y = .8))
##
## Re-fitting to get Hessian

Note: it is very difficult to determine the nature of the interaction here. It seems ther is no interaction.
plot(Effect("year", viet.polr), style = "stacked", key.args = list(x = .45, y = .8))
##
## Re-fitting to get Hessian

Note: Again,it is very difficult to determine the nature of the interaction here.
(e) Fit the simpler multinomial model in which there is no effect of year for females and the effect of year is linear for males (on the logit scale). Test whether this model is significantly worse than the general multinomial model with interaction.
(I am not so sure about this one)
(1) Let us look at the generalized logit model:
viet.multinom <- multinom(response ~ year + sex, data = Vietnam, Hess = TRUE)
## # weights: 16 (9 variable)
## initial value 55.451774
## final value 55.451774
## converged
summary(viet.multinom)
## Call:
## multinom(formula = response ~ year + sex, data = Vietnam, Hess = TRUE)
##
## Coefficients:
## (Intercept) year sexMale
## B 0 0 0
## C 0 0 0
## D 0 0 0
##
## Std. Errors:
## (Intercept) year sexMale
## B 1.140175 0.3162278 0.8944272
## C 1.140175 0.3162278 0.8944272
## D 1.140175 0.3162278 0.8944272
##
## Residual Deviance: 110.9035
## AIC: 128.9035
Above, there is no effect on of year for females
(2) Let’s use the effect of interaction instead of additive:
viet.multinom2 <- multinom(response ~ year * sex, data = Vietnam, Hess = TRUE)
## # weights: 20 (12 variable)
## initial value 55.451774
## final value 55.451774
## converged
summary(viet.multinom2)
## Call:
## multinom(formula = response ~ year * sex, data = Vietnam, Hess = TRUE)
##
## Coefficients:
## (Intercept) year sexMale year:sexMale
## B 0 0 0 0
## C 0 0 0 0
## D 0 0 0 0
##
## Std. Errors:
## (Intercept) year sexMale year:sexMale
## B 1.48324 0.4472136 2.097618 0.6324555
## C 1.48324 0.4472136 2.097618 0.6324555
## D 1.48324 0.4472136 2.097618 0.6324555
##
## Residual Deviance: 110.9035
## AIC: 134.9035
At this point, it seems model 1 is not significantly worse than model 2 (with interaction)
Let’s use Anova:
Anova(viet.multinom)
## Analysis of Deviance Table (Type II tests)
##
## Response: response
## LR Chisq Df Pr(>Chisq)
## year 0 3 1
## sex 0 3 1
Anova(viet.multinom2)
## Analysis of Deviance Table (Type II tests)
##
## Response: response
## LR Chisq Df Pr(>Chisq)
## year 0 3 1
## sex 0 3 1
## year:sex 0 3 1