library(vcd)
library(vcdExtra)
library(MASS)Consider the data set DaytonSurvey (described in Example 2.6), giving results of a survey of use of alcohol (A), cigarettes (C), and marijuana (M) among high school seniors. For this exercise, ignore the variables sex and race, by working with the marginal table Dayton.ACM, a 2 × 2 × 2 table in frequency data frame form.
data(DaytonSurvey)
Dayton.ACM <- aggregate(Freq ~ cigarette + alcohol + marijuana, data = DaytonSurvey, FUN = sum)Find the model of Joint independence, [AC][M].
JI <- glm(Freq ~ alcohol*cigarette + marijuana, data = DaytonSurvey, family = "poisson")
JI##
## Call: glm(formula = Freq ~ alcohol * cigarette + marijuana, family = "poisson",
## data = DaytonSurvey)
##
## Coefficients:
## (Intercept) alcoholNo cigaretteNo
## 5.0291 -3.4500 -1.0640
## marijuanaNo alcoholNo:cigaretteNo
## 0.3154 2.8737
##
## Degrees of Freedom: 31 Total (i.e. Null); 27 Residual
## Null Deviance: 4818
## Residual Deviance: 2810 AIC: 2957
Find the model of Conditional independence [AM][CM].
CI <- glm(Freq ~ alcohol*marijuana + cigarette*marijuana, data = DaytonSurvey, family = "poisson")
CI##
## Call: glm(formula = Freq ~ alcohol * marijuana + cigarette * marijuana,
## family = "poisson", data = DaytonSurvey)
##
## Coefficients:
## (Intercept) alcoholNo marijuanaNo
## 5.4263 -5.2523 -0.7285
## cigaretteNo alcoholNo:marijuanaNo marijuanaNo:cigaretteNo
## -2.9892 4.1251 3.2243
##
## Degrees of Freedom: 31 Total (i.e. Null); 26 Residual
## Null Deviance: 4818
## Residual Deviance: 2154 AIC: 2303
Find the Homogenous model [AC][AM][CM].
HM <- glm(Freq ~ (alcohol + cigarette + marijuana)^2 , data = DaytonSurvey, family = "poisson")
HM##
## Call: glm(formula = Freq ~ (alcohol + cigarette + marijuana)^2, family = "poisson",
## data = DaytonSurvey)
##
## Coefficients:
## (Intercept) alcoholNo cigaretteNo
## 5.4276 -5.5283 -3.0158
## marijuanaNo alcoholNo:cigaretteNo alcoholNo:marijuanaNo
## -0.5249 2.0545 2.9860
## cigaretteNo:marijuanaNo
## 2.8479
##
## Degrees of Freedom: 31 Total (i.e. Null); 25 Residual
## Null Deviance: 4818
## Residual Deviance: 1967 AIC: 2117
Find the model of Mutual independence, [A][C][M].
MI <- glm(Freq ~ alcohol + cigarette + marijuana, data = DaytonSurvey, family = "poisson")
MI##
## Call: glm(formula = Freq ~ alcohol + cigarette + marijuana, family = "poisson",
## data = DaytonSurvey)
##
## Coefficients:
## (Intercept) alcoholNo cigaretteNo marijuanaNo
## 4.9052 -1.7851 -0.6493 0.3154
##
## Degrees of Freedom: 31 Total (i.e. Null); 28 Residual
## Null Deviance: 4818
## Residual Deviance: 3253 AIC: 3397
Find the saturated model, [ACM].
SM <- glm(Freq ~ alcohol * cigarette * marijuana, data = DaytonSurvey, family = "poisson")
SM##
## Call: glm(formula = Freq ~ alcohol * cigarette * marijuana, family = "poisson",
## data = DaytonSurvey)
##
## Coefficients:
## (Intercept) alcoholNo
## 5.4282 -5.7159
## cigaretteNo marijuanaNo
## -3.0304 -0.5267
## alcoholNo:cigaretteNo alcoholNo:marijuanaNo
## 2.6249 3.1893
## cigaretteNo:marijuanaNo alcoholNo:cigaretteNo:marijuanaNo
## 2.8650 -0.5895
##
## Degrees of Freedom: 31 Total (i.e. Null); 24 Residual
## Null Deviance: 4818
## Residual Deviance: 1967 AIC: 2119
Prepare a table giving the goodness-of-fit tests for these models Which model appears to give the most reasonable fit? Explain your interpretation.
anova(JI, CI, HM, MI, SM, test = "Chisq")LRstats(JI, CI, HM, MI, SM)From the Pearson’s chi-squared test (X2) and the Likelihood ratio (G2) test, it is clear that the model of mutual independence is the worst, clearly indicating that there is some association between the usage of alcohol, cigarettes, and marijuana. It is difficult to pinpoint the best model as saturated model and the homogeneous model are equally good. While the saturated model has a lower residual deviance, the homogeneous model has lower Akaike information criterion and Bayesian information criterion.