library(vcd)
library(vcdExtra)
library(MASS)

1 Exercise 9.2

Consider the data set DaytonSurvey (described in Example 2.6), giving results of a survey of use of alcohol (A), cigarettes (C), and marijuana (M) among high school seniors. For this exercise, ignore the variables sex and race, by working with the marginal table Dayton.ACM, a 2 × 2 × 2 table in frequency data frame form.

data(DaytonSurvey)
Dayton.ACM <- aggregate(Freq ~ cigarette + alcohol + marijuana, data = DaytonSurvey, FUN = sum)

1.1 Part A.

Find the model of Joint independence, [AC][M].

JI <- glm(Freq ~ alcohol*cigarette + marijuana, data = DaytonSurvey, family = "poisson")
JI
## 
## Call:  glm(formula = Freq ~ alcohol * cigarette + marijuana, family = "poisson", 
##     data = DaytonSurvey)
## 
## Coefficients:
##           (Intercept)              alcoholNo            cigaretteNo  
##                5.0291                -3.4500                -1.0640  
##           marijuanaNo  alcoholNo:cigaretteNo  
##                0.3154                 2.8737  
## 
## Degrees of Freedom: 31 Total (i.e. Null);  27 Residual
## Null Deviance:       4818 
## Residual Deviance: 2810  AIC: 2957

1.2 Part B.

Find the model of Conditional independence [AM][CM].

CI <- glm(Freq ~ alcohol*marijuana + cigarette*marijuana, data = DaytonSurvey, family = "poisson")
CI
## 
## Call:  glm(formula = Freq ~ alcohol * marijuana + cigarette * marijuana, 
##     family = "poisson", data = DaytonSurvey)
## 
## Coefficients:
##             (Intercept)                alcoholNo              marijuanaNo  
##                  5.4263                  -5.2523                  -0.7285  
##             cigaretteNo    alcoholNo:marijuanaNo  marijuanaNo:cigaretteNo  
##                 -2.9892                   4.1251                   3.2243  
## 
## Degrees of Freedom: 31 Total (i.e. Null);  26 Residual
## Null Deviance:       4818 
## Residual Deviance: 2154  AIC: 2303

1.3 Part C.

Find the Homogenous model [AC][AM][CM].

HM <- glm(Freq ~ (alcohol + cigarette + marijuana)^2 , data = DaytonSurvey, family = "poisson")
HM
## 
## Call:  glm(formula = Freq ~ (alcohol + cigarette + marijuana)^2, family = "poisson", 
##     data = DaytonSurvey)
## 
## Coefficients:
##             (Intercept)                alcoholNo              cigaretteNo  
##                  5.4276                  -5.5283                  -3.0158  
##             marijuanaNo    alcoholNo:cigaretteNo    alcoholNo:marijuanaNo  
##                 -0.5249                   2.0545                   2.9860  
## cigaretteNo:marijuanaNo  
##                  2.8479  
## 
## Degrees of Freedom: 31 Total (i.e. Null);  25 Residual
## Null Deviance:       4818 
## Residual Deviance: 1967  AIC: 2117

1.4 Part D.

Find the model of Mutual independence, [A][C][M].

MI <- glm(Freq ~ alcohol + cigarette + marijuana, data = DaytonSurvey, family = "poisson")
MI
## 
## Call:  glm(formula = Freq ~ alcohol + cigarette + marijuana, family = "poisson", 
##     data = DaytonSurvey)
## 
## Coefficients:
## (Intercept)    alcoholNo  cigaretteNo  marijuanaNo  
##      4.9052      -1.7851      -0.6493       0.3154  
## 
## Degrees of Freedom: 31 Total (i.e. Null);  28 Residual
## Null Deviance:       4818 
## Residual Deviance: 3253  AIC: 3397

1.5 Part E.

Find the saturated model, [ACM].

SM <- glm(Freq ~ alcohol * cigarette * marijuana, data = DaytonSurvey, family = "poisson")
SM
## 
## Call:  glm(formula = Freq ~ alcohol * cigarette * marijuana, family = "poisson", 
##     data = DaytonSurvey)
## 
## Coefficients:
##                       (Intercept)                          alcoholNo  
##                            5.4282                            -5.7159  
##                       cigaretteNo                        marijuanaNo  
##                           -3.0304                            -0.5267  
##             alcoholNo:cigaretteNo              alcoholNo:marijuanaNo  
##                            2.6249                             3.1893  
##           cigaretteNo:marijuanaNo  alcoholNo:cigaretteNo:marijuanaNo  
##                            2.8650                            -0.5895  
## 
## Degrees of Freedom: 31 Total (i.e. Null);  24 Residual
## Null Deviance:       4818 
## Residual Deviance: 1967  AIC: 2119

1.6 Part F.

Prepare a table giving the goodness-of-fit tests for these models Which model appears to give the most reasonable fit? Explain your interpretation.

anova(JI, CI, HM, MI, SM, test = "Chisq")
LRstats(JI, CI, HM, MI, SM)

From the Pearson’s chi-squared test (X2) and the Likelihood ratio (G2) test, it is clear that the model of mutual independence is the worst, clearly indicating that there is some association between the usage of alcohol, cigarettes, and marijuana. It is difficult to pinpoint the best model as saturated model and the homogeneous model are equally good. While the saturated model has a lower residual deviance, the homogeneous model has lower Akaike information criterion and Bayesian information criterion.