Exercise 9.2 Consider the data set DaytonSurvey (described in Example 2.6), giving results of a survey of use of alcohol (A), cigarettes (C), and marijuana (M) among high school seniors. For this exercise, ignore the variables sex and race, by working with the marginal table Dayton.ACM, a 2 × 2 × 2 table in frequency data frame form.

library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.4.4
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
data("DaytonSurvey",package="vcdExtra")
Dayton.ACM = aggregate(Freq~cigarette+alcohol+marijuana,data=DaytonSurvey, FUN=sum)
Dayton.ACM 
##   cigarette alcohol marijuana Freq
## 1       Yes     Yes       Yes  911
## 2        No     Yes       Yes   44
## 3       Yes      No       Yes    3
## 4        No      No       Yes    2
## 5       Yes     Yes        No  538
## 6        No     Yes        No  456
## 7       Yes      No        No   43
## 8        No      No        No  279

Find the following models: ??? (a) Joint independence, [AC][M].

library(MASS)
## Warning: package 'MASS' was built under R version 3.4.4
Dayton.JI=loglm(Freq~alcohol*cigarette+marijuana, data = Dayton.ACM)
Dayton.JI
## Call:
## loglm(formula = Freq ~ alcohol * cigarette + marijuana, data = Dayton.ACM)
## 
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 843.8266  3        0
## Pearson          704.9071  3        0

??? (b) Conditional independence [AM][CM]

Dayton.CI=loglm(Freq~marijuana*(alcohol+cigarette),data=Dayton.ACM)
Dayton.CI
## Call:
## loglm(formula = Freq ~ marijuana * (alcohol + cigarette), data = Dayton.ACM)
## 
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 187.7543  2        0
## Pearson          177.6149  2        0

??? (c) Homogenous model [AC][AM][CM].

Dayton.HM=loglm(Freq~(alcohol+cigarette+marijuana)^2,data=Dayton.ACM)
Dayton.HM
## Call:
## loglm(formula = Freq ~ (alcohol + cigarette + marijuana)^2, data = Dayton.ACM)
## 
## Statistics:
##                        X^2 df  P(> X^2)
## Likelihood Ratio 0.3739859  1 0.5408396
## Pearson          0.4010998  1 0.5265218

??? (d) Prepare a table giving the goodness-of-fit tests for these models, as well as the model of mutual independence, [A][C][M], and the saturated model, [ACM]. Hint: anova() is useful here. Which model appears to give the most reasonable fit? Explain your interpretation.

Mutual independence [A][C][M]:

Dayton.MI=loglm(Freq~alcohol+cigarette+marijuana,data=Dayton.ACM)
Dayton.MI
## Call:
## loglm(formula = Freq ~ alcohol + cigarette + marijuana, data = Dayton.ACM)
## 
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 1286.020  4        0
## Pearson          1411.386  4        0

Saturated model [ACM]:

Dayton.SM=loglm(Freq~(alcohol*cigarette*marijuana),data=Dayton.ACM)
Dayton.SM
## Call:
## loglm(formula = Freq ~ (alcohol * cigarette * marijuana), data = Dayton.ACM)
## 
## Statistics:
##                  X^2 df P(> X^2)
## Likelihood Ratio   0  0        1
## Pearson            0  0        1
anova(Dayton.JI,Dayton.CI,Dayton.HM,Dayton.MI,Dayton.SM, test = "chisq")
## LR tests for hierarchical log-linear models
## 
## Model 1:
##  Freq ~ alcohol + cigarette + marijuana 
## Model 2:
##  Freq ~ alcohol * cigarette + marijuana 
## Model 3:
##  Freq ~ marijuana * (alcohol + cigarette) 
## Model 4:
##  Freq ~ (alcohol + cigarette + marijuana)^2 
## Model 5:
##  Freq ~ (alcohol * cigarette * marijuana) 
## 
##               Deviance df  Delta(Dev) Delta(df) P(> Delta(Dev)
## Model 1   1286.0199544  4                                     
## Model 2    843.8266437  3 442.1933108         1        0.00000
## Model 3    187.7543029  2 656.0723408         1        0.00000
## Model 4      0.3739859  1 187.3803170         1        0.00000
## Model 5      0.0000000  0   0.3739859         1        0.54084
## Saturated    0.0000000  0   0.0000000         0        1.00000

THe mutual independence model shows the best fit since it has the lowest pearson chi square and higher X^2 than the other models.