Exercise 9.2 Consider the data set DaytonSurvey (described in Example 2.6), giving results of a survey of use of alcohol (A), cigarettes (C), and marijuana (M) among high school seniors. For this exercise, ignore the variables sex and race, by working with the marginal table Dayton.ACM, a 2 × 2 × 2 table in frequency data frame form.
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.4.4
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
data("DaytonSurvey",package="vcdExtra")
Dayton.ACM = aggregate(Freq~cigarette+alcohol+marijuana,data=DaytonSurvey, FUN=sum)
Dayton.ACM
## cigarette alcohol marijuana Freq
## 1 Yes Yes Yes 911
## 2 No Yes Yes 44
## 3 Yes No Yes 3
## 4 No No Yes 2
## 5 Yes Yes No 538
## 6 No Yes No 456
## 7 Yes No No 43
## 8 No No No 279
Find the following models: ??? (a) Joint independence, [AC][M].
library(MASS)
## Warning: package 'MASS' was built under R version 3.4.4
Dayton.JI=loglm(Freq~alcohol*cigarette+marijuana, data = Dayton.ACM)
Dayton.JI
## Call:
## loglm(formula = Freq ~ alcohol * cigarette + marijuana, data = Dayton.ACM)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 843.8266 3 0
## Pearson 704.9071 3 0
??? (b) Conditional independence [AM][CM]
Dayton.CI=loglm(Freq~marijuana*(alcohol+cigarette),data=Dayton.ACM)
Dayton.CI
## Call:
## loglm(formula = Freq ~ marijuana * (alcohol + cigarette), data = Dayton.ACM)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 187.7543 2 0
## Pearson 177.6149 2 0
??? (c) Homogenous model [AC][AM][CM].
Dayton.HM=loglm(Freq~(alcohol+cigarette+marijuana)^2,data=Dayton.ACM)
Dayton.HM
## Call:
## loglm(formula = Freq ~ (alcohol + cigarette + marijuana)^2, data = Dayton.ACM)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 0.3739859 1 0.5408396
## Pearson 0.4010998 1 0.5265218
??? (d) Prepare a table giving the goodness-of-fit tests for these models, as well as the model of mutual independence, [A][C][M], and the saturated model, [ACM]. Hint: anova() is useful here. Which model appears to give the most reasonable fit? Explain your interpretation.
Mutual independence [A][C][M]:
Dayton.MI=loglm(Freq~alcohol+cigarette+marijuana,data=Dayton.ACM)
Dayton.MI
## Call:
## loglm(formula = Freq ~ alcohol + cigarette + marijuana, data = Dayton.ACM)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 1286.020 4 0
## Pearson 1411.386 4 0
Saturated model [ACM]:
Dayton.SM=loglm(Freq~(alcohol*cigarette*marijuana),data=Dayton.ACM)
Dayton.SM
## Call:
## loglm(formula = Freq ~ (alcohol * cigarette * marijuana), data = Dayton.ACM)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 0 0 1
## Pearson 0 0 1
anova(Dayton.JI,Dayton.CI,Dayton.HM,Dayton.MI,Dayton.SM, test = "chisq")
## LR tests for hierarchical log-linear models
##
## Model 1:
## Freq ~ alcohol + cigarette + marijuana
## Model 2:
## Freq ~ alcohol * cigarette + marijuana
## Model 3:
## Freq ~ marijuana * (alcohol + cigarette)
## Model 4:
## Freq ~ (alcohol + cigarette + marijuana)^2
## Model 5:
## Freq ~ (alcohol * cigarette * marijuana)
##
## Deviance df Delta(Dev) Delta(df) P(> Delta(Dev)
## Model 1 1286.0199544 4
## Model 2 843.8266437 3 442.1933108 1 0.00000
## Model 3 187.7543029 2 656.0723408 1 0.00000
## Model 4 0.3739859 1 187.3803170 1 0.00000
## Model 5 0.0000000 0 0.3739859 1 0.54084
## Saturated 0.0000000 0 0.0000000 0 1.00000
THe mutual independence model shows the best fit since it has the lowest pearson chi square and higher X^2 than the other models.