a 2 X 2 X 2 table in frequency data frame form.
library(stats)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.1
library(vcd)
## Warning: package 'vcd' was built under R version 3.5.1
## Loading required package: grid
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.5.1
## Loading required package: gnm
library(ca)
## Warning: package 'ca' was built under R version 3.5.1
library(logmult)
## Warning: package 'logmult' was built under R version 3.5.1
##
## Attaching package: 'logmult'
## The following object is masked from 'package:gnm':
##
## se
## The following object is masked from 'package:vcd':
##
## assoc
library(MASS)
## Warning: package 'MASS' was built under R version 3.5.1
Look at the table of the Dayton.ACM:
Dayton.ACM <- aggregate(Freq ~ cigarette + alcohol + marijuana, data = DaytonSurvey, FUN = sum)
Dayton.ACM
## cigarette alcohol marijuana Freq
## 1 Yes Yes Yes 911
## 2 No Yes Yes 44
## 3 Yes No Yes 3
## 4 No No Yes 2
## 5 Yes Yes No 538
## 6 No Yes No 456
## 7 Yes No No 43
## 8 No No No 279
Find the following models:
(a) Joint independence, [AC][M]:
joint.ind <- aggregate(Freq ~ alcohol*cigarette + marijuana, data = DaytonSurvey, FUN = sum)
joint.ind
## alcohol cigarette marijuana Freq
## 1 Yes Yes Yes 911
## 2 No Yes Yes 3
## 3 Yes No Yes 44
## 4 No No Yes 2
## 5 Yes Yes No 538
## 6 No Yes No 43
## 7 Yes No No 456
## 8 No No No 279
(b) Conditional independence, [AM][CM]:
condition.ind <- aggregate(Freq ~ marijuana*(cigarette + alcohol), data = DaytonSurvey, FUN = sum)
condition.ind
## marijuana cigarette alcohol Freq
## 1 Yes Yes Yes 911
## 2 No Yes Yes 538
## 3 Yes No Yes 44
## 4 No No Yes 456
## 5 Yes Yes No 3
## 6 No Yes No 43
## 7 Yes No No 2
## 8 No No No 279
(c) Homogenous model, [AC][AM][CM]:
homogen.mod <- aggregate(Freq ~ (alcohol + cigarette + marijuana)^2, data = DaytonSurvey, FUN = sum)
homogen.mod
## alcohol cigarette marijuana Freq
## 1 Yes Yes Yes 911
## 2 No Yes Yes 3
## 3 Yes No Yes 44
## 4 No No Yes 2
## 5 Yes Yes No 538
## 6 No Yes No 43
## 7 Yes No No 456
## 8 No No No 279
(d) Prepare a table giving the goodness-of-fit tests for these models, as well as the model of mutual independence, [A][C][M], and the standard model, [ACM]. Hint: anova() is useful here. Which model appears to give the most reasonable fit? Explain your interpretation.
Use goodnee-of-fit tests on all models:
(1) Joint independence, [AC][M]:
Dayton.loglm0 <- loglm(Freq ~ alcohol*cigarette + marijuana, data = DaytonSurvey, param = TRUE, fitted = TRUE)
Dayton.loglm0
## Call:
## loglm(formula = Freq ~ alcohol * cigarette + marijuana, data = DaytonSurvey,
## param = TRUE, fitted = TRUE)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 2810.417 27 0
## Pearson 2869.945 27 0
(2) Conditional independence, [AM][CM]:
Dayton.loglm1 <- loglm(Freq ~ marijuana*(cigarette + alcohol), data = DaytonSurvey, param = TRUE, fitted = TRUE)
Dayton.loglm1
## Call:
## loglm(formula = Freq ~ marijuana * (cigarette + alcohol), data = DaytonSurvey,
## param = TRUE, fitted = TRUE)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 2154.344 26 0
## Pearson 1974.994 26 0
(3) Homogenous model, [AC][AM][CM]:
Dayton.loglm2 <- loglm(Freq ~ (alcohol + cigarette + marijuana)^2, data = DaytonSurvey, param = TRUE, fitted = TRUE)
Dayton.loglm2
## Call:
## loglm(formula = Freq ~ (alcohol + cigarette + marijuana)^2, data = DaytonSurvey,
## param = TRUE, fitted = TRUE)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 1966.964 25 0
## Pearson 1667.735 25 0
(4) mutual independence, [A][C][M]:
Dayton.loglm3 <- loglm(Freq ~ alcohol + cigarette + marijuana, data = DaytonSurvey, param = TRUE, fitted = TRUE)
Dayton.loglm3
## Call:
## loglm(formula = Freq ~ alcohol + cigarette + marijuana, data = DaytonSurvey,
## param = TRUE, fitted = TRUE)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 3252.610 28 0
## Pearson 4055.973 28 0
(5) saturated model, [ACM]:
Dayton.loglm4 <- loglm(Freq ~ alcohol * cigarette * marijuana, data = DaytonSurvey, param = TRUE, fitted = TRUE)
Dayton.loglm4
## Call:
## loglm(formula = Freq ~ alcohol * cigarette * marijuana, data = DaytonSurvey,
## param = TRUE, fitted = TRUE)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 1966.590 24 0
## Pearson 1666.879 24 0
Let’s use anova to see which model appears to give the most reasonable fit:
anova(Dayton.loglm0, Dayton.loglm1, Dayton.loglm2, Dayton.loglm3, Dayton.loglm4, test = "Chisq")
## LR tests for hierarchical log-linear models
##
## Model 1:
## Freq ~ alcohol + cigarette + marijuana
## Model 2:
## Freq ~ alcohol * cigarette + marijuana
## Model 3:
## Freq ~ marijuana * (cigarette + alcohol)
## Model 4:
## Freq ~ (alcohol + cigarette + marijuana)^2
## Model 5:
## Freq ~ alcohol * cigarette * marijuana
##
## Deviance df Delta(Dev) Delta(df) P(> Delta(Dev)
## Model 1 3252.610 28
## Model 2 2810.417 27 442.1933108 1 0.00000
## Model 3 2154.344 26 656.0723408 1 0.00000
## Model 4 1966.964 25 187.3803170 1 0.00000
## Model 5 1966.590 24 0.3739859 1 0.54084
## Saturated 0.000 0 1966.5899596 24 0.00000
Results: Based on the knowledge that G-square is always 0 for saturated models, we see that the closest fit is the saturated model. Albeit, if we excuse the saturated model, then the best fit, otherwise, is our Model #4 (Model 5 in anove), with Model #3 a close second.