Exercise 9.2 Consider the data set DaytonSurvey (described in Example 2.6), giving results of a survey of use of alcohol (A), cigarettes (C), and marijuana (M) among high school seniors. For this exercise, ignore the variables sex and race, by working with the marginal table Dayton.ACM, a 2 × 2 × 2 table in frequency data frame form.

library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.4.4
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
data("DaytonSurvey")
Dayton.ACM <- aggregate(Freq ~ cigarette + alcohol + marijuana, data = DaytonSurvey, FUN = sum)
Dayton.ACM
##   cigarette alcohol marijuana Freq
## 1       Yes     Yes       Yes  911
## 2        No     Yes       Yes   44
## 3       Yes      No       Yes    3
## 4        No      No       Yes    2
## 5       Yes     Yes        No  538
## 6        No     Yes        No  456
## 7       Yes      No        No   43
## 8        No      No        No  279

The null hypothesis for joint independence model at 0.05 significance level

Null Hypothesis: The use of marijuana (M) is independent on cigarette(C) & alcohol(A) Alternative Hypothesis : The use of marijuana (M) is dependent on cigarette(C) & alcohol(A)

JAMC<-ftable(Dayton.ACM, row.vars=c(2,1,3))
Jresult<-chisq.test(JAMC)
## Warning in chisq.test(JAMC): Chi-squared approximation may be incorrect
Jresult
## 
##  Pearson's Chi-squared test
## 
## data:  JAMC
## X-squared = 56, df = 49, p-value = 0.2289

From the chi-squared test, the p-value (0.2289) is greater than 0.05, hence we do not reject the null hypothesis

  1. Similar, there should be a conditional independence test for AM & CM

Let’s now test the hypothesis that A and C are conditionally independent given M. To do this, we enter the data for each 2 × 2 table of A × C corresponding to different levels of, M = 0 (No), M = 1(Yes), respectively, then perform independence tests on these tables, and add up the X^2 statistics

library(stats)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
library(ca)
## Warning: package 'ca' was built under R version 3.4.4
library(logmult)
## Warning: package 'logmult' was built under R version 3.4.4
## 
## Attaching package: 'logmult'
## The following object is masked from 'package:gnm':
## 
##     se
## The following object is masked from 'package:vcd':
## 
##     assoc
library(MASS)
## Warning: package 'MASS' was built under R version 3.4.4
JindAMC<- aggregate(Freq ~ alcohol*cigarette + marijuana, data = DaytonSurvey, FUN = sum)
JindAMC
##   alcohol cigarette marijuana Freq
## 1     Yes       Yes       Yes  911
## 2      No       Yes       Yes    3
## 3     Yes        No       Yes   44
## 4      No        No       Yes    2
## 5     Yes       Yes        No  538
## 6      No       Yes        No   43
## 7     Yes        No        No  456
## 8      No        No        No  279
CondAMC<-aggregate(Freq ~ marijuana*(cigarette + alcohol), data = DaytonSurvey, FUN = sum)
CondAMC
##   marijuana cigarette alcohol Freq
## 1       Yes       Yes     Yes  911
## 2        No       Yes     Yes  538
## 3       Yes        No     Yes   44
## 4        No        No     Yes  456
## 5       Yes       Yes      No    3
## 6        No       Yes      No   43
## 7       Yes        No      No    2
## 8        No        No      No  279
HomAMC<-aggregate(Freq ~ (alcohol + cigarette + marijuana)^2, data = DaytonSurvey, FUN = sum)
HomAMC
##   alcohol cigarette marijuana Freq
## 1     Yes       Yes       Yes  911
## 2      No       Yes       Yes    3
## 3     Yes        No       Yes   44
## 4      No        No       Yes    2
## 5     Yes       Yes        No  538
## 6      No       Yes        No   43
## 7     Yes        No        No  456
## 8      No        No        No  279
  1. Prepare a table giving the goodness-of-fit tests for these models, as well as the model of mutual independence, [A][C][M], and the standard model, [ACM]. Hint: anova() is useful here. Which model appears to give the most reasonable fit? Explain your interpretation.

We will use goodness-of-fit fests for all the three models and apply anova() function to find the best fit among them

  1. Joint independence model
Jindfit<-loglm(Freq ~ alcohol*cigarette + marijuana, data = DaytonSurvey, param = TRUE, fitted = TRUE)
Jindfit
## Call:
## loglm(formula = Freq ~ alcohol * cigarette + marijuana, data = DaytonSurvey, 
##     param = TRUE, fitted = TRUE)
## 
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 2810.417 27        0
## Pearson          2869.945 27        0
  1. Conditional independence model [AC][CM]
Condfit<-loglm(Freq ~ marijuana*(cigarette + alcohol), data = DaytonSurvey, param = TRUE, fitted = TRUE)
Condfit
## Call:
## loglm(formula = Freq ~ marijuana * (cigarette + alcohol), data = DaytonSurvey, 
##     param = TRUE, fitted = TRUE)
## 
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 2154.344 26        0
## Pearson          1974.994 26        0
  1. Homogenous model [AC][AM][CM]
Homfit<-loglm(Freq ~ (alcohol + cigarette + marijuana)^2, data = DaytonSurvey, param = TRUE, fitted = TRUE)
Homfit
## Call:
## loglm(formula = Freq ~ (alcohol + cigarette + marijuana)^2, data = DaytonSurvey, 
##     param = TRUE, fitted = TRUE)
## 
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 1966.964 25        0
## Pearson          1667.735 25        0
  1. Mutual dependence model [A][M][C]
Mutfit<-loglm(Freq ~ alcohol + cigarette + marijuana, data = DaytonSurvey, param = TRUE, fitted = TRUE) 
Mutfit
## Call:
## loglm(formula = Freq ~ alcohol + cigarette + marijuana, data = DaytonSurvey, 
##     param = TRUE, fitted = TRUE)
## 
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 3252.610 28        0
## Pearson          4055.973 28        0
  1. Saturated model [AMC]
Satfit<-loglm(Freq ~ alcohol * cigarette * marijuana, data = DaytonSurvey, param = TRUE, fitted = TRUE)
Satfit
## Call:
## loglm(formula = Freq ~ alcohol * cigarette * marijuana, data = DaytonSurvey, 
##     param = TRUE, fitted = TRUE)
## 
## Statistics:
##                       X^2 df P(> X^2)
## Likelihood Ratio 1966.590 24        0
## Pearson          1666.879 24        0

We will apply anova to these five models to find the reasonable fit

anova(Jindfit,Condfit,Homfit,Mutfit,Satfit, test = "Chisq")
## LR tests for hierarchical log-linear models
## 
## Model 1:
##  Freq ~ alcohol + cigarette + marijuana 
## Model 2:
##  Freq ~ alcohol * cigarette + marijuana 
## Model 3:
##  Freq ~ marijuana * (cigarette + alcohol) 
## Model 4:
##  Freq ~ (alcohol + cigarette + marijuana)^2 
## Model 5:
##  Freq ~ alcohol * cigarette * marijuana 
## 
##           Deviance df   Delta(Dev) Delta(df) P(> Delta(Dev)
## Model 1   3252.610 28                                      
## Model 2   2810.417 27  442.1933108         1        0.00000
## Model 3   2154.344 26  656.0723408         1        0.00000
## Model 4   1966.964 25  187.3803170         1        0.00000
## Model 5   1966.590 24    0.3739859         1        0.54084
## Saturated    0.000  0 1966.5899596        24        0.00000

From the results, we can conclude that mutual dependence model fits with positive P-value in my view.