Exercise 9.2 Consider the data set DaytonSurvey (described in Example 2.6), giving results of a survey of use of alcohol (A), cigarettes (C), and marijuana (M) among high school seniors. For this exercise, ignore the variables sex and race, by working with the marginal table Dayton.ACM, a 2 × 2 × 2 table in frequency data frame form.
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.4.4
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.4
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.4
data("DaytonSurvey")
Dayton.ACM <- aggregate(Freq ~ cigarette + alcohol + marijuana, data = DaytonSurvey, FUN = sum)
Dayton.ACM
## cigarette alcohol marijuana Freq
## 1 Yes Yes Yes 911
## 2 No Yes Yes 44
## 3 Yes No Yes 3
## 4 No No Yes 2
## 5 Yes Yes No 538
## 6 No Yes No 456
## 7 Yes No No 43
## 8 No No No 279
The null hypothesis for joint independence model at 0.05 significance level
Null Hypothesis: The use of marijuana (M) is independent on cigarette(C) & alcohol(A) Alternative Hypothesis : The use of marijuana (M) is dependent on cigarette(C) & alcohol(A)
JAMC<-ftable(Dayton.ACM, row.vars=c(2,1,3))
Jresult<-chisq.test(JAMC)
## Warning in chisq.test(JAMC): Chi-squared approximation may be incorrect
Jresult
##
## Pearson's Chi-squared test
##
## data: JAMC
## X-squared = 56, df = 49, p-value = 0.2289
From the chi-squared test, the p-value (0.2289) is greater than 0.05, hence we do not reject the null hypothesis
Let’s now test the hypothesis that A and C are conditionally independent given M. To do this, we enter the data for each 2 × 2 table of A × C corresponding to different levels of, M = 0 (No), M = 1(Yes), respectively, then perform independence tests on these tables, and add up the X^2 statistics
library(stats)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
library(ca)
## Warning: package 'ca' was built under R version 3.4.4
library(logmult)
## Warning: package 'logmult' was built under R version 3.4.4
##
## Attaching package: 'logmult'
## The following object is masked from 'package:gnm':
##
## se
## The following object is masked from 'package:vcd':
##
## assoc
library(MASS)
## Warning: package 'MASS' was built under R version 3.4.4
JindAMC<- aggregate(Freq ~ alcohol*cigarette + marijuana, data = DaytonSurvey, FUN = sum)
JindAMC
## alcohol cigarette marijuana Freq
## 1 Yes Yes Yes 911
## 2 No Yes Yes 3
## 3 Yes No Yes 44
## 4 No No Yes 2
## 5 Yes Yes No 538
## 6 No Yes No 43
## 7 Yes No No 456
## 8 No No No 279
CondAMC<-aggregate(Freq ~ marijuana*(cigarette + alcohol), data = DaytonSurvey, FUN = sum)
CondAMC
## marijuana cigarette alcohol Freq
## 1 Yes Yes Yes 911
## 2 No Yes Yes 538
## 3 Yes No Yes 44
## 4 No No Yes 456
## 5 Yes Yes No 3
## 6 No Yes No 43
## 7 Yes No No 2
## 8 No No No 279
HomAMC<-aggregate(Freq ~ (alcohol + cigarette + marijuana)^2, data = DaytonSurvey, FUN = sum)
HomAMC
## alcohol cigarette marijuana Freq
## 1 Yes Yes Yes 911
## 2 No Yes Yes 3
## 3 Yes No Yes 44
## 4 No No Yes 2
## 5 Yes Yes No 538
## 6 No Yes No 43
## 7 Yes No No 456
## 8 No No No 279
We will use goodness-of-fit fests for all the three models and apply anova() function to find the best fit among them
Jindfit<-loglm(Freq ~ alcohol*cigarette + marijuana, data = DaytonSurvey, param = TRUE, fitted = TRUE)
Jindfit
## Call:
## loglm(formula = Freq ~ alcohol * cigarette + marijuana, data = DaytonSurvey,
## param = TRUE, fitted = TRUE)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 2810.417 27 0
## Pearson 2869.945 27 0
Condfit<-loglm(Freq ~ marijuana*(cigarette + alcohol), data = DaytonSurvey, param = TRUE, fitted = TRUE)
Condfit
## Call:
## loglm(formula = Freq ~ marijuana * (cigarette + alcohol), data = DaytonSurvey,
## param = TRUE, fitted = TRUE)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 2154.344 26 0
## Pearson 1974.994 26 0
Homfit<-loglm(Freq ~ (alcohol + cigarette + marijuana)^2, data = DaytonSurvey, param = TRUE, fitted = TRUE)
Homfit
## Call:
## loglm(formula = Freq ~ (alcohol + cigarette + marijuana)^2, data = DaytonSurvey,
## param = TRUE, fitted = TRUE)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 1966.964 25 0
## Pearson 1667.735 25 0
Mutfit<-loglm(Freq ~ alcohol + cigarette + marijuana, data = DaytonSurvey, param = TRUE, fitted = TRUE)
Mutfit
## Call:
## loglm(formula = Freq ~ alcohol + cigarette + marijuana, data = DaytonSurvey,
## param = TRUE, fitted = TRUE)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 3252.610 28 0
## Pearson 4055.973 28 0
Satfit<-loglm(Freq ~ alcohol * cigarette * marijuana, data = DaytonSurvey, param = TRUE, fitted = TRUE)
Satfit
## Call:
## loglm(formula = Freq ~ alcohol * cigarette * marijuana, data = DaytonSurvey,
## param = TRUE, fitted = TRUE)
##
## Statistics:
## X^2 df P(> X^2)
## Likelihood Ratio 1966.590 24 0
## Pearson 1666.879 24 0
We will apply anova to these five models to find the reasonable fit
anova(Jindfit,Condfit,Homfit,Mutfit,Satfit, test = "Chisq")
## LR tests for hierarchical log-linear models
##
## Model 1:
## Freq ~ alcohol + cigarette + marijuana
## Model 2:
## Freq ~ alcohol * cigarette + marijuana
## Model 3:
## Freq ~ marijuana * (cigarette + alcohol)
## Model 4:
## Freq ~ (alcohol + cigarette + marijuana)^2
## Model 5:
## Freq ~ alcohol * cigarette * marijuana
##
## Deviance df Delta(Dev) Delta(df) P(> Delta(Dev)
## Model 1 3252.610 28
## Model 2 2810.417 27 442.1933108 1 0.00000
## Model 3 2154.344 26 656.0723408 1 0.00000
## Model 4 1966.964 25 187.3803170 1 0.00000
## Model 5 1966.590 24 0.3739859 1 0.54084
## Saturated 0.000 0 1966.5899596 24 0.00000
From the results, we can conclude that mutual dependence model fits with positive P-value in my view.