Question 1. (a) Fit the loglinear model (DV, DP, PV), and test its goodness of fit.
filename <- "http://users.stat.ufl.edu/~aa/cat/data/DeathPenalty.dat"
library(readr)
Data <- read.table(file=filename, header=T)
DV.DP.PV <- glm(count ~ (D+V+P)^2, data=Data, family=poisson)
DV.DP.PV
##
## Call: glm(formula = count ~ (D + V + P)^2, family = poisson, data = Data)
##
## Coefficients:
## (Intercept) Dwhite Vwhite Pyes Dwhite:Vwhite
## 4.9358 -2.1746 -1.3298 -3.5961 4.5950
## Dwhite:Pyes Vwhite:Pyes
## -0.8678 2.4044
##
## Degrees of Freedom: 7 Total (i.e. Null); 1 Residual
## Null Deviance: 1225
## Residual Deviance: 0.3798 AIC: 52.42
DV.DP.PV1 <- glm(count ~ (D+V+P)^3, data=Data, family=poisson)
anova(DV.DP.PV, DV.DP.PV1, "Chisq")
## Analysis of Deviance Table
##
## Model 1: count ~ (D + V + P)^2
## Model 2: count ~ (D + V + P)^3
## Resid. Df Resid. Dev Df Deviance
## 1 1 0.37984
## 2 0 0.00000 1 0.37984
1-(pchisq(0.3798, 7))
## [1] 0.9997785
P-Value: 0.9997785. Fail to Reject the null that a more complex model is needed.
exp(-0.8678)
## [1] 0.4198743
The conditional odds for DP at each level of V is 0.4198743. Controlling for the victims race, the estimated odds for a white defendant receiving the death penalty is 0.42 times the estimated odds of the black defendant.
(53/430)/(15/176)
## [1] 1.446202
The estimated odds of a white person receiving the death penalty is 1.45 times the odds of a black person person receiving the death penalty.
Simpson’s paradox is made evident as when we control for the victims race, black defendants have a greater odss of the death penalty by prportions of victims’ race, as opposed to when we look at the data marginally.
V <- c("white", "white", "black", "black")
D <- c("white", "black", "white", "black")
Yes <- c(53, 11, 0, 4)
No <- c(414, 37, 16, 139)
Data2 <- data.frame(D, V, Yes, No)
Data2
## D V Yes No
## 1 white white 53 414
## 2 black white 11 37
## 3 white black 0 16
## 4 black black 4 139
D.V <- glm(cbind(Yes,No)~ D+V, data=Data2, family=binomial)
D.V
##
## Call: glm(formula = cbind(Yes, No) ~ D + V, family = binomial, data = Data2)
##
## Coefficients:
## (Intercept) Dwhite Vwhite
## -3.5961 -0.8678 2.4044
##
## Degrees of Freedom: 3 Total (i.e. Null); 1 Residual
## Null Deviance: 22.27
## Residual Deviance: 0.3798 AIC: 19.3
Dwhite & DWhite:Pyes are the same coefficents: -0.8678
Vwhite & Vwhite:Pyes: are the same coefficients: 2.4044
Question 2. At the website www.stat.ufl.edu/~aa/intro-cda/data/for Agresti’s second edition,the MBTI data file cross-classifies the MBTI Step II National Sample on four binary scalesof the Myers-Briggs personality test: Extroversion/Introversion (E/I), Sensing/iNtuitive(S/N), Thinking/Feeling (T/F), and Judging/Perceiving (J/P).
filename2 <- "http://users.stat.ufl.edu/~aa/intro-cda/data/MBTI.dat"
library(readr)
PersonalityData <- read.table(file=filename2, header=T)
fit <- glm(n ~ EI + SN + TF + JP, family = poisson, data = PersonalityData)
fit1 <- glm(n ~ 1, family = poisson, data = PersonalityData)
summary(fit)
##
## Call:
## glm(formula = n ~ EI + SN + TF + JP, family = poisson, data = PersonalityData)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.3550 -2.1182 -1.0628 0.8506 5.7457
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.79255 0.07674 49.422 < 2e-16 ***
## EIi 0.26439 0.06226 4.246 2.17e-05 ***
## SNs 0.87008 0.06765 12.861 < 2e-16 ***
## TFt -0.48551 0.06355 -7.640 2.17e-14 ***
## JPp -0.12971 0.06185 -2.097 0.036 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 399.94 on 15 degrees of freedom
## Residual deviance: 135.87 on 11 degrees of freedom
## AIC: 238.7
##
## Number of Fisher Scoring iterations: 4
anova(fit1, fit, "Chisq")
## Analysis of Deviance Table
##
## Model 1: n ~ 1
## Model 2: n ~ EI + SN + TF + JP
## Resid. Df Resid. Dev Df Deviance
## 1 15 399.94
## 2 11 135.87 4 264.08
1-pchisq(135.87, 11)
## [1] 0
The P-value for the goodness of fit test is approximately 0, therfore the model is a good fit.
the estimated conditional association is strongest between the S/N and J/P scales; and
There is not strong evidence of conditional association between the E/I and T/Fscales or between the E/I and J/P scales.
fit2 <- glm(n ~ (EI + SN + TF + JP)^2, family = poisson, data = PersonalityData)
fit3 <- glm(n ~ 1, family = poisson, data = PersonalityData)
summary(fit2)
##
## Call:
## glm(formula = n ~ (EI + SN + TF + JP)^2, family = poisson, data = PersonalityData)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7 8
## -0.72826 1.00215 0.05168 -0.01429 1.49947 -1.29325 -0.07596 0.00231
## 9 10 11 12 13 14 15 16
## 0.56850 -0.82975 -0.04948 0.01728 -1.57051 1.09960 0.08587 -0.00804
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.44760 0.13793 24.994 < 2e-16 ***
## EIi -0.02907 0.15266 -0.190 0.848952
## SNs 1.21082 0.14552 8.320 < 2e-16 ***
## TFt -0.64194 0.16768 -3.828 0.000129 ***
## JPp 0.93417 0.14594 6.401 1.54e-10 ***
## EIi:SNs 0.30212 0.14233 2.123 0.033780 *
## EIi:TFt 0.19449 0.13121 1.482 0.138258
## EIi:JPp 0.01766 0.13160 0.134 0.893261
## SNs:TFt 0.40920 0.15243 2.684 0.007265 **
## SNs:JPp -1.22153 0.14547 -8.397 < 2e-16 ***
## TFt:JPp -0.55936 0.13512 -4.140 3.48e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 399.944 on 15 degrees of freedom
## Residual deviance: 10.162 on 5 degrees of freedom
## AIC: 125
##
## Number of Fisher Scoring iterations: 4
anova(fit3, fit2, "Chisq")
## Analysis of Deviance Table
##
## Model 1: n ~ 1
## Model 2: n ~ (EI + SN + TF + JP)^2
## Resid. Df Resid. Dev Df Deviance
## 1 15 399.94
## 2 5 10.16 10 389.78
1-pchisq(10.162, 5)
## [1] 0.07077304
The P-value is 0.07077304. This model (saturated) is not a better fit than the simpler model.
The conditional association between S/N and J/P are very strong, evident by their extremely low P-values.
E/I and T/F scales have large P-values, which accordingly shows that the conditional association between the scales are insignificant. E/I and J/P also do not have a strong conditional association.
fit4 <- glm(n ~ EI + SN + TF + JP + EI:SN + SN:TF + SN:JP + TF:JP, family = poisson, data = PersonalityData)
summary(fit4)
##
## Call:
## glm(formula = n ~ EI + SN + TF + JP + EI:SN + SN:TF + SN:JP +
## TF:JP, family = poisson, data = PersonalityData)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.65487 -0.46916 0.00529 0.54208 1.47431
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.41362 0.12930 26.402 < 2e-16 ***
## EIi 0.03871 0.11361 0.341 0.733287
## SNs 1.19414 0.14548 8.208 2.24e-16 ***
## TFt -0.54137 0.15282 -3.543 0.000396 ***
## JPp 0.94292 0.13064 7.218 5.28e-13 ***
## EIi:SNs 0.32190 0.13598 2.367 0.017922 *
## SNs:TFt 0.42366 0.15200 2.787 0.005318 **
## SNs:JPp -1.22021 0.14513 -8.408 < 2e-16 ***
## TFt:JPp -0.55853 0.13497 -4.138 3.50e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 399.944 on 15 degrees of freedom
## Residual deviance: 12.369 on 7 degrees of freedom
## AIC: 123.2
##
## Number of Fisher Scoring iterations: 4
anova(fit4,fit2, test="Chisq")
## Analysis of Deviance Table
##
## Model 1: n ~ EI + SN + TF + JP + EI:SN + SN:TF + SN:JP + TF:JP
## Model 2: n ~ (EI + SN + TF + JP)^2
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 7 12.369
## 2 5 10.162 2 2.207 0.3317
P-value is 0.3317, so the model that assumes conditional independence between E/I and T/F, and between E/I and J/P gives adequate fit.
exp(confint(fit4, method="profile"))
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 23.4067710 38.8667659
## EIi 0.8319067 1.2992469
## SNs 2.4923799 4.4102180
## TFt 0.4299779 0.7832723
## JPp 1.9947113 3.3306215
## EIi:SNs 1.0568719 1.8015497
## SNs:TFt 1.1363105 2.0630723
## SNs:JPp 0.2214587 0.3913279
## TFt:JPp 0.4385878 0.7446507
The confidence interval for the conditional odds ratio between the S/N and J/P scales: is between 0.2214587 and 0.3913279.
The estimated odds ratio associated with the S/N by J/P interaction can be interpreted to mean that the odds of being N given the odds of being J are between 0.2214587 and 0.3913279.
fit <- glm(n ~ EI + SN + TF + JP, family = poisson, data = PersonalityData)
summary(fit)
##
## Call:
## glm(formula = n ~ EI + SN + TF + JP, family = poisson, data = PersonalityData)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.3550 -2.1182 -1.0628 0.8506 5.7457
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.79255 0.07674 49.422 < 2e-16 ***
## EIi 0.26439 0.06226 4.246 2.17e-05 ***
## SNs 0.87008 0.06765 12.861 < 2e-16 ***
## TFt -0.48551 0.06355 -7.640 2.17e-14 ***
## JPp -0.12971 0.06185 -2.097 0.036 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 399.94 on 15 degrees of freedom
## Residual deviance: 135.87 on 11 degrees of freedom
## AIC: 238.7
##
## Number of Fisher Scoring iterations: 4
fit2 <- glm(n ~ (EI + SN + TF + JP)^2, family = poisson, data = PersonalityData)
summary(fit2)
##
## Call:
## glm(formula = n ~ (EI + SN + TF + JP)^2, family = poisson, data = PersonalityData)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7 8
## -0.72826 1.00215 0.05168 -0.01429 1.49947 -1.29325 -0.07596 0.00231
## 9 10 11 12 13 14 15 16
## 0.56850 -0.82975 -0.04948 0.01728 -1.57051 1.09960 0.08587 -0.00804
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.44760 0.13793 24.994 < 2e-16 ***
## EIi -0.02907 0.15266 -0.190 0.848952
## SNs 1.21082 0.14552 8.320 < 2e-16 ***
## TFt -0.64194 0.16768 -3.828 0.000129 ***
## JPp 0.93417 0.14594 6.401 1.54e-10 ***
## EIi:SNs 0.30212 0.14233 2.123 0.033780 *
## EIi:TFt 0.19449 0.13121 1.482 0.138258
## EIi:JPp 0.01766 0.13160 0.134 0.893261
## SNs:TFt 0.40920 0.15243 2.684 0.007265 **
## SNs:JPp -1.22153 0.14547 -8.397 < 2e-16 ***
## TFt:JPp -0.55936 0.13512 -4.140 3.48e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 399.944 on 15 degrees of freedom
## Residual deviance: 10.162 on 5 degrees of freedom
## AIC: 125
##
## Number of Fisher Scoring iterations: 4
fit5 <- glm(n ~ (EI + SN + TF + JP)^3, family = poisson, data = PersonalityData)
summary(fit5)
##
## Call:
## glm(formula = n ~ (EI + SN + TF + JP)^3, family = poisson, data = PersonalityData)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7 8
## -0.4805 0.6865 0.4228 -0.4746 0.9580 -0.9412 -0.7382 0.4889
## 9 10 11 12 13 14 15 16
## 0.3666 -0.5798 -0.3618 0.4228 -1.0803 0.7577 0.8099 -0.4746
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.56370 0.16197 22.002 < 2e-16 ***
## EIi -0.27880 0.23337 -1.195 0.2322
## SNs 1.05839 0.18535 5.710 1.13e-08 ***
## TFt -0.63483 0.25356 -2.504 0.0123 *
## JPp 0.76316 0.19243 3.966 7.31e-05 ***
## EIi:SNs 0.61460 0.25451 2.415 0.0157 *
## EIi:TFt 0.20026 0.30833 0.650 0.5160
## EIi:JPp 0.37430 0.26332 1.421 0.1552
## SNs:TFt 0.41081 0.27510 1.493 0.1353
## SNs:JPp -0.96288 0.22994 -4.187 2.82e-05 ***
## TFt:JPp -0.58773 0.29782 -1.973 0.0484 *
## EIi:SNs:TFt -0.02364 0.30704 -0.077 0.9386
## EIi:SNs:JPp -0.51039 0.29275 -1.743 0.0813 .
## EIi:TFt:JPp 0.02440 0.27403 0.089 0.9290
## SNs:TFt:JPp 0.01922 0.30880 0.062 0.9504
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 399.9439 on 15 degrees of freedom
## Residual deviance: 7.0963 on 1 degrees of freedom
## AIC: 129.93
##
## Number of Fisher Scoring iterations: 4
Question 3. True or false? (a) With a single categorical response variable, logistic regression models are more ap-propriate than loglinear models.
Answer: True
Answer: False
Answer: True
Question 4. A recent General Social Survey asked subjects whether they believed in heaven and whether they believed in hell; the following table shows the results.
Heaven_Hell <- matrix(c(833,125,2,160),ncol=2,byrow=TRUE)
colnames(Heaven_Hell) <- c("Hell (Yes)","Hell (No)")
rownames(Heaven_Hell) <- c("Heaven (Yes)","Heaven (No)")
Heaven_Hell <- as.table(Heaven_Hell)
Heaven_Hell
## Hell (Yes) Hell (No)
## Heaven (Yes) 833 125
## Heaven (No) 2 160
Null Hypothesis: the population proportions answering yes were the same for heaven and hell (P_heaven=P_hell) Alternative Hypothesis: the population proportions answering yes were different for heaven and hell
mcnemar.test(Heaven_Hell, correct=F)
##
## McNemar's Chi-squared test
##
## data: Heaven_Hell
## McNemar's chi-squared = 119.13, df = 1, p-value < 2.2e-16
Since p-value < 2.2e-16, which is approximately 0, there is strong evidence against the null hypothesis, therefore the population proportions answering yes were not the same for heaven and hell.
SE <- 1/1120 * sqrt( 125 + 2 - (125-2)^2 / 1120)
SE
## [1] 0.00951184
CI <- ((125 - 2) / 1120) + c(-1,1) * 1.64 * SE
CI
## [1] 0.09422201 0.12542085
The probability of a “Yes” response is between 0.0912 less and 0.1285 more for belief in Heavan than belief in Hell.