II. Poisson Regression (MB_score ~ REE + Gender)
1.Regression
summary(m1<-glm(MB_score~REE+Gender,family="poisson",data=data))
##
## Call:
## glm(formula = MB_score ~ REE + Gender, family = "poisson", data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -7.869 -4.199 -2.227 1.243 17.330
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.380e+00 1.163e-01 37.653 <2e-16 ***
## REE -9.115e-04 6.496e-05 -14.031 <2e-16 ***
## Gender2 -9.827e-02 4.232e-02 -2.322 0.0202 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 5331.9 on 218 degrees of freedom
## Residual deviance: 5034.4 on 216 degrees of freedom
## AIC: 5852.1
##
## Number of Fisher Scoring iterations: 5
2. Test for over-dispersion
##by AER package (chi dung cho Poisson GLMs)
dispersiontest(m1,trafo=1)
##
## Overdispersion test
##
## data: m1
## z = 5.2157, p-value = 9.157e-08
## alternative hypothesis: true alpha is greater than 0
## sample estimates:
## alpha
## 26.8795
#Comments: p significant--> dispersed
3. Assumption check (plot the estimated variance against the mean)
plot(log(fitted(m1)),log((data$MB_score-fitted(m1))^2),
xlab=expression(hat(mu)),ylab=expression((y-hat(mu))^2),ylim=c(-5,10),pch=16,col="grey")
abline(0,1)
text(2.6,-3,"Variance (y) is not propotional" )
text(2.6,-4, "but larger than mean (x)")

4. Check goodness-of-fit (xem model co fit voi data khong)
deviance(m1)
## [1] 5034.371
df.residual(m1)
## [1] 216
# deviance va df.residual cach xa nhau quas--> khong fit
# Thu so sanh hai gia tri nay bang Chi-square test
with(m1, cbind(res.deviance = deviance, df = df.residual,
p = pchisq(deviance, df.residual, lower.tail=FALSE)))
## res.deviance df p
## [1,] 5034.371 216 0
# Comments: p significant ---> rat khac biet
—>>> Posisson regression is definitely not appropriate <<<—
III. Negative regression
1.Regression
summary(m2 <- glm.nb(MB_score ~ REE + Gender, data = data))
##
## Call:
## glm.nb(formula = MB_score ~ REE + Gender, data = data, init.theta = 0.6179190548,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.2189 -0.9848 -0.4579 0.2409 2.4114
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.4008470 0.6340909 6.940 3.91e-12 ***
## REE -0.0009492 0.0003399 -2.792 0.00523 **
## Gender2 -0.0330037 0.2364783 -0.140 0.88901
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.6179) family taken to be 1)
##
## Null deviance: 271.04 on 218 degrees of freedom
## Residual deviance: 259.76 on 216 degrees of freedom
## AIC: 1694.7
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.6179
## Std. Err.: 0.0606
##
## 2 x log-likelihood: -1686.7350
2. Assumption check
Assumption Negative Binomial Regression la do phan tan cua du lieu khong dong nhat (khac voi Poisson regression)
Negative binomial models assume the conditional means are not equal to the conditional variances.
This inequality is captured by estimating a dispersion parameter (not shown in the output) that is held constant in a Poisson model. Thus, the Poisson model is actually nested in the negative binomial model. We can then use a likelihood ratio test to compare these two and test this model assumption. To do this, we will run our model as a Poisson.
Loglikelihood_dif<-2*(logLik(m2)-logLik(m1))
Loglikelihood_dif
## 'log Lik.' 4159.337 (df=4)
pchisq(Loglikelihood_dif,df=1,lower.tail = F) #Tinh p value
## 'log Lik.' 0 (df=4)
# Comment: This very large chi-square (very low p value) strongly suggests the negative binomialmodel, which estimates the dispersion parameter, is more appropriate than the Poisson model
3. Check goodness-of-fit (xem model co fit voi data khong)
deviance(m2)
## [1] 259.7574
df.residual(m2)
## [1] 216
# deviance va df.residual gan nhu nhau--> Co the fit
# Thu so sanh hai gia tri nay bang Chi-square test
with(m2, cbind(res.deviance = deviance, df = df.residual,
p = pchisq(deviance, df.residual, lower.tail=FALSE)))
## res.deviance df p
## [1,] 259.7574 216 0.02224544
# Comments: p significant ---> khac biet rat nho
4. Tinh Incident Rate ratio
(est <- cbind(Estimate = coef(m2), confint(m2)))
## Waiting for profiling to be done...
## Estimate 2.5 % 97.5 %
## (Intercept) 4.4008470180 3.236266598 5.6025507155
## REE -0.0009491715 -0.001596397 -0.0003007758
## Gender2 -0.0330036864 -0.464153232 0.3880722325
exp(est) # Tinh IRR
## Estimate 2.5 % 97.5 %
## (Intercept) 81.5198882 25.4385718 271.1170687
## REE 0.9990513 0.9984049 0.9996993
## Gender2 0.9675350 0.6286672 1.4741363
# Comments 1: For each unit increase in REE, the IRR of MB_score decrease as 0.1%
# Comments 2: No gender effect in predicting MB_score
5. Tinh Predictive Probability
newdata2 <- data.frame(
REE = rep(seq(from = min(data$REE), to = max(data$REE), length.out = 100), 2),
Gender = factor(rep(1:2, each = 100), levels = 1:2, labels =levels(data$Gender)))
newdata2 <- cbind(newdata2, predict(m2, newdata2, type = "link", se.fit=TRUE))
newdata2 <- within(newdata2, {
MB_score <- exp(fit)
LL <- exp(fit - 1.96 * se.fit)
UL <- exp(fit + 1.96 * se.fit)
}) # Tinh predictive probability
ggplot(newdata2, aes(REE, MB_score)) +
geom_ribbon(aes(ymin = LL, ymax = UL, fill = Gender), alpha = .25) +
geom_line(aes(colour = Gender), size = 2) +
labs(x = "REE", y = "Predicted MB_score") # Ve PP bang ggplot2

# Comments 1: Higher REE is related with lower MB_score
# Comments 2:No gender effect
IV. Things to consider
1.Is there a more appropriate model using REE to predict MB_score?
2.If MB_score is higher when REE is high or low, and is low with medium REE, what model should be tested?