Problem 1:
Build a posson glm with log-link predicting weight by a single intercept term and the time:diet interaction. Once complete, transform the natural-log coefficients back to growth rates by taking (exp(beta)-1), which may be more interpretable (for the intercept, don’t subtract 1). Discuss what each coefficient means. Why are the different diet coefficients similar? *Determine which diet is best in terms of growth rate.
Solution:
The Diet 1 coefficient means that it changes 3.79 units as weight increases by 1 unit. The Diet 2 coefficient means that it changes 0.06 units as weight increases by 1 unit. The Diet 3 coefficient means that it changes 0.05 units as weight increases by 1 unit. The Diet 4 coefficient means that it changes 0.16 units as weight increases by 1 unit.
The different diet coefficients similar since the log link function transforms the values to log and hence they seem similar.
In terms of growth rate, Diet 1 is the best.
Code:
data<-data("ChickWeight")
model1<-glm(weight~Diet+Time:Diet,data=ChickWeight,family=poisson(link="log"))
summary(model1)
##
## Call:
## glm(formula = weight ~ Diet + Time:Diet, family = poisson(link = "log"),
## data = ChickWeight)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -12.0283 -1.1980 -0.3071 1.1689 8.3756
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.798814 0.015649 242.749 < 2e-16 ***
## Diet2 0.068587 0.025837 2.655 0.00794 **
## Diet3 0.050636 0.025395 1.994 0.04615 *
## Diet4 0.161604 0.025111 6.436 1.23e-10 ***
## Diet1:Time 0.069223 0.001048 66.025 < 2e-16 ***
## Diet2:Time 0.074873 0.001332 56.215 < 2e-16 ***
## Diet3:Time 0.086803 0.001267 68.491 < 2e-16 ***
## Diet4:Time 0.076232 0.001284 59.352 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 22444.4 on 577 degrees of freedom
## Residual deviance: 3996.3 on 570 degrees of freedom
## AIC: 7755.1
##
## Number of Fisher Scoring iterations: 4
beta<-model1$coefficients
newcoef<-cbind(exp(beta[2]-1),exp(beta[3]-1),exp(beta[4]-1),exp(beta[5]-1),exp(beta[6]-1),exp(beta[7]-1),exp(beta[8]-1))
transformed<-cbind(beta[1],newcoef)
transformed
## [,1] [,2] [,3] [,4] [,5] [,6]
## (Intercept) 3.798814 0.3939964 0.3869872 0.4324036 0.3942473 0.3964809
## [,7] [,8]
## (Intercept) 0.4012396 0.3970201
Problem 2:
Build a second model allowing each diet to have its own intercept term. Compare that model to the model in part 1, using AIC scores and the anova() method using a test=“Chisq” option. Using the results of the model, determine whether we might have had sampled chicks unfairly, so that their starting weights differed significantly across diet. Describe in words how you concluded this.
Solution:
The model shows that every Diet is significant, and hence they are not unfarily sampled as the Residual Deviance is also shown as 1. Although, we see that there large is a difference in Deviance and degrees of freedom. We can conclude that the samples are not unfair, but there is over dispersion.
Code:
model2<-glm(weight~Diet,data=ChickWeight,family=poisson(link="log"))
summary(model2)
##
## Call:
## glm(formula = weight ~ Diet, family = poisson(link = "log"),
## data = ChickWeight)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -10.324 -5.508 -1.307 3.552 16.112
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.631281 0.006655 695.96 <2e-16 ***
## Diet2 0.177782 0.010595 16.78 <2e-16 ***
## Diet3 0.331214 0.010128 32.70 <2e-16 ***
## Diet4 0.275938 0.010341 26.68 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 22444 on 577 degrees of freedom
## Residual deviance: 21157 on 574 degrees of freedom
## AIC: 24907
##
## Number of Fisher Scoring iterations: 5
AIC(model1)
## [1] 7755.093
AIC(model2)
## [1] 24907.4
anova(model1, model2,test="Chisq")
Problem 3:
The large residual deviance suggests we might have over-dispersion. That is, the variability is greater than would be expected via independent samples with the possion model. This could occur if different chicks differ systematically–something we probably know is true. One way to assess this is to use the quasipoisson model family (again with log link). Run your two models again with the quasipoisson model and look at the estimate of overdispersion. If it is substantially different from 1.0, this indicates we have overdispersion. Describe what you found. Compare these models using an anova() with test=“Chisq” to determine which model is better when overdispersion is accounted for. Try to explain why the results differ here from the first comparison?
Solution:
The residual deviance is highly different that degrees of freedom, it is also showing that the dispersion parameter is not near 1. Hence we can conclude that there is over dispersion.
The results differ from the first model since the assumption for poisson is varience equals expected value and for quasipoison, the variance is linearly dependent to the expected value. Hence the quasipoisson model is better.
Model3 is better in this case, where the Diet and Time interaction is considered.
Code:
model3<-glm(weight~Diet+Time:Diet,data=ChickWeight,quasipoisson(link = "log"))
summary(model3)
##
## Call:
## glm(formula = weight ~ Diet + Time:Diet, family = quasipoisson(link = "log"),
## data = ChickWeight)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -12.0283 -1.1980 -0.3071 1.1689 8.3756
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.798814 0.041284 92.016 <2e-16 ***
## Diet2 0.068587 0.068161 1.006 0.315
## Diet3 0.050636 0.066994 0.756 0.450
## Diet4 0.161604 0.066246 2.439 0.015 *
## Diet1:Time 0.069223 0.002766 25.027 <2e-16 ***
## Diet2:Time 0.074873 0.003514 21.309 <2e-16 ***
## Diet3:Time 0.086803 0.003343 25.962 <2e-16 ***
## Diet4:Time 0.076232 0.003388 22.498 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 6.95968)
##
## Null deviance: 22444.4 on 577 degrees of freedom
## Residual deviance: 3996.3 on 570 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
model4<-glm(weight~Diet,data=ChickWeight,quasipoisson(link = "log"))
summary(model4)
##
## Call:
## glm(formula = weight ~ Diet, family = quasipoisson(link = "log"),
## data = ChickWeight)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -10.324 -5.508 -1.307 3.552 16.112
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.63128 0.04135 112.015 < 2e-16 ***
## Diet2 0.17778 0.06582 2.701 0.00712 **
## Diet3 0.33121 0.06293 5.264 2.00e-07 ***
## Diet4 0.27594 0.06425 4.295 2.05e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 38.60189)
##
## Null deviance: 22444 on 577 degrees of freedom
## Residual deviance: 21157 on 574 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
anova(model3, model4,test="Chisq")