Problem Set 5

Problem 1:

Build a posson glm with log-link predicting weight by a single intercept term and the time:diet interaction. Once complete, transform the natural-log coefficients back to growth rates by taking (exp(beta)-1), which may be more interpretable (for the intercept, don’t subtract 1). Discuss what each coefficient means. Why are the different diet coefficients similar? *Determine which diet is best in terms of growth rate.

Solution:

The Diet 1 coefficient means that it changes 3.79 units as weight increases by 1 unit. The Diet 2 coefficient means that it changes 0.06 units as weight increases by 1 unit. The Diet 3 coefficient means that it changes 0.05 units as weight increases by 1 unit. The Diet 4 coefficient means that it changes 0.16 units as weight increases by 1 unit.

The different diet coefficients similar since the log link function transforms the values to log and hence they seem similar.

In terms of growth rate, Diet 1 is the best.

Code:

data<-data("ChickWeight")
model1<-glm(weight~Diet+Time:Diet,data=ChickWeight,family=poisson(link="log"))
summary(model1)

## 
## Call:
## glm(formula = weight ~ Diet + Time:Diet, family = poisson(link = "log"), 
##     data = ChickWeight)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -12.0283   -1.1980   -0.3071    1.1689    8.3756  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 3.798814   0.015649 242.749  < 2e-16 ***
## Diet2       0.068587   0.025837   2.655  0.00794 ** 
## Diet3       0.050636   0.025395   1.994  0.04615 *  
## Diet4       0.161604   0.025111   6.436 1.23e-10 ***
## Diet1:Time  0.069223   0.001048  66.025  < 2e-16 ***
## Diet2:Time  0.074873   0.001332  56.215  < 2e-16 ***
## Diet3:Time  0.086803   0.001267  68.491  < 2e-16 ***
## Diet4:Time  0.076232   0.001284  59.352  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 22444.4  on 577  degrees of freedom
## Residual deviance:  3996.3  on 570  degrees of freedom
## AIC: 7755.1
## 
## Number of Fisher Scoring iterations: 4

beta<-model1$coefficients
newcoef<-cbind(exp(beta[2]-1),exp(beta[3]-1),exp(beta[4]-1),exp(beta[5]-1),exp(beta[6]-1),exp(beta[7]-1),exp(beta[8]-1))
transformed<-cbind(beta[1],newcoef)
transformed

##                 [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
## (Intercept) 3.798814 0.3939964 0.3869872 0.4324036 0.3942473 0.3964809
##                  [,7]      [,8]
## (Intercept) 0.4012396 0.3970201

Problem 2:

Build a second model allowing each diet to have its own intercept term. Compare that model to the model in part 1, using AIC scores and the anova() method using a test=“Chisq” option. Using the results of the model, determine whether we might have had sampled chicks unfairly, so that their starting weights differed significantly across diet. Describe in words how you concluded this.

Solution:

The model shows that every Diet is significant, and hence they are not unfarily sampled as the Residual Deviance is also shown as 1. Although, we see that there large is a difference in Deviance and degrees of freedom. We can conclude that the samples are not unfair, but there is over dispersion.

Code:

model2<-glm(weight~Diet,data=ChickWeight,family=poisson(link="log"))
summary(model2)

## 
## Call:
## glm(formula = weight ~ Diet, family = poisson(link = "log"), 
##     data = ChickWeight)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -10.324   -5.508   -1.307    3.552   16.112  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 4.631281   0.006655  695.96   <2e-16 ***
## Diet2       0.177782   0.010595   16.78   <2e-16 ***
## Diet3       0.331214   0.010128   32.70   <2e-16 ***
## Diet4       0.275938   0.010341   26.68   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 22444  on 577  degrees of freedom
## Residual deviance: 21157  on 574  degrees of freedom
## AIC: 24907
## 
## Number of Fisher Scoring iterations: 5

AIC(model1)

## [1] 7755.093

AIC(model2)

## [1] 24907.4

anova(model1, model2,test="Chisq")

Problem 3:

The large residual deviance suggests we might have over-dispersion. That is, the variability is greater than would be expected via independent samples with the possion model. This could occur if different chicks differ systematically–something we probably know is true. One way to assess this is to use the quasipoisson model family (again with log link). Run your two models again with the quasipoisson model and look at the estimate of overdispersion. If it is substantially different from 1.0, this indicates we have overdispersion. Describe what you found. Compare these models using an anova() with test=“Chisq” to determine which model is better when overdispersion is accounted for. Try to explain why the results differ here from the first comparison?

Solution:

The residual deviance is highly different that degrees of freedom, it is also showing that the dispersion parameter is not near 1. Hence we can conclude that there is over dispersion.

The results differ from the first model since the assumption for poisson is varience equals expected value and for quasipoison, the variance is linearly dependent to the expected value. Hence the quasipoisson model is better.

Model3 is better in this case, where the Diet and Time interaction is considered.

Code:

model3<-glm(weight~Diet+Time:Diet,data=ChickWeight,quasipoisson(link = "log"))
summary(model3)

## 
## Call:
## glm(formula = weight ~ Diet + Time:Diet, family = quasipoisson(link = "log"), 
##     data = ChickWeight)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -12.0283   -1.1980   -0.3071    1.1689    8.3756  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.798814   0.041284  92.016   <2e-16 ***
## Diet2       0.068587   0.068161   1.006    0.315    
## Diet3       0.050636   0.066994   0.756    0.450    
## Diet4       0.161604   0.066246   2.439    0.015 *  
## Diet1:Time  0.069223   0.002766  25.027   <2e-16 ***
## Diet2:Time  0.074873   0.003514  21.309   <2e-16 ***
## Diet3:Time  0.086803   0.003343  25.962   <2e-16 ***
## Diet4:Time  0.076232   0.003388  22.498   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 6.95968)
## 
##     Null deviance: 22444.4  on 577  degrees of freedom
## Residual deviance:  3996.3  on 570  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 4

model4<-glm(weight~Diet,data=ChickWeight,quasipoisson(link = "log"))
summary(model4)

## 
## Call:
## glm(formula = weight ~ Diet, family = quasipoisson(link = "log"), 
##     data = ChickWeight)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -10.324   -5.508   -1.307    3.552   16.112  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.63128    0.04135 112.015  < 2e-16 ***
## Diet2        0.17778    0.06582   2.701  0.00712 ** 
## Diet3        0.33121    0.06293   5.264 2.00e-07 ***
## Diet4        0.27594    0.06425   4.295 2.05e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 38.60189)
## 
##     Null deviance: 22444  on 577  degrees of freedom
## Residual deviance: 21157  on 574  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 5

anova(model3, model4,test="Chisq")

Problem Set 5

Sneha Nimmagadda

February 22, 2019