Plots

The mean number of days absent seems to vary by program, so program type looks like a good candidate predictor of absence. Within each level of program, the variance is higher than the mean; these are conditional means and variances. This suggests that overdispersion is present and that a negative binomial (NB) model is appropriate. In such a case, the confidence intervals from an NB regression would likely be wider than those from a Poisson regression. The plot also shows that the unconditional mean of days absent (the outcome variable) is much lower than its variance, with most values being small.

Negative Binomial Regression

Interpretation of coefficients (reference program: general):

. math: for each one-unit increase in math score, the expected log count of days absent decreases by 0.006, holding program constant.

. academic program: the difference in expected log count relative to the general program. Here the expected log count is 0.44 lower than for the general program, holding math constant.

. vocational program: the difference in expected log count relative to the general program. Here the expected log count is 1.28 lower than for the general program, holding math constant.


Call:
glm.nb(formula = daysabs ~ prog + math, data = dat, init.theta = 1.032713156, 
    link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1547  -1.0192  -0.3694   0.2285   2.5273  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     2.615265   0.197460  13.245  < 2e-16 ***
progAcademic   -0.440760   0.182610  -2.414   0.0158 *  
progVocational -1.278651   0.200720  -6.370 1.89e-10 ***
math           -0.005993   0.002505  -2.392   0.0167 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(1.0327) family taken to be 1)

    Null deviance: 427.54  on 313  degrees of freedom
Residual deviance: 358.52  on 310  degrees of freedom
AIC: 1741.3

Number of Fisher Scoring iterations: 1


              Theta:  1.033 
          Std. Err.:  0.106 

 2 x log-likelihood:  -1731.258
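
As a worked illustration of the log-link interpretation, the sketch below plugs the estimates printed above into the linear predictor for a hypothetical student (academic program, math score of 50; both values are chosen here purely for illustration) and exponentiates the result to get an expected count. It also evaluates the negative binomial variance function mu + mu^2/theta, which is the parameterization used by glm.nb. The object names (coefs, x, eta, mu, v) are illustrative and not part of the original analysis.

```r
# Estimates copied from the glm.nb summary above
coefs <- c("(Intercept)"    =  2.615265,
           "progAcademic"   = -0.440760,
           "progVocational" = -1.278651,
           "math"           = -0.005993)
theta <- 1.0327

# Hypothetical student: academic program, math score of 50 (illustrative values)
x <- c("(Intercept)" = 1, "progAcademic" = 1, "progVocational" = 0, "math" = 50)

eta <- sum(coefs * x[names(coefs)])  # expected log count (linear predictor)
mu  <- exp(eta)                      # expected number of days absent
v   <- mu + mu^2 / theta             # NB variance implied by theta

round(c(log_count = eta, expected_count = mu, variance = v), 3)
```

The implied variance is several times the expected count, which is the overdispersion that a Poisson model (variance equal to the mean) cannot represent.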

Testing Significance & Model Assumptions

To determine whether program type as a whole is statistically significant, we can compare models with and without the prog variable. Previously we had the choice of likelihood ratio tests or deviance tables, but for negative binomial models a deviance table does not re-estimate theta, so the overdispersion parameter is held constant. We therefore cannot use a deviance table and have to fit the two models separately.

The 2-degree-of-freedom chi-square test indicates that program is a statistically significant predictor of daysabs, with a p-value < 0.001. The estimated theta (overdispersion) parameter of the model that includes prog is 1.033.

Likelihood ratio tests of Negative Binomial Models

Response: daysabs
        Model     theta Resid. df    2 x log-lik.   Test    df LR stat.
1        math 0.8558565       312       -1776.306                      
2 prog + math 1.0327132       310       -1731.258 1 vs 2     2 45.04798
      Pr(Chi)
1            
2 1.65179e-10
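
The test statistic and p-value in the table can be checked by hand from the two "2 x log-lik." values. The following is a minimal sketch using only numbers copied from the table; the variable names are illustrative.

```r
# Values copied from the likelihood ratio table above
two_loglik_reduced <- -1776.306   # model 1: math only
two_loglik_full    <- -1731.258   # model 2: prog + math
df_diff            <- 312 - 310   # difference in residual degrees of freedom

# LR statistic: difference of the two "2 x log-lik." values
lr_stat <- two_loglik_full - two_loglik_reduced

# Upper-tail chi-square probability on 2 degrees of freedom
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

c(LR_stat = lr_stat, p_value = p_value)
```

The two degrees of freedom correspond to the two dummy coefficients (progAcademic and progVocational) that the model without prog drops.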

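The Poisson model is nested in the NB model, so we can use a likelihood ratio test (LRT) to compare them and check whether the Poisson assumption that the conditional mean equals the conditional variance is tenable. The LRT is significant with a p-value < 0.001, so that assumption is rejected and the NB model is more appropriate than the Poisson model. The number printed below is that p-value; the 'log Lik.' label appears to be carried over from the log-likelihood objects used to compute it.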

'log Lik.' 2.157298e-203 (df=5)
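
The Poisson log-likelihood for these data is not shown here, so the comparison cannot be reproduced from the printed output alone. The sketch below instead illustrates the same test on simulated overdispersed counts; the data frame sim, its variables, and the simulation settings are assumptions made for the example and are not the data analysed above. It relies on glm.nb from the MASS package.

```r
library(MASS)  # provides glm.nb

set.seed(1)

# Simulated stand-in data (NOT the data set analysed above)
n <- 300
x <- rnorm(n)
y <- rnbinom(n, size = 1, mu = exp(1.5 + 0.3 * x))  # overdispersed counts
sim <- data.frame(x = x, y = y)

fit_pois <- glm(y ~ x, family = poisson, data = sim)
fit_nb   <- glm.nb(y ~ x, data = sim)

# LR statistic; the NB model has one extra parameter (theta), hence df = 1.
# as.numeric() drops the 'log Lik.' class that would otherwise label the result.
lr_stat <- as.numeric(2 * (logLik(fit_nb) - logLik(fit_pois)))
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

c(LR_stat = lr_stat, p_value = p_value)
```

Wrapping the difference in as.numeric() is a small design choice: without it, the p-value prints with a 'log Lik.' label and a df attribute inherited from the NB fit.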

Incidence Rate Ratios (IRR)

Confidence intervals

                   Estimate       2.5 %       97.5 %
(Intercept)     2.615265446  2.24205576  3.012935926
progAcademic   -0.440760012 -0.81006586 -0.092643481
progVocational -1.278650721 -1.68348970 -0.890077623
math           -0.005992988 -0.01090086 -0.001066615

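Exponentiating the model coefficients gives incidence rate ratios (IRRs). For a categorical predictor, an IRR is the ratio of the expected rate of events in one category to the expected rate in the reference category; for a continuous predictor, it is the multiplicative change in the rate per one-unit increase.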

Interpretation of coefficients:

. The incidence rate of days absent decreases by about 0.6% for every one-unit increase in math test score (IRR = 0.994).

. The incidence rate for the academic program is 0.64 times that of the general program, and the incidence rate for the vocational program is 0.28 times that of the general program, holding the other variables constant.

                 Estimate     2.5 %     97.5 %
(Intercept)    13.6708448 9.4126616 20.3470498
progAcademic    0.6435471 0.4448288  0.9115184
progVocational  0.2784127 0.1857247  0.4106239
math            0.9940249 0.9891583  0.9989340
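
The IRR table is simply the exponential of the log-scale estimates and confidence limits listed under "Confidence intervals". A minimal sketch that reproduces it from those printed numbers follows; the object names est, irr and pct are illustrative.

```r
# Log-scale estimates and 95% limits copied from the first table above
est <- rbind(
  "(Intercept)"    = c( 2.615265446,  2.24205576,  3.012935926),
  "progAcademic"   = c(-0.440760012, -0.81006586, -0.092643481),
  "progVocational" = c(-1.278650721, -1.68348970, -0.890077623),
  "math"           = c(-0.005992988, -0.01090086, -0.001066615)
)
colnames(est) <- c("Estimate", "2.5 %", "97.5 %")

irr <- exp(est)                        # incidence rate ratios and their limits
pct <- 100 * (irr[, "Estimate"] - 1)   # percent change in the incidence rate

round(irr, 4)
round(pct, 1)
```

Because exp() is monotone, exponentiating the endpoints of a log-scale interval gives a valid interval for the IRR, which is why the limits can be transformed directly.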

Prediction Plots

The plot shows the expected count across the range of math scores for each type of program, along with 95 percent confidence intervals. Note that the lines are not straight because this is a log-linear model and what is plotted are the expected values, not the logs of the expected values.
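
To see why the curves bend, the sketch below computes expected counts on a small grid of math scores from the reported coefficients. The grid from 0 to 100 is an assumed range chosen for illustration, and the confidence bands are omitted because they would require the model's covariance matrix, which is not shown here.

```r
# Coefficients copied from the model summary (general program is the reference)
coefs <- c(intercept = 2.615265, academic = -0.440760,
           vocational = -1.278651, math = -0.005993)

math_grid <- seq(0, 100, by = 25)   # illustrative grid of math scores (assumed range)

# Linear predictor (log scale) for the general program
eta_general <- coefs[["intercept"]] + coefs[["math"]] * math_grid

pred <- data.frame(
  math       = math_grid,
  general    = exp(eta_general),
  academic   = exp(eta_general + coefs[["academic"]]),
  vocational = exp(eta_general + coefs[["vocational"]])
)
round(pred, 2)

# Equal steps in math give equal ratios, not equal differences, in expected count
step_diffs  <- diff(pred$general)
step_ratios <- pred$general[-1] / pred$general[-nrow(pred)]
step_diffs
step_ratios
```

On the log scale the three programs are parallel straight lines; on the count scale they become curves that keep a constant ratio to one another.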