Plots

From histograms, it looks like the density of the distribution, varies across levels of hmo and died, with shorter stays for those in HMOs (1) and shorter for those who did die, including what seems to be an inflated number of 1 day stays. So, to examine how stay varies across age groups, we use conditional violin plots which show a kernel density estimate of the distribution of stay mirrored (hence the violin) and conditional on each age group. The distribution of length of stay does not seem to vary much across age groups.

Zero-Truncated Negative Binomial Regression

Interpretation of coefficients:

The value of the coefficient for age, -0.0157 suggests that the log count of stay decreases by 0.0157 for each year increase in age.

. The coefficient for hmo, -0.1471 indicates that the log count of stay for HMO patient is 0.1471 less than for non-HMO patients.

. The log count of stay for patients who died while in the hospital was 0.2178 less than those patients who did not die.

. The value of the constant 2.4083 is the log count of the stay when all of the predictors equal zero.

. The value of the second intercept, the over dispersion parameter, (alpha) is 0.5686.


Call:
vglm(formula = stay ~ age + hmo + died, family = posnegbinomial(), 
    data = dat)

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1  2.40833    0.07158  33.645  < 2e-16 ***
(Intercept):2  0.56864    0.05489  10.359  < 2e-16 ***
age           -0.01569    0.01304  -1.204    0.229    
hmo1          -0.14706    0.05922  -2.483    0.013 *  
died1         -0.21777    0.04615  -4.719 2.38e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Names of linear predictors: loglink(munb), loglink(size)

Log-likelihood: -4755.28 on 2981 degrees of freedom

Number of Fisher scoring iterations: 4 

No Hauck-Donner effect found in any of the estimates

Testing Significance

To test whether we need to estimate over-dispersion, we could fit a zero-truncated Poisson model and compare the two. The change in deviance of 4307 the p-value of 0 for 1 degree of freedom, the overdispersion parameter, let us conclude that the negative binomial model is a better fit to the data.
[1] 4307.039
[1] 0