Reading in the data and fitting the Poisson model:
# Deaths and births by year (1961-1967) and gender
mortality_dat <- data.frame(
  Year = as.factor(rep(1961:1967, times = 2)),
  Gender = gl(n = 2, k = 7, labels = c("Boys", "Girls")),
  Deaths = c(48, 55, 55, 58, 58, 49, 43,
             36, 40, 44, 45, 46, 51, 39),
  Births = c(817599, 833269, 852561, 882924, 935366, 705463, 992778,
             771773, 785347, 806960, 833837, 888331, 655511, 942869))
# Indicator for the single cell "Girls, 1966"
mortality_dat2 <- data.frame(mortality_dat,
  Girls_1966 = mortality_dat$Year == 1966 & mortality_dat$Gender == "Girls")
# Poisson GLM for the counts; log(Births) enters as an offset
poismod <- glm(formula = Deaths ~ offset(log(Births)) + Gender + Year + Girls_1966,
  family = poisson, data = mortality_dat2)
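Because log(Births) enters as an offset with its coefficient fixed at 1, the model is effectively one for death rates: \( \log \mu_i = \log(\text{Births}_i) + x_i^\top \beta \), so \( \mu_i / \text{Births}_i = \exp(x_i^\top \beta) \). The sketch below rebuilds the same data so it runs on its own, checks this identity on the fitted model, and prints fitted deaths per 100,000 births together with the rate ratios \( \exp(\beta) \). It uses only base R (`model.matrix`, `fitted`, `coef`); the helper names `X`, `xb` and `rate` are illustrative and not part of the original analysis.

```r
# Same data and Poisson model as in the text, rebuilt so this runs on its own
mortality_dat2 <- data.frame(
  Year   = factor(rep(1961:1967, times = 2)),
  Gender = gl(n = 2, k = 7, labels = c("Boys", "Girls")),
  Deaths = c(48, 55, 55, 58, 58, 49, 43, 36, 40, 44, 45, 46, 51, 39),
  Births = c(817599, 833269, 852561, 882924, 935366, 705463, 992778,
             771773, 785347, 806960, 833837, 888331, 655511, 942869)
)
mortality_dat2$Girls_1966 <- mortality_dat2$Year == "1966" &
  mortality_dat2$Gender == "Girls"
poismod <- glm(Deaths ~ offset(log(Births)) + Gender + Year + Girls_1966,
               family = poisson, data = mortality_dat2)

# Linear predictor without the offset: x_i' beta
# (model.matrix contains no column for the offset term)
X  <- model.matrix(poismod)
xb <- drop(X %*% coef(poismod))

# Fitted deaths divided by births equals exp(x_i' beta):
# the offset turns the count model into a model for rates
rate <- fitted(poismod) / mortality_dat2$Births
all.equal(unname(rate), unname(exp(xb)))  # should be TRUE

# Fitted deaths per 100,000 births, and rate ratios exp(beta)
round(100000 * rate, 2)
round(exp(coef(poismod)), 3)
```

Reading the coefficients as log rate ratios is what makes the offset formulation convenient: `exp(coef(poismod))["GenderGirls"]` is the fitted ratio of the girls' death rate to the boys' rate in the same year.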
The sample mean of the death counts is roughly equal to their sample variance, which at first sight looks consistent with the Poisson assumption:
attach(mortality_dat2)
mean(Deaths)
## [1] 47.64
sd(Deaths)^2
## [1] 49.94
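However, following a suggestion by Faraway, we should instead compare the fitted values \( \hat{\mu}_i \) with the squared response residuals \( (y_i - \hat{\mu}_i)^2 \) to assess under- or overdispersion. Under a Poisson model \( \operatorname{Var}(Y_i) = \mu_i \), and here each \( \mu_i \) depends on gender, year and the number of births, so the raw sample variance of the counts mixes variation between the means with the random variation around them.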
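The reasoning behind this comparison can be written out. Assuming the usual quasi-Poisson variance function (variance proportional to the mean, with \( \phi = 1 \) recovering the Poisson model), each squared deviation estimates \( \phi\,\mu_i \). Scaling each one by \( \hat{\mu}_i \) and dividing by the residual degrees of freedom gives the standard Pearson estimator of \( \phi \), here with \( n = 14 \) observations and \( p = 9 \) coefficients:

```latex
\begin{aligned}
\operatorname{Var}(Y_i) &= \phi\,\mu_i,
  \qquad \mathbb{E}\!\left[(Y_i - \mu_i)^2\right] = \phi\,\mu_i, \\
\hat{\phi} &= \frac{1}{n - p} \sum_{i=1}^{n}
  \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i}.
\end{aligned}
```

Values of \( \hat{\phi} \) well below 1 point to underdispersion, values well above 1 to overdispersion.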
Mean of the fitted values:
mean(fitted(poismod))
## [1] 47.64
Mean of the squared response residuals, a crude estimate of the variance around the fitted values:
mean((Deaths - fitted(poismod))^2)
## [1] 1.732
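Comparing the two values (47.64 versus 1.73) gives a much better picture of the degree of under- or overdispersion than comparing the sample mean and variance: under the Poisson model the squared deviations from the fitted values should be of the same order as the fitted values themselves, yet here they are far smaller. A quasi-Poisson fit estimates the dispersion parameter \( \phi \) directly: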
quasipois.mod <- glm(formula = Deaths ~ offset(log(Births)) + Gender + Year +
Girls_1966, family = quasipoisson, data = mortality_dat2)
summary(quasipois.mod)
##
## Call:
## glm(formula = Deaths ~ offset(log(Births)) + Gender + Year +
## Girls_1966, family = quasipoisson, data = mortality_dat2)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7 8
## 0.1438 0.2450 -0.0456 0.0485 -0.0062 0.0000 -0.4134 -0.1634
## 9 10 11 12 13 14
## -0.2800 0.0513 -0.0547 0.0070 0.0000 0.4542
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.7638 0.0391 -249.89 1.9e-11 ***
## GenderGirls -0.1821 0.0287 -6.35 0.0014 **
## Year1962 0.1048 0.0508 2.06 0.0940 .
## Year1963 0.1212 0.0503 2.41 0.0607 .
## Year1964 0.1268 0.0498 2.55 0.0515 .
## Year1965 0.0763 0.0497 1.54 0.1853
## Year1966 0.1890 0.0622 3.04 0.0288 *
## Year1967 -0.2209 0.0526 -4.20 0.0085 **
## Girls_1966TRUE 0.2955 0.0736 4.01 0.0102 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 0.1148)
##
## Null deviance: 20.52280 on 13 degrees of freedom
## Residual deviance: 0.57309 on 5 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 3
This indicates very strong underdispersion: \( \hat{\phi} \approx 0.11 \), well below the value \( \phi = 1 \) implied by the Poisson model.
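The dispersion estimate in the summary can be reproduced by hand from the Poisson fit, which ties it back to Faraway's comparison of \( (y_i - \hat{\mu}_i)^2 \) with \( \hat{\mu}_i \). The sketch below, which rebuilds the data so it runs on its own, sums the squared Pearson residuals and divides by the residual degrees of freedom; the result should agree with the 0.1148 reported by `summary(quasipois.mod)` up to the convergence tolerance of the IRLS fit. It then shows that the quasi-Poisson standard errors are the Poisson ones multiplied by \( \sqrt{\hat{\phi}} \). The names `form`, `pearson`, `phi_hat` and `se_ratio` are illustrative.

```r
# Rebuild the data and both fits so this snippet runs on its own
mortality_dat2 <- data.frame(
  Year   = factor(rep(1961:1967, times = 2)),
  Gender = gl(n = 2, k = 7, labels = c("Boys", "Girls")),
  Deaths = c(48, 55, 55, 58, 58, 49, 43, 36, 40, 44, 45, 46, 51, 39),
  Births = c(817599, 833269, 852561, 882924, 935366, 705463, 992778,
             771773, 785347, 806960, 833837, 888331, 655511, 942869)
)
mortality_dat2$Girls_1966 <- mortality_dat2$Year == "1966" &
  mortality_dat2$Gender == "Girls"

form <- Deaths ~ offset(log(Births)) + Gender + Year + Girls_1966
poismod       <- glm(form, family = poisson,      data = mortality_dat2)
quasipois.mod <- glm(form, family = quasipoisson, data = mortality_dat2)

# Pearson residuals (y_i - mu_hat_i) / sqrt(mu_hat_i) from the Poisson fit
pearson <- residuals(poismod, type = "pearson")

# Pearson estimate of phi: sum of squared Pearson residuals divided by
# the residual degrees of freedom (14 observations - 9 coefficients = 5)
phi_hat <- sum(pearson^2) / df.residual(poismod)

# Hand computation next to the dispersion reported by summary()
c(by_hand = phi_hat, summary = summary(quasipois.mod)$dispersion)

# Same point estimates; quasi-Poisson standard errors are the Poisson
# standard errors scaled by sqrt(phi_hat)
se_ratio <- sqrt(diag(vcov(quasipois.mod))) / sqrt(diag(vcov(poismod)))
round(se_ratio, 4)
```

Because \( \hat{\phi} < 1 \), every quasi-Poisson standard error is smaller than its Poisson counterpart by the same factor, so inference based on the plain Poisson fit would be conservative for these data.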