Count Data - Poisson Deviance - Sqrt

In a study of ‘Swissmeteo’, pollen counts of different species (trees, grasses, etc) have been measured… albeit with some percentage (say 2–3%) of missing data. In our “Statistics Lab” at ETH, we were looking into imputation of such counts. In order to judge “goodness of fit” of models or imputations for such counts, we will want to summarize differences between observed counts \(N_i\) and a fitted (e.g., imputed) counts \(\hat{N_i}\).

Poisson Regression

Poisson regression, i.e., a generalized linear model with \(\log(.)\) link and (in R) family = poisson() makes sense as a first approximation to modeling counts, given other station counts, weather conditions, previous years experience etc. A more sophisticated model would use “zero-inflated” (ZI) poisson or also ZI negative binomal models. The usual poisson regression model for counts \(N_i\) may be written as \[ \log(E[N_i]) = {\beta}^T \mathbf{x_i}, \] or more precisely, \(N_i \sim {\cal P}(\lambda_i)\) with \(\log(\lambda_i) = {\beta}^T \mathbf{x_i}\). In R, one would use

fm <- glm(N ~ ., data, family = poisson())

where family is often specified as character string ("family") instead of function call. However that emphasizes that these are functions creating family objects in R:

poisF <- poisson() ; poisF

Family: poisson 
Link function: log 
## inspecting it a bit more:
str(poisF)
List of 12
 $ family    : chr "poisson"
 $ link      : chr "log"
 $ linkfun   :function (mu)  
 $ linkinv   :function (eta)  
 $ variance  :function (mu)  
 $ dev.resids:function (y, mu, wt)  
 $ aic       :function (y, n, mu, wt, dev)  
 $ mu.eta    :function (eta)  
 $ initialize:  expression({     if (any(y < 0))          stop("negative values not allowed for the 'Poisson' family")     n <- rep.int(1, nobs)     mustart <- y + 0.1 })
 $ validmu   :function (mu)  
 $ valideta  :function (eta)  
 $ simulate  :function (object, nsim)  
 - attr(*, "class")= chr "family"

and we see that such R family objects contain a dev.resids function for computing “deviance residuals”, and we can use indeed to compare count discrepancies.

resid.d <- poisF$dev.resids; resid.d   ## 'wt' allows **weights** to be specified
function (y, mu, wt) 
{
    r <- mu * wt
    p <- which(y > 0)
    r[p] <- (wt * (y * log(y/mu) - (y - mu)))[p]
    2 * r
}

which defines (for constant weights, say \(1\)), the deviance function, using \(\hat Y\) here, instead of \(\mu\), \[ D(Y, \hat Y) = 2(Y \log(Y/{\hat Y}) - (Y - \hat Y)), \] where (for \(Y = 0\), the first term is set to zero (in short “\(0 \cdot \log(0) := 0\)”). Note that \(D(x,y)\) is not symmetric in \(x\) and \(y\), we have \(Y \ge 0\) and in theory, \(\mu > 0\), but in the above situation \(\hat Y \ge 0\), i.e., \(\hat Y = 0\) is possible and leads to \(D(Y, 0) = \infty\) (apart from \(D(0,0) = 0\)).

resid.d(1, mu = 0, wt = 1) # Y.hat = mu = 0
[1] Inf
Y <- c(0:3, 20:23)
Y.h <- Y + 1 # artificially assume we fitted '+1' every where
print( resid.d(Y, Y.h, wt = 1),  digits = 3)
[1] 2.0000 0.6137 0.3781 0.2739 0.0484 0.0462 0.0441 0.0423

so, a “miss count” of one is more severe for small counts than for large ones, as it should be.

First Aid Transformation for Counts: \(\sqrt{.}\) (sqrt := “square root”)

Applied statisticians for a long time have used square root transformed counts instead of counts, as the \(\sqrt{\cdot}\) is the “first aid” transformation for counts (John W. Tukey), because it is the variance stabilizing transformation for the poisson distribution. Also, for our application, larger scale imputation methods are often geared towards approximately normal data, sometimes even only indirectly by using least squares loss.

From this perspective, it is measure losses “normally”, i.e., squared deviation on the transformed scale, i.e., \[ S(Y, \hat Y) = 4 (\sqrt{Y} - \sqrt{{\hat Y}}) ^2, \] where the factor \(4\) is arbitrary for now.

Now, I found that indeed, the squared \(\sqrt\)-difference \(S()\) and the deviance residual \(D()\) are very similar numerically notably in the realistic situation that \(Y\) and \(\hat Y\) differ not by more than say two standard deviations, i.e., approximately \(2 \sqrt{\hat Y}\), e.g., first for the case above,

## Difference is '1' but residuals are *not* the same:
rbind(Y, resid.d = resid.d(Y, Y.h, wt=1),
         sqrt.d  = 4 *(sqrt(Y) - sqrt(Y.h))^2)
        [,1]    [,2]    [,3]    [,4]      [,5]      [,6]      [,7]
Y          0 1.00000 2.00000 3.00000 20.000000 21.000000 22.000000
resid.d    2 0.61371 0.37814 0.27391  0.048393  0.046159  0.044122
sqrt.d     4 0.68629 0.40408 0.28719  0.048788  0.046518  0.044450
             [,8]
Y       23.000000
resid.d  0.042258
sqrt.d   0.042558

and indeed, the number are quite close. Here a visual “proof”:

p.pois.loss <- function(Y, Y.hat, type = "b", log = "xy", ...) {
    stopifnot(length(Y) == length(Y.hat), Y >= 0, Y.hat >= 0, Y == as.integer(Y))
    dMat <- cbind(resid.d = poisson()$dev.resids(y = Y, mu = Y.hat, wt=1),
                  sqrt.d  = abs(sqrt(Y) - sqrt(Y.hat)),
                  sqrt.d2 = (sqrt(Y) - sqrt(Y.hat))^2)
    matplot(Y, dMat, type=type, log=log,
            ylab = "Poisson D(.,.)", main = "Poisson Losses", ...)
    legend("topright", expression("deviance resid.",
                                  abs(sqrt(Y) -sqrt(hat(Y))),
                                  (sqrt(Y) - sqrt(hat(Y)))^2), lty=1:3, col=1:3, bty="n")
    mtext(deparse(match.call()), side=3)
    invisible(cbind(Y=Y, Y.hat = Y.hat, dMat))
}

Y <- c(0:3, 10:12, 20, 30,50, 100, 200, 500, 1000, 10000)
(d.1 <- p.pois.loss(Y, Y+1))
Warning in xy.coords(x, y, xlabel, ylabel, log = log): 3 x values <= 0
omitted from logarithmic plot
Warning in xy.coords(x, y, xlabel, ylabel, log): 1 x value <= 0
omitted from logarithmic plot

          Y Y.hat    resid.d    sqrt.d    sqrt.d2
 [1,]     0     1 2.0000e+00 1.0000000 1.0000e+00
 [2,]     1     2 6.1371e-01 0.4142136 1.7157e-01
 [3,]     2     3 3.7814e-01 0.3178372 1.0102e-01
 [4,]     3     4 2.7391e-01 0.2679492 7.1797e-02
 [5,]    10    11 9.3796e-02 0.1543471 2.3823e-02
 [6,]    11    12 8.5750e-02 0.1474768 2.1749e-02
 [7,]    12    13 7.8975e-02 0.1414497 2.0008e-02
 [8,]    20    21 4.8393e-02 0.1104397 1.2197e-02
 [9,]    30    31 3.2611e-02 0.0905388 8.1973e-03
[10,]    50    51 1.9737e-02 0.0703606 4.9506e-03
[11,]   100   101 9.9338e-03 0.0498756 2.4876e-03
[12,]   200   201 4.9834e-03 0.0353113 1.2469e-03
[13,]   500   501 1.9973e-03 0.0223495 4.9950e-04
[14,]  1000  1001 9.9933e-04 0.0158074 2.4988e-04
[15,] 10000 10001 9.9993e-05 0.0049999 2.4999e-05
(d10 <- p.pois.loss(Y + 10, Y))

          Y Y.hat    resid.d   sqrt.d    sqrt.d2
 [1,]    10     0        Inf 3.162278 10.0000000
 [2,]    11     1 32.7536960 2.316625  5.3667504
 [3,]    12     2 23.0022273 2.049888  4.2020410
 [4,]    13     3 18.1247638 1.873500  3.5100040
 [5,]    20    10  7.7258872 1.309858  1.7157288
 [6,]    21    11  7.1583409 1.265951  1.6026317
 [7,]    22    12  6.6699754 1.226314  1.5038464
 [8,]    30    20  4.3279065 1.005090  1.0102051
 [9,]    40    30  3.0145658 0.847330  0.7179677
[10,]    60    50  1.8785868 0.674899  0.4554885
[11,]   110   100  0.9682396 0.488088  0.2382304
[12,]   210   200  0.4918690 0.349241  0.1219694
[13,]   510   500  0.1986798 0.222500  0.0495062
[14,]  1010  1000  0.0996683 0.157721  0.0248758
[15,] 10010 10000  0.0099967 0.049988  0.0024988
(dd <- p.pois.loss(Y, pmax(0, Y + c(-2:-1, 1:2))))
Warning in Y + c(-2:-1, 1:2): longer object length is not a multiple
of shorter object length
Warning in xy.coords(x, y, xlabel, ylabel, log = log): 3 x values <= 0
omitted from logarithmic plot
Warning in xy.coords(x, y, xlabel, ylabel, log = log): 3 y values <= 0
omitted from logarithmic plot
Warning in xy.coords(x, y, xlabel, ylabel, log): 1 x value <= 0
omitted from logarithmic plot
Warning in xy.coords(x, y, xlabel, ylabel, log): 1 y value <= 0
omitted from logarithmic plot

          Y Y.hat    resid.d    sqrt.d    sqrt.d2
 [1,]     0     0 0.0000e+00 0.0000000 0.0000e+00
 [2,]     1     0        Inf 1.0000000 1.0000e+00
 [3,]     2     3 3.7814e-01 0.3178372 1.0102e-01
 [4,]     3     5 9.3505e-01 0.5040172 2.5403e-01
 [5,]    10     8 4.6287e-01 0.3338505 1.1146e-01
 [6,]    11    10 9.6824e-02 0.1543471 2.3823e-02
 [7,]    12    13 7.8975e-02 0.1414497 2.0008e-02
 [8,]    20    22 1.8759e-01 0.2182798 4.7646e-02
 [9,]    30    28 1.3957e-01 0.1857230 3.4493e-02
[10,]    50    49 2.0271e-02 0.0710678 5.0506e-03
[11,]   100   101 9.9338e-03 0.0498756 2.4876e-03
[12,]   200   202 1.9868e-02 0.0705348 4.9752e-03
[13,]   500   498 8.0214e-03 0.0447662 2.0040e-03
[14,]  1000   999 1.0007e-03 0.0158153 2.5013e-04
[15,] 10000 10001 9.9993e-05 0.0049999 2.4999e-05
(d0 <- p.pois.loss(c(0,0,0,1,1,1),
                   c(0,1,2,0,1,2)))
Warning in xy.coords(x, y, xlabel, ylabel, log = log): 9 x values <= 0
omitted from logarithmic plot
Warning in xy.coords(x, y, xlabel, ylabel, log = log): 6 y values <= 0
omitted from logarithmic plot
Warning in xy.coords(x, y, xlabel, ylabel, log): 3 x values <= 0
omitted from logarithmic plot
Warning in xy.coords(x, y, xlabel, ylabel, log): 2 y values <= 0
omitted from logarithmic plot

     Y Y.hat resid.d  sqrt.d sqrt.d2
[1,] 0     0 0.00000 0.00000 0.00000
[2,] 0     1 2.00000 1.00000 1.00000
[3,] 0     2 4.00000 1.41421 2.00000
[4,] 1     0     Inf 1.00000 1.00000
[5,] 1     1 0.00000 0.00000 0.00000
[6,] 1     2 0.61371 0.41421 0.17157
(d.prop <- p.pois.loss(Y, 2*Y))
Warning in xy.coords(x, y, xlabel, ylabel, log = log): 3 x values <= 0
omitted from logarithmic plot
Warning in xy.coords(x, y, xlabel, ylabel, log = log): 3 y values <= 0
omitted from logarithmic plot
Warning in xy.coords(x, y, xlabel, ylabel, log): 1 x value <= 0
omitted from logarithmic plot
Warning in xy.coords(x, y, xlabel, ylabel, log): 1 y value <= 0
omitted from logarithmic plot

          Y Y.hat    resid.d   sqrt.d    sqrt.d2
 [1,]     0     0    0.00000  0.00000    0.00000
 [2,]     1     2    0.61371  0.41421    0.17157
 [3,]     2     4    1.22741  0.58579    0.34315
 [4,]     3     6    1.84112  0.71744    0.51472
 [5,]    10    20    6.13706  1.30986    1.71573
 [6,]    11    22    6.75076  1.37379    1.88730
 [7,]    12    24    7.36447  1.43488    2.05887
 [8,]    20    40   12.27411  1.85242    3.43146
 [9,]    30    60   18.41117  2.26874    5.14719
[10,]    50   100   30.68528  2.92893    8.57864
[11,]   100   200   61.37056  4.14214   17.15729
[12,]   200   400  122.74113  5.85786   34.31458
[13,]   500  1000  306.85282  9.26210   85.78644
[14,]  1000  2000  613.70564 13.09858  171.57288
[15,] 10000 20000 6137.05639 41.42136 1715.72875
library("Ryacas")
yacas("S(x,y):= 4(Sqrt(x) - Sqrt(y))^2")
expression(TRUE)
## Now set  y := x / (1 + delta)
yacas("S(x, x/(1+d))")
expression(`4`(root(x, 2) - root(x/(d + 1), 2))^2)
## cannot get 'yacas'  to see it

yacas("S:= 4*Taylor(d,0,5) 1 + 1/(1+d) - 2/Sqrt(1+d)")
expression(4 * (d^2/4 + -3 * d^3/8 + 29 * d^4/64 + -65 * d^5/128))
yacas("S:= Factor(Simplify(%))")
expression((-3 * d/2 + 29 * d^2/16 + -65 * d^3/32 + 1) * d^2)