In Poisson regression there are two deviances: the null deviance and the residual deviance.

The null deviance shows how well the response variable is predicted by a model that includes only the intercept (grand mean). The residual deviance is −2 times the difference between the log-likelihood of the model with p predictors and the log-likelihood of a “saturated model” (a theoretical model with a separate parameter for each observation); it shows how well the response variable is predicted by the model with p predictor variables. A lower residual deviance means a better-fitting model.

Now let us write down these likelihood functions.

Suppose \(Y\) has a Poisson distribution whose mean depends on a covariate vector \(\mathbf{x}\); for simplicity, suppose \(\mathbf{x}\) contains only one predictor variable. We write \[ E(Y\mid x)=\mu \] For Poisson regression we can choose a log or an identity link function; we choose a log link here.
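
As a minimal sketch of how this choice is expressed in R, the link is carried by the family object; log is the default for the poisson family:

    fam <- poisson(link = "log")  # equivalent to poisson(); poisson(link = "identity") is the alternative
    fam$linkfun(4)                # applies the chosen link to a mean: log(4)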

\[\log(\mu_i)=\beta_0+\beta_1x_i\] Equivalently, we can write

\[\mu_i=e^{\beta_0+\beta_1x_i}\]

Note that another common Poisson specification models a hazard (rate) function \(\lambda\) with an exposure time \(t\), so that \(\mu=\lambda t\); under the log link, \(\log(t)\) then enters the model as an offset variable in the Poisson regression.
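
A hedged sketch of that specification, with a hypothetical exposure variable `t` and toy values (not the data used below):

    # toy illustration of the offset specification; t is a hypothetical exposure
    d <- data.frame(y = c(2, 5, 1), x = c(1.0, 2.5, 0.5), t = c(10, 20, 5))
    glm(y ~ x + offset(log(t)), family = poisson, data = d)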

The likelihood function with parameters \(\beta_0\) and \(\beta_1\) can be written as: \[ L(\beta_0,\beta_1;y_i)=\prod_{i=1}^{n}\frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}=\prod_{i=1}^{n} \frac{e^{-e^{\beta_0+\beta_1x_i}}(e^{\beta_0+\beta_1x_i})^{y_i}}{y_i!} \]

The log-likelihood function is:

\[\begin{align*} l(\beta_0,\beta_1;y_i)&=-\sum_{i=1}^n\mu_i+\sum_{i=1}^ny_i\log(\mu_i)-\sum_{i=1}^n\log(y_i!)\\ &=-\sum_{i=1}^ne^{\beta_0+\beta_1x_i}+\sum_{i=1}^ny_i(\beta_0+\beta_1x_i)-\sum_{i=1}^n\log(y_i!) \tag{1} \end{align*}\]
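
To make \((1)\) concrete, here is a minimal R sketch (the function name is ours) that evaluates this log-likelihood via dpois(), whose log density includes the same \(-\log(y_i!)\) term:

    # evaluate equation (1) for given coefficients under the log link
    pois_loglik <- function(beta0, beta1, x, y) {
      mu <- exp(beta0 + beta1 * x)            # mu_i = exp(b0 + b1 * x_i)
      sum(dpois(y, lambda = mu, log = TRUE))  # sum of log Poisson densities
    }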

When calculating the null deviance we plug \(\beta_0\) into \((1)\): \(\beta_0\) is estimated from an intercept-only regression and \(\beta_1\) is set to zero. We write \[ l(\beta_0;y_i)=-\sum_{i=1}^ne^{\beta_0}+\sum_{i=1}^ny_i\beta_0-\sum_{i=1}^n\log(y_i!) \tag{2} \]

Next we need to calculate the log-likelihood for the “saturated model” (a theoretical model with a separate parameter for each observation); we therefore have \(n\) parameters \(\mu_1,\mu_2,\ldots,\mu_n\) here.

(Note that in \((1)\) we only have two parameters; i.e., subjects with the same value of the predictor variable are treated as having the same mean.)

The log-likelihood function for the “saturated model” is \[ l(\mu)=-\sum_{i=1}^n\mu_i+\sum_{i=1}^n y_i \log\mu_i-\sum_{i=1}^n \log(y_i!) \] Then it can be written as: \[ l(\mu)=-\left[\sum\mu_iI_{(y_i>0)}+\sum\mu_iI_{(y_i=0)}\right]+\left[\sum y_i I_{(y_i>0)} \log\mu_i+\sum y_i I_{(y_i=0)} \log\mu_i\right]-\sum \log(y_i!)I_{(y_i\ge0)} \tag{3} \] where \(I\) is an indicator function

\[I_{A}(\omega)={I(\omega \in A)} =\begin{cases} 1 & \text{ if } \omega \in A \\ 0 & \text{ if } \omega \notin A \end{cases}\] When \(y_i>0\), we have \[ \frac{\partial}{\partial \mu_i}l(\mu)=-1+\frac{y_i}{\mu_i} \] Setting this to zero, we get

\[ \hat{\mu_i}=y_i \]

Now, for the log-likelihood function \((3)\) of the “saturated model”, the terms with \(y_i>0\) give \[ l(\hat{\mu})=-\sum y_i+\sum y_i \log y_i-\sum \log(y_i!) \tag{4} \] When \(y_i=0\), we have \(\hat\mu_i=y_i=0\), and from equation \((3)\) its contribution to the log-likelihood is 0 (using the convention \(0\cdot\log 0=0\)); i.e., those zero values do not affect the value of the log-likelihood function.

From \((4)\) we can see why we write the log-likelihood function as \((3)\) for a “saturated model”: \(\log y_i\) would be undefined when \(y_i=0\).

In short, the above derivation shows that observations with \(y_i=0\) in a Poisson “saturated model” do not affect the value of its log-likelihood function.
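
In R this convention is effectively built into dpois(): dpois(0, 0) equals 1, so a zero count with \(\hat\mu_i=0\) contributes exactly 0 to the log-likelihood:

    dpois(0, lambda = 0)             # = 1, by convention
    dpois(0, lambda = 0, log = TRUE) # = 0, the log-likelihood contribution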

Now let us calculate the deviances.

The residual deviance is \[ -2\left[(1)-(4)\right]=-2\left[l(\beta_0,\beta_1;y_i)-l(\hat{\mu})\right]\tag{5} \]

The null deviance is \[ -2\left[(2)-(4)\right]=-2\left[l(\beta_0;y_i)-l(\hat{\mu})\right]\tag{6} \]
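
Substituting \((1)\) and \((4)\) into \((5)\), the \(\log(y_i!)\) terms cancel and the residual deviance takes the familiar closed form \[ D=2\sum_{i=1}^n\left[y_i\log\left(\frac{y_i}{\hat\mu_i}\right)-(y_i-\hat\mu_i)\right] \] where \(\hat\mu_i\) are the fitted means and the \(0\cdot\log 0=0\) convention handles \(y_i=0\).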

Let us run a Poisson regression model on some simulated data. The glm function automatically calculates the null deviance and the residual deviance for us. Then we will calculate the two deviances “by hand”.

    # simulated data
    x <- c(2,15,19,14,16,15,9,17,10,23,14,14,9,5,17,16,13,6,16,19,24,9,12,7,9,7,15,21,20,20)
    y <- c(0,6,4,1,5,2,2,10,3,10,2,6,5,2,2,7,6,2,5,5,6,2,5,1,3,3,3,4,6,9)
    p_data <- data.frame(y, x)
    p_glm <- glm(y ~ x, family = poisson, data = p_data)
    summary(p_glm)
## 
## Call:
## glm(formula = y ~ x, family = poisson, data = p_data)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.78079  -0.87855  -0.04969   0.77007   1.97433  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.30787    0.28943   1.064    0.287    
## x            0.07636    0.01730   4.413 1.02e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 48.310  on 29  degrees of freedom
## Residual deviance: 27.842  on 28  degrees of freedom
## AIC: 124.5
## 
## Number of Fisher Scoring iterations: 4

We can see \(\hat\beta_0=0.30787\), \(\hat\beta_1=0.07636\), null deviance \(=48.31\), residual deviance \(=27.84\).
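
As a quick check, both deviances are stored on the fitted glm object, and the residual deviance can be recomputed directly from the fitted means using the closed form above (with \(0\cdot\log 0=0\)):

    p_glm$null.deviance # null deviance, as in the summary
    p_glm$deviance      # residual deviance
    mu_hat <- fitted(p_glm)
    2 * sum(ifelse(y > 0, y * log(y / mu_hat), 0) - (y - mu_hat))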

Here is the intercept-only model:

    p_glm2 <- glm(y ~ 1, family = poisson, data = p_data)
    summary(p_glm2)
## 
## Call:
## glm(formula = y ~ 1, family = poisson, data = p_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9097  -1.2113  -0.1145   0.8074   2.3788  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.44299    0.08874   16.26   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 48.31  on 29  degrees of freedom
## Residual deviance: 48.31  on 29  degrees of freedom
## AIC: 142.97
## 
## Number of Fisher Scoring iterations: 5

We can see \(\hat\beta_0=1.44299\).
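
Note that maximizing \((2)\) gives \(e^{\hat\beta_0}=\bar{y}\) (set \(\partial l/\partial\beta_0=-ne^{\beta_0}+\sum y_i=0\)), so the intercept can be checked directly:

    log(mean(y)) # equals the intercept estimate 1.44299 (up to rounding)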

Now let us calculate the null deviance and the residual deviance ‘by hand’.

First, calculate the log-likelihood of equation \((1)\), i.e., the regression with the predictor.

    l_saturated  <- c() # log-likelihood contributions, saturated model
    l_regression <- c() # log-likelihood contributions, regression model
    l_intercept  <- c() # log-likelihood contributions, intercept-only model
    for (i in 1:30) {
      l_regression[i] <- -exp(0.30787 + 0.07636 * x[i]) +
        y[i] * (0.30787 + 0.07636 * x[i]) - log(factorial(y[i])) # equation (1)
    }

    l_reg <- sum(l_regression)
    l_reg
## [1] -60.25116
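
As a cross-check, logLik() on the fitted model returns the same value; note the AIC of 124.5 in the summary equals \(-2\,l+2k\) with \(k=2\) estimated parameters:

    logLik(p_glm) # about -60.25, matching l_reg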

Next, calculate the log-likelihood for the saturated model by equation \((4)\).

    for (i in 1:30) {
      # when y_i = 0 its contribution is 0, so guard against log(0)
      l_saturated[i] <- ifelse(y[i] > 0, y[i] * log(y[i]), 0) - y[i] - log(factorial(y[i]))
    }
    l_sat <- sum(l_saturated)
    l_sat
## [1] -46.33012
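
We can also obtain the saturated log-likelihood directly from dpois() by setting each mean to its own observation, which handles the zero count automatically:

    sum(dpois(y, lambda = y, log = TRUE)) # matches l_sat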

Then calculate the log-likelihood for the intercept-only model by equation \((2)\).

    for (i in 1:30) {
      l_intercept[i] <- -exp(1.44299) + y[i] * 1.44299 - log(factorial(y[i])) # equation (2)
    }

    l_inter <- sum(l_intercept)
    l_inter
## [1] -70.48501
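
Again logLik() agrees; the AIC of 142.97 in the intercept-only summary equals \(-2\,l+2\) with one estimated parameter:

    logLik(p_glm2) # about -70.485, matching l_inter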

Calculate the residual deviance by equation \((5)\)

    -2 * (l_reg - l_sat)
## [1] 27.84209

Calculate the null deviance by equation \((6)\)

    -2 * (l_inter - l_sat)
## [1] 48.30979

Using the equations above and calculating ‘by hand’, we get exactly the same results as the built-in glm function.
