In Poisson regression there are two deviances.
The null deviance shows how well the response variable is predicted by a model that includes only the intercept (grand mean). The residual deviance is −2 times the difference between the log-likelihood of the model with p predictors and the log-likelihood of a “saturated model” (a theoretical model with a separate parameter for each observation). The residual deviance shows how well the response variable is predicted by a model with p predictor variables; a lower residual deviance means a better-fitting model.
Now let us write down these likelihood functions.
Suppose \(Y\) has a Poisson distribution whose mean depends on a vector \(\bf{x}\); for simplicity, we suppose \(\bf{x}\) contains only one predictor variable. We write \[ E(Y|x)=\mu \] For Poisson regression we can choose a log or an identity link function; we choose the log link here.
\[\log(\mu_i)=\beta_0+\beta_1x_i\] Equivalently,
\[\mu_i=e^{\beta_0+\beta_1x_i}\]
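For instance, with hypothetical coefficients \(\beta_0=0.3\) and \(\beta_1=0.08\) (made up for illustration), an observation with \(x_i=10\) has mean \(\mu_i=e^{0.3+0.8}=e^{1.1}\approx 3.0\).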
Note that for Poisson regression another common model specification is to model a rate (hazard) function \(\lambda\) together with an exposure \(t\), so that \(\mu=\lambda t\); under this specification \(\log t\) can be treated as an offset variable in the Poisson regression, since \(\log\mu=\log t+\beta_0+\beta_1x\).
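Here is a minimal sketch of that offset specification (the variable names and simulated data are invented for illustration and are separate from the example below):
set.seed(123)
tt <- runif(30, 1, 5)                               # hypothetical exposure times
xx <- rnorm(30)                                     # hypothetical predictor
yy <- rpois(30, lambda = tt * exp(0.3 + 0.1 * xx))  # mu = lambda * t
glm(yy ~ xx + offset(log(tt)), family = poisson)    # log(t) enters as an offset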
The likelihood function with parameters \(\beta_0\) and \(\beta_1\) can be written as: \[ L(\beta_0,\beta_1;y_i)=\prod_{i=1}^{n}\frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}=\prod_{i=1}^{n} \frac{e^{-e^{\beta_0+\beta_1x_i}}(e^{\beta_0+\beta_1x_i})^{y_i}}{y_i!} \]
The log-likelihood function is:
\[\begin{align*} l(\beta_0,\beta_1;y_i)&=-\sum_{i=1}^n\mu_i+\sum_{i=1}^ny_i\log(\mu_i)-\sum_{i=1}^n\log(y_i!)\\ &=-\sum_{i=1}^ne^{\beta_0+\beta_1x_i}+\sum_{i=1}^ny_i(\beta_0+\beta_1x_i)-\sum_{i=1}^n\log(y_i!) \tag{1} \end{align*}\]
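As a sketch, equation \((1)\) translates directly into R (the function name loglik_pois is ours; lfactorial(y) computes \(\log(y!)\)):
# log-likelihood (1) evaluated at coefficients b0, b1 for data vectors x, y
loglik_pois <- function(b0, b1, x, y) {
  mu <- exp(b0 + b1 * x)
  sum(-mu + y * log(mu) - lfactorial(y))
}
Setting b1 = 0 in this function gives the intercept-only log-likelihood \((2)\) below.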
When calculating the null deviance we plug \(\beta_0\) into \((1)\): \(\beta_0\) is estimated from an intercept-only regression and \(\beta_1\) is set to zero. We write \[ l(\beta_0;y_i)=-\sum_{i=1}^ne^{\beta_0}+\sum_{i=1}^ny_i\beta_0-\sum_{i=1}^n\log(y_i!) \tag{2} \]
Next we need to calculate the log-likelihood for the “saturated model” (a theoretical model with a separate parameter for each observation); we therefore have \(n\) parameters \(\mu_1,\mu_2,\dots,\mu_n\) here.
(Note that in \((1)\) we have only two parameters, i.e., subjects with the same values of the predictor variables are treated as identical.)
The log-likelihood function for the “saturated model” is \[ l(\mu)=-\sum_{i=1}^n\mu_i+\sum_{i=1}^n y_i \log\mu_i-\sum_{i=1}^n \log(y_i!) \] Splitting each sum by whether \(y_i>0\) or \(y_i=0\), it can be written as: \[ l(\mu)=-\left[\sum\mu_iI_{(y_i>0)}+\sum\mu_iI_{(y_i=0)}\right]+\left[\sum y_i \log\mu_i\,I_{(y_i>0)}+\sum y_i \log\mu_i\,I_{(y_i=0)}\right]-\sum \log(y_i!)\,I_{(y_i\ge0)} \tag{3} \] where \(I\) is an indicator function
\[I_{A}(\omega)={I(\omega \in A)} =\begin{cases} 1 & \text{ if } \omega \in A \\ 0 & \text{ if } \omega \notin A \end{cases}\] When \(y_i>0\), we have \[ \frac{\partial}{\partial \mu_i}l(\mu)=-1+\frac{y_i}{\mu_i} \] Setting this to zero gives
\[ \hat{\mu}_i=y_i \]
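A quick numerical illustration of this maximizer (a sketch using one arbitrary observation):
# for a single observation y_i = 3, the per-observation log-likelihood
# -mu + y_i*log(mu) - log(y_i!) is maximized at mu = y_i
y_i <- 3
ll_i <- function(mu) -mu + y_i * log(mu) - lfactorial(y_i)
optimize(ll_i, interval = c(0.01, 10), maximum = TRUE)$maximum  # approximately 3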
Now for the log-likelihood function \((3)\) of the “saturated model”: when \(y_i>0\), we have \[ l(\hat{\mu})=-\sum y_i+\sum y_i \log y_i-\sum \log(y_i!) \tag{4} \] When \(y_i=0\), we can see from equation \((3)\) that the observation’s contribution to the log-likelihood is 0 (each of its three terms vanishes, since \(\hat{\mu}_i=y_i=0\), \(0\cdot\log 0\) is taken as 0, and \(\log(0!)=0\)); i.e., those zero values do not affect the value of the log-likelihood function.
From \((4)\) we can see why we write the log-likelihood function as \((3)\) for a “saturated model”: \(\log y_i\) would be undefined when \(y_i=0\).
In short, the above derivation shows that observations with \(y_i=0\) in a Poisson “saturated model” do not affect the value of its log-likelihood.
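In R this is visible directly, since dpois(0, lambda = 0) equals 1:
dpois(0, lambda = 0, log = TRUE)  # 0: an observation with y_i = 0 contributes nothing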
Now let us calculate the deviances.
The residual deviance is \[-2[(1)-(4)]=-2\,[l(\beta_0,\beta_1;y_i)-l(\hat{\mu})]\tag{5}\]
The null deviance is \[-2[(2)-(4)]=-2\,[l(\beta_0;y_i)-l(\hat{\mu})]\tag{6}\]
Let us run a Poisson regression model using some simulated data. The glm function automatically calculates the null deviance and the residual deviance for us. Then we will calculate the two deviances “by hand”.
x <- c(2,15,19,14,16,15,9,17,10,23,14,14,9,5,17,16,13,6,16,19,24,9,12,7,9,7,15,21,20,20)
y <- c(0,6,4,1,5,2,2,10,3,10,2,6,5,2,2,7,6,2,5,5,6,2,5,1,3,3,3,4,6,9)
p_data <- data.frame(y, x)
p_glm <- glm(y ~ x, family = poisson, data = p_data)
summary(p_glm)
##
## Call:
## glm(formula = y ~ x, family = poisson, data = p_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.78079 -0.87855 -0.04969 0.77007 1.97433
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.30787 0.28943 1.064 0.287
## x 0.07636 0.01730 4.413 1.02e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 48.310 on 29 degrees of freedom
## Residual deviance: 27.842 on 28 degrees of freedom
## AIC: 124.5
##
## Number of Fisher Scoring iterations: 4
We can see \(\hat{\beta}_0=0.30787\), \(\hat{\beta}_1=0.07636\), Null Deviance = 48.31, and Residual Deviance = 27.84.
Here is the intercept-only model:
p_glm2 <- glm(y ~ 1, family = poisson, data = p_data)
summary(p_glm2)
##
## Call:
## glm(formula = y ~ 1, family = poisson, data = p_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9097 -1.2113 -0.1145 0.8074 2.3788
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.44299 0.08874 16.26 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 48.31 on 29 degrees of freedom
## Residual deviance: 48.31 on 29 degrees of freedom
## AIC: 142.97
##
## Number of Fisher Scoring iterations: 5
We can see \(\hat{\beta}_0=1.44299\).
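As a quick cross-check before the hand calculation, equations \((5)\) and \((6)\) can be verified with the built-in logLik and dpois functions (the saturated log-likelihood is just the Poisson log-density evaluated at \(\mu_i=y_i\)):
l_sat_check <- sum(dpois(y, lambda = y, log = TRUE))  # saturated log-likelihood
-2 * (as.numeric(logLik(p_glm)) - l_sat_check)        # residual deviance (5), ~27.842
-2 * (as.numeric(logLik(p_glm2)) - l_sat_check)       # null deviance (6), ~48.310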
Now let us calculate the Null Deviance and Residual Deviance ‘by hand’.
First, calculate the log-likelihood from equation \((1)\), i.e., the regression with predictors.
l_saturated <- c()   # log-likelihood contributions for the saturated model
l_regression <- c()  # log-likelihood contributions for the regression model
l_intercept <- c()   # log-likelihood contributions for the intercept-only model
for (i in 1:30) {
  l_regression[i] <- -exp(0.30787 + 0.07636 * x[i]) + y[i] * (0.30787 + 0.07636 * x[i]) - log(factorial(y[i]))  # equation (1)
}
l_reg <- sum(l_regression)
l_reg
## [1] -60.25116
Next, calculate the log-likelihood for the saturated model by equation \((4)\); note there is one \(y_i=0\) that needs care.
for (i in 1:30) {
  # when y_i = 0 the contribution is 0, so guard against log(0)
  l_saturated[i] <- if (y[i] > 0) y[i] * log(y[i]) - y[i] - log(factorial(y[i])) else 0
}
l_sat <- sum(l_saturated)
l_sat
## [1] -46.33012
Calculate the log-likelihood for the intercept-only model by equation \((2)\).
for (i in 1:30) {
  l_intercept[i] <- -exp(1.44299) + y[i] * 1.44299 - log(factorial(y[i]))  # equation (2)
}
l_inter <- sum(l_intercept)
l_inter
## [1] -70.48501
Calculate the residual deviance by equation \((5)\)
-2*(l_reg-l_sat)
## [1] 27.84209
Calculate the null deviance by equation \((6)\)
-2*(l_inter-l_sat)
## [1] 48.30979
Using the above equations and calculating ‘by hand’, we get exactly the same results as the built-in glm function.
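For completeness, the same quantities can also be read directly off the fitted object:
p_glm$deviance        # residual deviance, 27.84209
p_glm$null.deviance   # null deviance, 48.30979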