Introduction

Poisson logistic regression is a generalized linear model form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable Y has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a log-linear model, especially when used to model contingency tables.

In R, glm function are used for the Logistic regression with following parameters.

- formula: predictors used in formula (Systematic Component) 
- family: "poisson" needs to be chosen(Random Component) 
- link: "log".(Link Component)

Data: Valve Replacement Data: 21 heart valve replacement patients were followed until death, dropout,or the end of the study. There are following 4 observations. Valvetype: Valve type Age: Age group Deaths: Number of deaths TimeAtRisk: Time at risk in months for each cell.

Objective: In this study, I applied the Poisson logistic regression from simplest(null) complex(saturated) the model.

Methods to use:
- Poisson logistic regression (Null to Saturated)
- Lack of fit test to compare the models
- Drop in deviance test to compare the models

Data:

Deaths <- c(4, 1, 7, 9)
TimeAtRisk <- c(1259, 2082, 1417, 1647)
ValveType <- c("aortic", "mitral", "aortic", "mitral")
Age <- c("<55", "<55", "55+", "55+")
d <- data.frame(ValveType, Age, Deaths, TimeAtRisk)
d
##   ValveType Age Deaths TimeAtRisk
## 1    aortic <55      4       1259
## 2    mitral <55      1       2082
## 3    aortic 55+      7       1417
## 4    mitral 55+      9       1647

Let \(A_i\) = 1 if Age = 55+ and 0 otherwise, and \(M_i\) = 1 if ValveType = mitral and 0 otherwise.

Units(counts) are death here.

For cells i = 1,2,3,4, model the data as \[ Y_i\sim\mbox{independent Poisson}(t_i\mu(x_i)) \] where \(t_i\) is the known TimeAtRisk (in months) and \[ \log(\mbox{E}[Y_i])=\log t_i+\sum_{j=0}^p\beta_jx_{ij}. \]

First fit the the saturated model, with one parameter for each of the four cells.

Saturated Model

Saturated <- glm(Deaths ~ Age * ValveType, offset = log(TimeAtRisk), 
                 family = poisson(link = log))
# notice that Age + ValveType + Age * ValveType is the saturated model.
summary(Saturated)
## 
## Call:
## glm(formula = Deaths ~ Age * ValveType, family = poisson(link = log), 
##     offset = log(TimeAtRisk))
## 
## Deviance Residuals: 
## [1]  0  0  0  0
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)             -5.7518     0.5000 -11.504   <2e-16 ***
## Age55+                   0.4414     0.6268   0.704   0.4813    
## ValveTypemitral         -1.8893     1.1180  -1.690   0.0911 .  
## Age55+:ValveTypemitral   1.9902     1.2264   1.623   0.1046    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 1.0841e+01  on 3  degrees of freedom
## Residual deviance: 8.8818e-16  on 0  degrees of freedom
## AIC: 21.127
## 
## Number of Fisher Scoring iterations: 3

We can notice that all deviance residuaks are zero because it is Saturated model.

Now, we can simply the models:

\[ \log(\mbox{E}[Y_i ]) = \log t_i +\beta_0 + \beta_1A_i + \beta_2 M_i. \] Additive GLM

Full <- glm(Deaths ~ Age + ValveType, offset = log(TimeAtRisk), 
            family = poisson(link = log))
summary(Full)
## 
## Call:
## glm(formula = Deaths ~ Age + ValveType, family = poisson(link = log), 
##     offset = log(TimeAtRisk))
## 
## Deviance Residuals: 
##      1       2       3       4  
##  1.025  -1.197  -0.602   0.613  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -6.3121     0.5066 -12.460   <2e-16 ***
## Age55+            1.2209     0.5138   2.376   0.0175 *  
## ValveTypemitral  -0.3299     0.4382  -0.753   0.4515    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 10.8405  on 3  degrees of freedom
## Residual deviance:  3.2225  on 1  degrees of freedom
## AIC: 22.349
## 
## Number of Fisher Scoring iterations: 5

So we should look at the residual deviance which log likelihood additive model to log likelihood saturated model.

We can notice that it drops from 4 parameters to 3 parameters. So we can conduct that Chi squared test with df 1.

Lack of Fit test:

1 - pchisq(Full$deviance, df = 1)
## [1] 0.07263213

The comparison of the additive GLM to the Saturated model shows no strong evidence of lack-of-fit at level 0.05.So, we choose the simpler (additive) model.

Estimated numbers of deaths and rates per month are then

d <- data.frame(d, Expected_Deaths = exp(predict(Full)), 
                Rates_per_Month = exp(predict(Full)) / TimeAtRisk)
d
##   ValveType Age Deaths TimeAtRisk Expected_Deaths Rates_per_Month
## 1    aortic <55      4       1259        2.284108     0.001814224
## 2    mitral <55      1       2082        2.715892     0.001304463
## 3    aortic 55+      7       1417        8.715892     0.006150947
## 4    mitral 55+      9       1647        7.284108     0.004422652

Can we simplfy further?

We believe that age is related to death, but what about the valve type? What we really want to know is whether the type of valve makes a difference.

So, does the valve type make a difference controling the age? So, we try to simplify the model further by removing valve type.

If this model fits, then the valve type doesnt matter. if it doesnt fit, then valve type is important.

Reduced <- glm(Deaths ~ Age, offset = log(TimeAtRisk), family = poisson(link = log))
summary(Reduced)
## 
## Call:
## glm(formula = Deaths ~ Age, family = poisson(link = log), offset = log(TimeAtRisk))
## 
## Deviance Residuals: 
##       1        2        3        4  
##  1.3382  -1.3995  -0.1482   0.1352  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -6.5046     0.4472 -14.545   <2e-16 ***
## Age55+        1.2497     0.5123   2.439   0.0147 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 10.8405  on 3  degrees of freedom
## Residual deviance:  3.7897  on 2  degrees of freedom
## AIC: 20.917
## 
## Number of Fisher Scoring iterations: 5
anova(Reduced, Full)
## Analysis of Deviance Table
## 
## Model 1: Deaths ~ Age
## Model 2: Deaths ~ Age + ValveType
##   Resid. Df Resid. Dev Df Deviance
## 1         2     3.7897            
## 2         1     3.2225  1  0.56721

Drop in Deviance Test

DD<-Reduced$deviance-Full$deviance
DF<- 3-2
pchisq(DD, df = DF)
## [1] 0.5486288

So, we do NOt reject the Null hypothesis, it means the smaller model is adequate. So, the valve type doesnt make any difference here.

References:

  1. Colorado Stat University - Generalized Liner Models Course Notes STAA552
  2. https://en.wikipedia.org/wiki/Poisson_regression



***************