Other Samples

Example of Number of Accidents

Let X is a binary variable of 0 for sunny day and 1 for rainy day.

We have a sample of 20 days with 10 rainy and 10 sunny days from Poisson with rates 5 and 2 respectively.

In fact we can artificially generate this sample using R:

y <- c(rpois(10, lambda = 2), rpois(10, lambda = 5))
x <- rep(c(0,1), each = 10)
print(data.frame(y = y,x = x))

##    y x
## 1  1 0
## 2  3 0
## 3  3 0
## 4  1 0
## 5  3 0
## 6  2 0
## 7  3 0
## 8  2 0
## 9  4 0
## 10 4 0
## 11 6 1
## 12 9 1
## 13 3 1
## 14 1 1
## 15 6 1
## 16 4 1
## 17 5 1
## 18 8 1
## 19 5 1
## 20 5 1

accidents_model <- glm(y ~ x, family = poisson(link = "log"))
summary(accidents_model)

## 
## Call:
## glm(formula = y ~ x, family = poisson(link = "log"))
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   0.9555     0.1961   4.872  1.1e-06 ***
## x             0.6931     0.2402   2.886   0.0039 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 23.569  on 19  degrees of freedom
## Residual deviance: 14.734  on 18  degrees of freedom
## AIC: 80.466
## 
## Number of Fisher Scoring iterations: 4

What is this?

Let’s exponentiate the coefficients:

exp(accidents_model$coeff)

## (Intercept)           x 
##         2.6         2.0

Can we find the rate of accidents for each group (sunny or rainy)? How?

mu0 <- mean(y[x==0])
mu1 <- mean(y[x==1])
c(mu0, mu1, mu1/mu0)

## [1] 2.6 5.2 2.0

FASTRAK Example

The data are from the Canadian National Cardiovascular Disease registry called FASTRAK. They have been grouped by covariate patterns from individual observations.

The response is die, which is a count of the number of deaths of patients having a specific pattern of predictors.

Predictors are anterior, which indicates if the patient has had a previous anterior myocardial infarction; hcabg, if the patient has a history of having had a CABG (coronary artery bypass grafting) procedure; and killip class, a summary indicator of the cardiovascular health of the patient, with increasing values indicating increased disability.

The number of observations sharing the same pattern of covariates is recorded in the variable case. This value is log-transformed and entered into the model as an offset.

library(COUNT)

## Loading required package: msme

## Loading required package: MASS

## Loading required package: lattice

## Loading required package: sandwich

library(MASS)

fasttrakg <- read.delim("D:/UP Cebu Statistics Courses/1st Sem 2023-2024/STAT 197 (ST - Count Data Analysis)/MCD_Rdata/POIS/fasttrakg.txt")

attach(fasttrakg)

summary(fast <- glm(die ~ anterior + hcabg + factor(killip),
                family=poisson,
                offset=log(cases),
                data=fasttrakg))

## 
## Call:
## glm(formula = die ~ anterior + hcabg + factor(killip), family = poisson, 
##     data = fasttrakg, offset = log(cases))
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -4.0698     0.1459 -27.893  < 2e-16 ***
## anterior          0.6749     0.1596   4.229 2.34e-05 ***
## hcabg             0.6614     0.3267   2.024   0.0429 *  
## factor(killip)2   0.9020     0.1724   5.234 1.66e-07 ***
## factor(killip)3   1.1133     0.2513   4.430 9.44e-06 ***
## factor(killip)4   2.5126     0.2743   9.160  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 116.232  on 14  degrees of freedom
## Residual deviance:  10.932  on  9  degrees of freedom
## AIC: 73.992
## 
## Number of Fisher Scoring iterations: 5

cbind.data.frame(exp(coef(fast)))

##                 exp(coef(fast))
## (Intercept)          0.01708133
## anterior             1.96376577
## hcabg                1.93746487
## factor(killip)2      2.46463342
## factor(killip)3      3.04434885
## factor(killip)4     12.33745552

Patients having an anterior site heart attack are twice as likely to die than if the damage was to another area of the heart.

Patients with a history of having a CABG procedure are twice as likely to die than if they did not have such a procedure.

Patients having a killip 2 status are two-and-a-half times more likely to die than if they have level 1 killip level status (no perceived problem).

Those at level 3 are 3 times more likely to die, and those at level 4, which is experiencing a massive heart attack, are 12 times more likely to die than those with no apparent heart problems.

cbind.data.frame(exp(coef(fast))*sqrt(diag(vcov(fast))))

##                 exp(coef(fast)) * sqrt(diag(vcov(fast)))
## (Intercept)                                  0.002492295
## anterior                                     0.313359487
## hcabg                                        0.632970849
## factor(killip)2                              0.424784164
## factor(killip)3                              0.765119601
## factor(killip)4                              3.384215172

exp(confint.default(fast))

##                     2.5 %      97.5 %
## (Intercept)     0.0128329  0.02273622
## anterior        1.4363585  2.68482829
## hcabg           1.0212823  3.67554592
## factor(killip)2 1.7581105  3.45508317
## factor(killip)3 1.8602297  4.98221265
## factor(killip)4 7.2067174 21.12096274

modelfit(fast)

## $AIC
## [1] 73.9917
## 
## $AICn
## [1] 4.93278
## 
## $BIC
## [1] 78.24
## 
## $BICqh
## [1] 5.566187

German Health Registry Example

German health reform data for the year 1984. Subset of a multiyear registry evaluating differences in physician provider utilization prior to and after health reform legislation in the late 1980s.

data frame with 3,874 observations on the following 17 variables.

docvis number of visits to doctor during year (0-121)

hospvis number of days in hospital during year (0-51)

edlevel educational level (categorical: 1-4)

age age: 25-64

outwork out of work=1; 0=working

female female=1; 0=male

married married=1; 0=not married

kids have children=1; no children=0

hhninc household yearly income in marks (in Marks)

educ years of formal education (7-18)

self self-employed=1; not self employed=0

edlevel1 (1/0) not high school graduate

edlevel2 (1/0) high school graduate

edlevel3 (1/0) university/college

edlevel4 (1/0) graduate school

library(MASS)
library(msme)
data(rwm1984)

Using the German health data example, we may calculate the expected number of doctor visits for a working 50-year-old patient based on the table of coefficient values displayed as

glmrp <- glm(docvis ~ outwork + female + age, family=poisson, data=rwm1984)
summary(glmrp)

## 
## Call:
## glm(formula = docvis ~ outwork + female + age, family = poisson, 
##     data = rwm1984)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.1526854  0.0404913  -3.771 0.000163 ***
## outwork      0.2650804  0.0214395  12.364  < 2e-16 ***
## female       0.2864969  0.0210405  13.616  < 2e-16 ***
## age          0.0226701  0.0008433  26.883  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 25791  on 3873  degrees of freedom
## Residual deviance: 24004  on 3870  degrees of freedom
## AIC: 31094
## 
## Number of Fisher Scoring iterations: 6

exp(coef(glmrp))

## (Intercept)     outwork      female         age 
##   0.8583997   1.3035357   1.3317541   1.0229290

summary(nb2 <- nbinomial(docvis ~ outwork + female + age, data=rwm1984))

## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced

## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced

## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced

## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced

## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced

## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced

## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced

## 
## Call:
## ml_glm2(formula1 = formula1, formula2 = formula2, data = data, 
##     family = family, mean.link = mean.link, scale.link = scale.link, 
##     offset = offset, start = start, verbose = verbose)
## 
## Deviance Residuals:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -1.5622 -1.2637 -0.4671 -0.4542  0.1234  5.7875 
## 
## Pearson Residuals:
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.64287 -0.60735 -0.36410 -0.00012  0.13113 27.73995 
## 
## Coefficients (all in linear predictor):
##               Estimate      SE     Z        p     LCL    UCL
## (Intercept)    -0.1896 0.10794 -1.76    0.079 -0.4011 0.0220
## outwork         0.2874 0.05928  4.85 1.24e-06  0.1713 0.4036
## female          0.3269 0.05618  5.82 5.94e-09  0.2168 0.4370
## age             0.0228 0.00236  9.68 3.75e-22  0.0182 0.0274
## (Intercept)_s   2.2676 0.07030 32.25   3e-228  2.1298 2.4054
## 
## Null deviance: 4136.572  on  3872 d.f.
## Residual deviance: 3909.122  on  3869 d.f.
## Null Pearson: 5901.906  on  3872 d.f.
## Residual Pearson: 5813.889  on  3869 d.f.
## Dispersion: 1.502685
## AIC:  16641.7
## 
## Number of optimizer iterations:  142

exp(coef(nb2))

##   (Intercept)       outwork        female           age (Intercept)_s 
##      0.827303      1.333015      1.386603      1.023053      9.655960

summary(glmrnb <- glm.nb(docvis ~ outwork + female + age, data=rwm1984))

## 
## Call:
## glm.nb(formula = docvis ~ outwork + female + age, data = rwm1984, 
##     init.theta = 0.4411216682, link = log)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.188216   0.108488  -1.735   0.0828 .  
## outwork      0.286902   0.062766   4.571 4.85e-06 ***
## female       0.327318   0.059485   5.503 3.74e-08 ***
## age          0.022761   0.002389   9.529  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.4411) family taken to be 1)
## 
##     Null deviance: 4137.3  on 3873  degrees of freedom
## Residual deviance: 3909.8  on 3870  degrees of freedom
## AIC: 16642
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.4411 
##           Std. Err.:  0.0137 
## 
##  2 x log-likelihood:  -16631.6960

exp(coef(glmrnb))

## (Intercept)     outwork      female         age 
##   0.8284354   1.3322938   1.3872425   1.0230223

POISSON MARGINAL EFFECTS

A marginal effect relates a continuous predictor to the predicted probability of the response variable. Other predictors in the model are held at their mean, or sometimes median, values.

Marginal effects pertain only to continuous predictors.
Discrete change or partial effects are used for binary and categorical predictors.

The basic interpretation of marginal effects relates to

how the probability of the count response changes with a one-unit change in the value of the continuous predictor.

We discuss two types of marginal effects:

the marginal effect taken at the mean value of the predictor and
the average marginal effect taken at the mean of the predicted counts.

library(COUNT)
data(rwm5yr); rwm1984 <- subset(rwm5yr, year==1984)
summary(pmem <- glm(docvis ~ outwork + age, family=poisson,
data=rwm1984))

## 
## Call:
## glm(formula = docvis ~ outwork + age, family = poisson, data = rwm1984)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.0335170  0.0391815  -0.855    0.392    
## outwork      0.4079314  0.0188447  21.647   <2e-16 ***
## age          0.0220842  0.0008377  26.362   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 25791  on 3873  degrees of freedom
## Residual deviance: 24190  on 3871  degrees of freedom
## AIC: 31279
## 
## Number of Fisher Scoring iterations: 6

mout <- mean(rwm1984$outwork); mage <- mean(rwm1984$age)
xb <- coef(pmem)[1] + coef(pmem)[2]*mout + coef(pmem)[3]*mage
dfdxb <- exp(xb) * coef(pmem)[3]
mean(dfdxb)

## [1] 0.06552846

The marginal effect of age, .0655, may be understood to mean:

The number of visits made to a doctor increases by 6.6% for each year of age, when outwork is set at its mean value.

Average Marginal Effects

mean(rwm1984$docvis) * coef(pmem)[3]

##        age 
## 0.06984968

The above-average marginal effects tell us that there are on average 0.07 more doctor visits given each additional year of patient age, with outwork at its mean value.

For each additional year of age, there are on average 0.07 additional doctor visits, with outwork at its mean value.