Let X is a binary variable of 0 for sunny day and 1 for rainy day.
We have a sample of 20 days with 10 rainy and 10 sunny days from Poisson with rates 5 and 2 respectively.
In fact we can artificially generate this sample using R:
y <- c(rpois(10, lambda = 2), rpois(10, lambda = 5))
x <- rep(c(0,1), each = 10)
print(data.frame(y = y,x = x))
## y x
## 1 1 0
## 2 3 0
## 3 3 0
## 4 1 0
## 5 3 0
## 6 2 0
## 7 3 0
## 8 2 0
## 9 4 0
## 10 4 0
## 11 6 1
## 12 9 1
## 13 3 1
## 14 1 1
## 15 6 1
## 16 4 1
## 17 5 1
## 18 8 1
## 19 5 1
## 20 5 1
accidents_model <- glm(y ~ x, family = poisson(link = "log"))
summary(accidents_model)
##
## Call:
## glm(formula = y ~ x, family = poisson(link = "log"))
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.9555 0.1961 4.872 1.1e-06 ***
## x 0.6931 0.2402 2.886 0.0039 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 23.569 on 19 degrees of freedom
## Residual deviance: 14.734 on 18 degrees of freedom
## AIC: 80.466
##
## Number of Fisher Scoring iterations: 4
What is this?
Let’s exponentiate the coefficients:
exp(accidents_model$coeff)
## (Intercept) x
## 2.6 2.0
Can we find the rate of accidents for each group (sunny or rainy)? How?
mu0 <- mean(y[x==0])
mu1 <- mean(y[x==1])
c(mu0, mu1, mu1/mu0)
## [1] 2.6 5.2 2.0
The data are from the Canadian National Cardiovascular Disease registry called FASTRAK. They have been grouped by covariate patterns from individual observations.
The response is die, which is a count of the number of deaths of patients having a specific pattern of predictors.
Predictors are anterior, which indicates if the patient has had a previous anterior myocardial infarction; hcabg, if the patient has a history of having had a CABG (coronary artery bypass grafting) procedure; and killip class, a summary indicator of the cardiovascular health of the patient, with increasing values indicating increased disability.
The number of observations sharing the same pattern of covariates is recorded in the variable case. This value is log-transformed and entered into the model as an offset.
library(COUNT)
## Loading required package: msme
## Loading required package: MASS
## Loading required package: lattice
## Loading required package: sandwich
library(MASS)
fasttrakg <- read.delim("D:/UP Cebu Statistics Courses/1st Sem 2023-2024/STAT 197 (ST - Count Data Analysis)/MCD_Rdata/POIS/fasttrakg.txt")
attach(fasttrakg)
summary(fast <- glm(die ~ anterior + hcabg + factor(killip),
family=poisson,
offset=log(cases),
data=fasttrakg))
##
## Call:
## glm(formula = die ~ anterior + hcabg + factor(killip), family = poisson,
## data = fasttrakg, offset = log(cases))
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.0698 0.1459 -27.893 < 2e-16 ***
## anterior 0.6749 0.1596 4.229 2.34e-05 ***
## hcabg 0.6614 0.3267 2.024 0.0429 *
## factor(killip)2 0.9020 0.1724 5.234 1.66e-07 ***
## factor(killip)3 1.1133 0.2513 4.430 9.44e-06 ***
## factor(killip)4 2.5126 0.2743 9.160 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 116.232 on 14 degrees of freedom
## Residual deviance: 10.932 on 9 degrees of freedom
## AIC: 73.992
##
## Number of Fisher Scoring iterations: 5
cbind.data.frame(exp(coef(fast)))
## exp(coef(fast))
## (Intercept) 0.01708133
## anterior 1.96376577
## hcabg 1.93746487
## factor(killip)2 2.46463342
## factor(killip)3 3.04434885
## factor(killip)4 12.33745552
Patients having an anterior site heart attack are twice as likely to die than if the damage was to another area of the heart.
Patients with a history of having a CABG procedure are twice as likely to die than if they did not have such a procedure.
Patients having a killip 2 status are two-and-a-half times more likely to die than if they have level 1 killip level status (no perceived problem).
Those at level 3 are 3 times more likely to die, and those at level 4, which is experiencing a massive heart attack, are 12 times more likely to die than those with no apparent heart problems.
cbind.data.frame(exp(coef(fast))*sqrt(diag(vcov(fast))))
## exp(coef(fast)) * sqrt(diag(vcov(fast)))
## (Intercept) 0.002492295
## anterior 0.313359487
## hcabg 0.632970849
## factor(killip)2 0.424784164
## factor(killip)3 0.765119601
## factor(killip)4 3.384215172
exp(confint.default(fast))
## 2.5 % 97.5 %
## (Intercept) 0.0128329 0.02273622
## anterior 1.4363585 2.68482829
## hcabg 1.0212823 3.67554592
## factor(killip)2 1.7581105 3.45508317
## factor(killip)3 1.8602297 4.98221265
## factor(killip)4 7.2067174 21.12096274
modelfit(fast)
## $AIC
## [1] 73.9917
##
## $AICn
## [1] 4.93278
##
## $BIC
## [1] 78.24
##
## $BICqh
## [1] 5.566187
German health reform data for the year 1984. Subset of a multiyear registry evaluating differences in physician provider utilization prior to and after health reform legislation in the late 1980s.
data frame with 3,874 observations on the following 17 variables.
docvis number of visits to doctor during year (0-121)
hospvis number of days in hospital during year (0-51)
edlevel educational level (categorical: 1-4)
age age: 25-64
outwork out of work=1; 0=working
female female=1; 0=male
married married=1; 0=not married
kids have children=1; no children=0
hhninc household yearly income in marks (in Marks)
educ years of formal education (7-18)
self self-employed=1; not self employed=0
edlevel1 (1/0) not high school graduate
edlevel2 (1/0) high school graduate
edlevel3 (1/0) university/college
edlevel4 (1/0) graduate school
library(MASS)
library(msme)
data(rwm1984)
Using the German health data example, we may calculate the expected number of doctor visits for a working 50-year-old patient based on the table of coefficient values displayed as
glmrp <- glm(docvis ~ outwork + female + age, family=poisson, data=rwm1984)
summary(glmrp)
##
## Call:
## glm(formula = docvis ~ outwork + female + age, family = poisson,
## data = rwm1984)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.1526854 0.0404913 -3.771 0.000163 ***
## outwork 0.2650804 0.0214395 12.364 < 2e-16 ***
## female 0.2864969 0.0210405 13.616 < 2e-16 ***
## age 0.0226701 0.0008433 26.883 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 25791 on 3873 degrees of freedom
## Residual deviance: 24004 on 3870 degrees of freedom
## AIC: 31094
##
## Number of Fisher Scoring iterations: 6
exp(coef(glmrp))
## (Intercept) outwork female age
## 0.8583997 1.3035357 1.3317541 1.0229290
summary(nb2 <- nbinomial(docvis ~ outwork + female + age, data=rwm1984))
## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced
## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced
## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced
## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced
## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced
## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced
## Warning in dnbinom(y, size = scale, mu = y.hat, log = TRUE): NaNs produced
##
## Call:
## ml_glm2(formula1 = formula1, formula2 = formula2, data = data,
## family = family, mean.link = mean.link, scale.link = scale.link,
## offset = offset, start = start, verbose = verbose)
##
## Deviance Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.5622 -1.2637 -0.4671 -0.4542 0.1234 5.7875
##
## Pearson Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.64287 -0.60735 -0.36410 -0.00012 0.13113 27.73995
##
## Coefficients (all in linear predictor):
## Estimate SE Z p LCL UCL
## (Intercept) -0.1896 0.10794 -1.76 0.079 -0.4011 0.0220
## outwork 0.2874 0.05928 4.85 1.24e-06 0.1713 0.4036
## female 0.3269 0.05618 5.82 5.94e-09 0.2168 0.4370
## age 0.0228 0.00236 9.68 3.75e-22 0.0182 0.0274
## (Intercept)_s 2.2676 0.07030 32.25 3e-228 2.1298 2.4054
##
## Null deviance: 4136.572 on 3872 d.f.
## Residual deviance: 3909.122 on 3869 d.f.
## Null Pearson: 5901.906 on 3872 d.f.
## Residual Pearson: 5813.889 on 3869 d.f.
## Dispersion: 1.502685
## AIC: 16641.7
##
## Number of optimizer iterations: 142
exp(coef(nb2))
## (Intercept) outwork female age (Intercept)_s
## 0.827303 1.333015 1.386603 1.023053 9.655960
summary(glmrnb <- glm.nb(docvis ~ outwork + female + age, data=rwm1984))
##
## Call:
## glm.nb(formula = docvis ~ outwork + female + age, data = rwm1984,
## init.theta = 0.4411216682, link = log)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.188216 0.108488 -1.735 0.0828 .
## outwork 0.286902 0.062766 4.571 4.85e-06 ***
## female 0.327318 0.059485 5.503 3.74e-08 ***
## age 0.022761 0.002389 9.529 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.4411) family taken to be 1)
##
## Null deviance: 4137.3 on 3873 degrees of freedom
## Residual deviance: 3909.8 on 3870 degrees of freedom
## AIC: 16642
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.4411
## Std. Err.: 0.0137
##
## 2 x log-likelihood: -16631.6960
exp(coef(glmrnb))
## (Intercept) outwork female age
## 0.8284354 1.3322938 1.3872425 1.0230223
A marginal effect relates a continuous predictor to the predicted probability of the response variable. Other predictors in the model are held at their mean, or sometimes median, values.
The basic interpretation of marginal effects relates to
how the probability of the count response changes with a one-unit change in the value of the continuous predictor.
We discuss two types of marginal effects:
library(COUNT)
data(rwm5yr); rwm1984 <- subset(rwm5yr, year==1984)
summary(pmem <- glm(docvis ~ outwork + age, family=poisson,
data=rwm1984))
##
## Call:
## glm(formula = docvis ~ outwork + age, family = poisson, data = rwm1984)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.0335170 0.0391815 -0.855 0.392
## outwork 0.4079314 0.0188447 21.647 <2e-16 ***
## age 0.0220842 0.0008377 26.362 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 25791 on 3873 degrees of freedom
## Residual deviance: 24190 on 3871 degrees of freedom
## AIC: 31279
##
## Number of Fisher Scoring iterations: 6
mout <- mean(rwm1984$outwork); mage <- mean(rwm1984$age)
xb <- coef(pmem)[1] + coef(pmem)[2]*mout + coef(pmem)[3]*mage
dfdxb <- exp(xb) * coef(pmem)[3]
mean(dfdxb)
## [1] 0.06552846
The marginal effect of age, .0655, may be understood to mean:
The number of visits made to a doctor increases by 6.6% for each year of age, when outwork is set at its mean value.
mean(rwm1984$docvis) * coef(pmem)[3]
## age
## 0.06984968
The above-average marginal effects tell us that there are on average 0.07 more doctor visits given each additional year of patient age, with outwork at its mean value.
or
For each additional year of age, there are on average 0.07 additional doctor visits, with outwork at its mean value.