Poisson Regression

Objectives

Introduce probability model for count data.
Interpretation of coefficients on log and non-log scale.
Making predictions from a model in log and non-log counts.
Running comparisons for nested models and interpreting the results.

Models of count data

Generalised Linear Regression models for modelling counts as outcome variable.
A count is the number of times something has happened or a frequency.
Properties of counts
- Non-negative integer (not -1, -4)
- Does not allow decimals (1.4, -2.6)
Events can’t happen 1.5 times (but 1.5 times on average)
Examples?

Count data: Examples

Number of crimes committed by someone / in a year.
Number of times someone got sent to jail.
Number of times someone had an affair.
Number of words produced by a second grade writer.

Poisson distribution

People often use continuous models for count data, sometimes because they are not aware of the difference, or think they would be too difficult.
Sometimes that’s fine (large quantities) sometimes it is not (small quantities, i.e. when events are rare).
Not that much more difficult than a linear regression.
In our analysis examples we will focus on data that we assume are generated by a process that can be understood as a Poisson distribution.
There are other types of models for count data; we’ll cover some of them (Negative Binomial, Zero-inflated distributions).

Poisson distribution with \(\lambda = 3.5\)

A probability distribution of the number of times an event happened.
Lower bound at 0.
Distribution is not necessarily symmetric (for low rates).
One parameter: \(\lambda\) (Greek lambda) which is the rate of an event.

Also Poisson distributions

Rate parameter \(\lambda\) is the expected number of events / frequency in fixed window (day, hour, km\(^2\) etc).
It’s variance is also \(\lambda\).
As the average rate increases so does the variance.

Poisson distribution

If crimes per day correspond to \(y \sim \text{Poisson}(\lambda=3)\), then the expected (average) number of crimes per day is 3.
The variance is also equal to \(\lambda\), so \(\mathrm{Var}(y) = 3\).
Therefore, standard deviation is \(\sqrt{3} \approx 1.73\).

Poisson distribution

In a Poisson model

\[ \begin{aligned} y_i &\sim \text{Poisson}(\lambda_i) \end{aligned} \]

we assume that \(y\) is distributed following a Poisson process.

A logarithmic link function maps the rate \(\lambda\) to a linear function of a predictor \(x\):

\[ \log(\lambda_i) = \beta_0 + \beta_1 \times x_i \]

This allows us to model rate changes relative to changes in one (more more) predictor variable(s).

Similar to binary logistic regression: probability is not a linear function of the predictor variables but log-odds are.

Implementing a Poisson regression in \(R\)

We will be focusing on cigs: number of cigarettes smoked in a particular period of time (per day).

# Load the smoking data
smoking_df <- read_csv(
  "https://raw.githubusercontent.com/mark-andrews/ntupsychology-data/main/data-sets/smoking.csv")

# Check variables in data
glimpse(smoking_df)

Rows: 807
Columns: 10
$ educ     <dbl> 16.0, 16.0, 12.0, 13.5, 10.0, 6.0, 12.0, 15.0, 12.0, 12.0, 13…
$ cigpric  <dbl> 60.506, 57.883, 57.664, 57.883, 58.320, 59.340, 57.883, 57.88…
$ white    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1…
$ age      <dbl> 46, 40, 58, 30, 17, 86, 35, 48, 48, 31, 27, 24, 71, 44, 22, 2…
$ income   <dbl> 20000, 30000, 30000, 20000, 20000, 6500, 20000, 30000, 20000,…
$ cigs     <dbl> 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 10, 20, 30, 0, 0, 20, 0, 30, 0,…
$ restaurn <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ lincome  <dbl> 9.903487, 10.308952, 10.308952, 9.903487, 9.903487, 8.779557,…
$ agesq    <dbl> 2116, 1600, 3364, 900, 289, 7396, 1225, 2304, 2304, 961, 729,…
$ lcigpric <dbl> 4.102743, 4.058424, 4.054633, 4.058424, 4.065945, 4.083283, 4…

Implementing a Poisson regression in \(R\)

Number of each value of cigs: do any values stand out?

Implementing a Poisson regression in \(R\)

Distribution family is Poisson and link is log (default).

# Poisson model with intercept only
m_cigs_0 <- glm(cigs ~ 1,
              data = smoking_df,
              family = poisson(link = "log"))

# Return estimate for model parameters
coef(m_cigs_0)

(Intercept) 
   2.161769

The intercept is the log number of cigarettes.

Implementing a Poisson regression in \(R\)

coef(m_cigs_0) # Intercept coefficient on log scale (log number of cigs)

(Intercept) 
   2.161769

Coefficient of intercept-only model is the same as the log of the average (mean) number of cigs.

# Average of cigs 
d_mean <- describe(data = smoking_df, avg = mean(cigs))

# Add log of the average
mutate(d_mean, log_avg = log(avg))

# A tibble: 1 × 2
    avg log_avg
  <dbl>   <dbl>
1  8.69    2.16

How can we turn the intercept estimate into “number of cigarettes”? Remember odds vs log odds.

Implementing a Poisson regression in \(R\)

How does the number of cigs smoked vary as a function of age, education and income as predictors?

m_cigs <- glm(cigs ~ age + educ + lincome, data = smoking_df, family = poisson(link = "log"))

Summary of Poisson model

summary(m_cigs)

Call:
glm(formula = cigs ~ age + educ + lincome, family = poisson(link = "log"), 
    data = smoking_df)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.7978288  0.1858839   4.292 1.77e-05 ***
age         -0.0049920  0.0007308  -6.831 8.44e-12 ***
educ        -0.0458054  0.0042516 -10.774  < 2e-16 ***
lincome      0.2193037  0.0195039  11.244  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 15821  on 806  degrees of freedom
Residual deviance: 15596  on 803  degrees of freedom
AIC: 17075

Number of Fisher Scoring iterations: 6

Interpretation of model coefficients

coef(m_cigs)

 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695

For the slope estimates, notice the signs and related p-values
Example interpretation: older individuals smoke less
The average log number of cigarettes is lower.
By how much does the average number of cigs smoked change for every unit change in age, educ, lincome?
Estimates are the change in the logarithm of the mean.

Interpretation of model coefficients

Extract model coefficients and store them in betas:

betas <- coef(m_cigs) 
betas

 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695

Change in the log mean of cigaretts per day for each year of age.

betas['age']

         age 
-0.004992017

Interpretation of model coefficients

The log number of cigs for an 18 year old (after controlling for income and education) is

log_mean <- betas["(Intercept)"] + betas['age'] * 18
log_mean

(Intercept) 
  0.7079725

Predicting outcomes

What is the log mean (number of cigs) if age = 25, educ = 15, and lincome = 10?

betas <- coef(m_cigs) # assigning the output of the function to an object 'betas'
betas

 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695

Use the linear regression equation as usual; use indexing for the coefficients stored in betas.

Predicting outcomes

What is the log mean (number of cigs) if age = 25, educ = 15, and lincome = 10?

# Saving the log average of the calculation to the object 'log_mean'
log_mean <- betas[1] + betas[2] * 25 + betas[3] * 15 + betas[4] * 10
log_mean

(Intercept) 
   2.178984

Convert the log average number of cigs to the actual number of cigs.

Predicting outcomes

\(e\) to the power of 2.179 which is \(e^\beta\) in R is exp() is the inverse of logarithm.

# Exponential of log_mean value (2.178984) 
# Saving this calculation to an object called 'lambda' 
lambda <- exp(log_mean) 
lambda # <- average number of cigs

(Intercept) 
    8.83732

transforms it back

# the opposite of exponential is the natural logarithm 
log(lambda)

(Intercept) 
   2.178984

Probability of values in Poisson distributions

This value (~8.837) is the mean of the distribution of people who have the specific characteristics of age = 25, educ = 15, lincome = 10; average of \(\text{Poisson}(\lambda = 8.837)\).

Probability of values in Poisson distributions

What is the probability that a person with these characteristics smokes exactly 5 cigs?

We can calculate the probability of obtaining any given value in a Poisson distribution with a given \(\lambda\):

dpois(x = 5, lambda = lambda)

[1] 0.06522625

This is the probability of smoking exactly 5 cigs given the Poisson distribution implied by a hypothetical participant with the above characteristics.

Exercise

For a hypothetical participant with the following characteristics, age = 18, educ = 4, lincome = 7, what is the probability that they smoke 3, 6, 10 cigarettes?

Calculate the log average number of cigs using the regression equation.
Take the exponential of that average.
Used dpois to get the probability

Which number for x maximizes this probability value?

Interpretation of model coefficients

betas

 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695

\(e^{\beta_\text{age}}\) is the multiplicative factor by which the mean changes for each one-unit increase in the predictor (year for age) holding other predictors constant.

exp(betas['age'])

      age 
0.9950204

If the factor of change is less than than 1 it decreases! (example follows)

What’s the factor which the mean of cigs changes for each one-unit language in lincome?

Interpretation of model coefficients

Alternative way to get these log means is to use predict or add_predictions

Let’s use three different values for age but, lets keep other variables constant.

# New dataframe with 3 values of age (21, 22, 23) but each have an educ of 15, lincome of 10
smoking_df2 <- tibble(age = c(21, 22, 23), educ = 15, lincome = 10)
smoking_df2

# A tibble: 3 × 3
    age  educ lincome
  <dbl> <dbl>   <dbl>
1    21    15      10
2    22    15      10
3    23    15      10

Interpretation of model coefficients

Predicted log of the mean values for these three different characteristics:

# Unless specified, these will be in the units of the model so logs)
library(modelr)
add_predictions(data = smoking_df2, model = m_cigs)

# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  2.20
2    22    15      10  2.19
3    23    15      10  2.19

Interpretation of model coefficients

Predicted the mean values for these three different characteristics

# We can specify predictions to be the same unit as the data (number of cigs)
add_predictions(data = smoking_df2, model = m_cigs, type = 'response')

# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  9.02
2    22    15      10  8.97
3    23    15      10  8.93

Interpretation of coefficients

The factor by which the mean changes for a unit change in age was exp(betas['age']) = 0.9950204.

Given the predicted values

add_predictions(data = smoking_df2, model = m_cigs, type = 'response')

# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  9.02
2    22    15      10  8.97
3    23    15      10  8.93

the change between the number of cigarettes for adjacent ages 21 and 22 years

8.970664 / 9.015557 # change in age by 1 unit

[1] 0.9950205

What’s the change between 22 and 23 years?

Model comparisons using likelihood ratio tests

Implement the following two models with \(\text{cigs}_i \sim \text{Poisson}(\lambda_i)\):

\[ \mathcal{M_1}: \text{log}(\lambda_i) = \beta_0 + \beta_1 \times \log(\text{income}_i)\\ \mathcal{M_2}: \text{log}(\lambda_i) = \beta_0 + \beta_1 \times \log(\text{income}_i) + \beta_2 \times \text{education}_i + \beta_3 \times \text{white}_i \] We use white (binary), educ (as continuous variable), and the log of income (lincome).

Using anova(..., ...., test = 'Chisq') evaluate whether education and being white has a (combined) association with smoking (after controlling for income).

Model comparisons using likelihood ratio tests

For the same models \(\mathcal{M}_1\) and \(\mathcal{M}_2\):

What are the deviances of each of the two models (use the deviance function)?
What are the number of parameters for each model?
Can you confirm the p-value obtained for the \(\chi^2\) statistic (difference in deviances) using pchisq()?

Hint: You need the difference in deviance and the difference in number of parameters (as we did for binary logistic regression). Make sure the difference is positive (the deviance / number of parameters of the more complex model can’t be smaller than 0).

Interpretation of p-value i.e. \(Pr(>\chi^2)\): Probability to observe our difference in the deviances (or something more extreme) if both models were to make identical predictions. If p is below 0.05, the models are unlikely to be identical.

Deviance and log-likelihood

For binary logistic regression, we have seen that the deviance \(\mathcal{D}\) and the likelihood \(\mathcal{L}\) have the following correspondence:

\[ \mathcal{D} = -2 \times \log(\mathcal{L}) \]

This isn’t true for Poisson regression: The deviance of model m1 is

deviance(m1)

[1] 15740.23

but for the log likelihood we see that

-2 * logLik(m1)

'log Lik.' 17210.8 (df=2)

Deviance and log-likelihood

However, for differences \(\Delta\) in \(\mathcal{D}\) and \(\mathcal{L}\) the following relation still holds:

\[ \Delta\mathcal{D} = -2 \times \Delta\log(\mathcal{L}) \]

This can be shown for our two models (code for models not included): The difference between the deviance

deviance(m1) - deviance(m2)

[1] 98.51167

is the same as as the negative double for the difference in log likelihoods

-2 * (logLik(m1) - logLik(m2))

'log Lik.' 98.51167 (df=2)