Poisson Regression

Poisson regression

Another type of generalised linear model
Type of count model (one of three we will be looking at)

Count models

A regression model for modelling counts as outcome variable
A count is the number of times something has happened or a frequency
Properties of counts
- Non-negative integer (not -1, -4)
- Does not allow decimals (1.4, -2.6)
Events can’t happen 1.5 times (but 1.5 times on average)
Examples?

Count data: Examples

Number of crimes committed by someone
Number of times someone got sent to jail
Number of times someone had an affair
Sometimes / often people treat counts as a continuous variable
Sometimes that’s fine (large quantities) sometimes it is not (small quantities, i.e. when events are rare)

Poisson distribution

People often use continuous models for count data, sometimes because they are not aware of the difference, or think they would be too difficult.
Not that much more difficult than a linear regression
In our analysis examples we will focus on data that we assume are generated by a process that can be understood as a Poisson distribution.
There are other types of models for count data and we’ll cover some of them (Negative Binomial, Zero-inflated distributions)

Poisson distribution with $\lambda = 3.5$

A probability distribution of the number of times an event happened
Lower bound at 0.
Distribution is not necessarily symmetric (for low rates)
One parameter: $\lambda$ (Greek lambda) which is the rate of an event

Also Poisson distributions

Rate parameter $\lambda$ is the expected number of events / frequency in fixed window (day, hour, km$^2$ etc)
It’s variance is also $\lambda$
As the average rate increases so does the variance.

Poisson distribution

If crimes per day correspond to $y \sim \text{Poisson}($\lambda=$3)$, then the expected (average) number of crimes per day is 3.
The variance is also equal to $\lambda$, so $\mathrm{Var}(y) = 3$.
Therefore, standard deviation is $\sqrt{3} \approx 1.73$.

Poisson distribution

In a Poison model

\[ \begin{aligned} y &\sim \text{Poisson}(\lambda_i) \end{aligned} \]

we assume that $y$ is distributed following a Poisson process.

When the rate changes as a function of changes in predictor variables, we require a logarithmic link function to map the rate $\lambda$ to a linear function of a set of predictors:

\[ \log(\lambda_i) = \beta_0 + \beta_1 \times x_i \]

Similar to binary logistic regression where the probability is not a linear function of the predictor variables but the log odds are.

Implementing a Poisson regression in $R$

We will be focusing on cigs: number of cigarettes smoked in a particular period of time (per day).

# Load the smoking data
smoking_df <- read_csv(
  "https://raw.githubusercontent.com/mark-andrews/ntupsychology-data/main/data-sets/smoking.csv")

# Check variables in data
glimpse(smoking_df)

Rows: 807
Columns: 10
$ educ     <dbl> 16.0, 16.0, 12.0, 13.5, 10.0, 6.0, 12.0, 15.0, 12.0, 12.0, 13…
$ cigpric  <dbl> 60.506, 57.883, 57.664, 57.883, 58.320, 59.340, 57.883, 57.88…
$ white    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1…
$ age      <dbl> 46, 40, 58, 30, 17, 86, 35, 48, 48, 31, 27, 24, 71, 44, 22, 2…
$ income   <dbl> 20000, 30000, 30000, 20000, 20000, 6500, 20000, 30000, 20000,…
$ cigs     <dbl> 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 10, 20, 30, 0, 0, 20, 0, 30, 0,…
$ restaurn <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ lincome  <dbl> 9.903487, 10.308952, 10.308952, 9.903487, 9.903487, 8.779557,…
$ agesq    <dbl> 2116, 1600, 3364, 900, 289, 7396, 1225, 2304, 2304, 961, 729,…
$ lcigpric <dbl> 4.102743, 4.058424, 4.054633, 4.058424, 4.065945, 4.083283, 4…

Implementing a Poisson regression in $R$

Number of each value of cigs: does any value stand out?

Implementing a Poisson regression in $R$

Distribution family is Poisson and link is log (default).

m_cigs_0 <- glm(cigs ~ 1,
              data = smoking_df,
              family = poisson(link = "log"))

coef(m_cigs_0)

(Intercept) 
   2.161769

Hence, the intercept is the log number of cigarettes.

Implementing a Poisson regression in $R$

Coefficient of intercept only model is the same as the log of the average (mean) number of cigs.

# Average of cigs 
d_mean <- describe(data = smoking_df, avg = mean(cigs))

# Add log of the average
mutate(d_mean, log_avg = log(avg))

# A tibble: 1 × 2
    avg log_avg
  <dbl>   <dbl>
1  8.69    2.16

How can we turn the intercept estimate into “number of cigarettes”? Remember odds vs log odds.

coef(m_cigs_0)

(Intercept) 
   2.161769

Implementing a Poisson regression in $R$

How does the number of cigs smoked vary as a function of age, education and income as predictors?

m_cigs <- glm(cigs ~ age + educ + lincome, data = smoking_df, family = poisson(link = "log"))

Summary of Poisson model

summary(m_cigs)

Call:
glm(formula = cigs ~ age + educ + lincome, family = poisson(link = "log"), 
    data = smoking_df)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.7978288  0.1858839   4.292 1.77e-05 ***
age         -0.0049920  0.0007308  -6.831 8.44e-12 ***
educ        -0.0458054  0.0042516 -10.774  < 2e-16 ***
lincome      0.2193037  0.0195039  11.244  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 15821  on 806  degrees of freedom
Residual deviance: 15596  on 803  degrees of freedom
AIC: 17075

Number of Fisher Scoring iterations: 6

Interpretation of model coefficients

coef(m_cigs)

 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695

For the slope estimates, notice the signs and related p-values
Example interpretation: older individuals smoke less
The average log number of cigarettes is lower.
By how much does the average number of cigs smoked change for every unit change in age, educ, lincome?
Estimates are the change in the logarithm of the mean.

Interpretation of model coefficients

Extract model coefficients and store them im betas:

betas <- coef(m_cigs) 
betas

 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695

Change in the log of the mean for each year of age.

betas['age']

         age 
-0.004992017

Interpretation of model coefficients

Extract model coefficients and store them im betas:

betas <- coef(m_cigs) 
betas

 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695

$e^{\beta_\text{age}}$ is the multiplicative factor by which the mean changes for each one-unit increase in the predictor (year for age) holding other predictors constant.

exp(betas['age'])

      age 
0.9950204

Interpretation of model coefficients

Lower and upper bound of 95% confidence interval for the logarithm of the rate cigs for age

confint.default(m_cigs, parm='age')

          2.5 %       97.5 %
age -0.00642435 -0.003559684

What is the 95% confidence interval for the factor by which the rate of cigs smokes change with age?

exp(confint.default(m_cigs, parm='age'))

        2.5 %    97.5 %
age 0.9935962 0.9964466

If a factor of change is less than than 1 it will always decrease! (example follows)

Interpretation of model coefficients

Alternative way to get these log means is to use predict or add_predictions

Let’s use three different values for age but, lets keep other variables constant.

# New dataframe with 3 values of age (21, 22, 23) but each have an educ of 15, lincome of 10
smoking_df2 <- tibble(age = c(21, 22, 23), educ = 15, lincome = 10)
smoking_df2

# A tibble: 3 × 3
    age  educ lincome
  <dbl> <dbl>   <dbl>
1    21    15      10
2    22    15      10
3    23    15      10

Interpretation of model coefficients

Predicted log of the mean values for these three different characteristics:

# Unless specified, these will be in the units of the model so logs)
library(modelr)
add_predictions(data = smoking_df2, model = m_cigs)

# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  2.20
2    22    15      10  2.19
3    23    15      10  2.19

Interpretation of model coefficients

Predicted the mean values for these three different characteristics

# We can specify predictions to be the same unit as the data (number of cigs)
add_predictions(data = smoking_df2, model = m_cigs, type = 'response')

# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  9.02
2    22    15      10  8.97
3    23    15      10  8.93

Interpretation of coefficients

The factor by which the mean changes for a unit change in age was exp(betas['age']) = 0.9950204.

Given the predicted values

add_predictions(data = smoking_df2, model = m_cigs, type = 'response')

# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  9.02
2    22    15      10  8.97
3    23    15      10  8.93

the change between the number of cigarettes for adjacent ages 21 and 22 years

8.970664 / 9.015557 # change in age by 1 unit

[1] 0.9950205

What’s the change between 22 and 23 years?

Predicting outcomes

What is the log mean (number of cigs) if age = 25, educ = 15, and lincome = 10?

Use the linear regression equation as usual; use indexing for the coefficients stored in betas.

betas <- coef(m_cigs) # assigning the output of the function to an object 'betas'
betas

 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695

Predicting outcomes

What is the log mean (number of cigs) if age = 25, educ = 15, and lincome = 10?

# Saving the log average of the calculation to the object 'log_mean'
log_mean <- betas[1] + betas[2] * 25 + betas[3] * 15 + betas[4] * 10
log_mean

(Intercept) 
   2.178984

Convert the log average number of cigs to the actual number of cigs.

Predicting outcomes

$e$ to the power of 2.179 which is $e^\beta$ in R is exp() is the inverse of logarithm.

# Exponential of log_mean value (2.178984) 
# Saving this calculation to an object called 'lambda' 
lambda <- exp(log_mean) 
lambda # <- average number of cigs

(Intercept) 
    8.83732

transforms it back

# the opposite of exponential is the natural logarithm 
log(lambda)

(Intercept) 
   2.178984

Probability of values in Poisson distributions

This value (~8.837) is the mean of the distribution of people who have the specific characteristics of age = 25, educ = 15, lincome = 10; average of $\text{Poisson}(\lambda = 8.837)$.

Probability of values in Poisson distributions

What is the probability that a person with these characteristics smokes exactly 5 cigs?

We can calculate the probability of obtaining any given value in a Poisson distribution with a given $\lambda$:

dpois(x = 5, lambda = lambda)

[1] 0.06522625

This is the probability of smoking exactly 5 cigs given the Poisson distribution implied by a hypothetical participant with the above characteristics.

Exercise

For a hypothetical participant with the following characteristics, age = 18, educ = 4, lincome = 7, what is the probability that they smoke 3, 6, 10 cigarettes?

Calculate the log average number of cigs using the regression equation.
Take the exponential of that average.
Used dpois to get the probability

Which number for x maximizes this probability value?

Model comparisons using likelihood ratio tests

Implement the following two models with $\text{cigs}_i \sim \text{Poisson}(\lambda_i)$:

\[ \mathcal{M_1}: \text{log}(\lambda_i) = \beta_0 + \beta_1 \times \log(\text{income})\\ \mathcal{M_2}: \text{log}(\lambda_i) = \beta_0 + \beta_1 \times \log(\text{income}) + \beta_2 \times \text{education} + \beta_3 \times \text{white} \] We use white (binary), educ (as continuous variable), and the log of income (lincome).

Using anova(..., ...., test = 'Chisq') evaluate whether education and being white has a (combined) association with smoking (after controlling for income).

What is the deviance of each of the two models?

Model comparisons using likelihood ratio tests

Difference in the deviance which can be obtained using deviance() (the lower the deviance the better fit the model).
Is their difference larger than 2 times the difference in the number of parameters between the two models?
A difference greater than 4 usually means that one model is better than the other.
Null hypothesis test using a $\chi^2$ distribution which returns the p-value of pchisq()
The null hypothesis here is that models are identical: if p is below 0.05, the null hypothesis is unlikely.

What do you need to understand?

Name of the probability model for count data.
Making predictions from a model in log and non-log values.
Interpretation of coefficients.
Running and interpreting nested model comparison