Poisson regression

  • Another type of generalised linear model
  • Type of count model (one of three we will be looking at)

Count models

  • A regression model for modelling counts as outcome variable
  • A count is the number of times something has happened or a frequency
  • Properties of counts
    • Non-negative integer (not -1, -4)
    • Does not allow decimals (1.4, -2.6)
  • Events can’t happen 1.5 times (but 1.5 times on average)
  • Examples?

Count data: Examples

  • Number of crimes committed by someone

  • Number of times someone got sent to jail

  • Number of times someone had an affair

  • Sometimes / often people treat counts as a continuous variable

  • Sometimes that’s fine (large quantities) sometimes it is not (small quantities, i.e. when events are rare)

Poisson distribution

  • People often use continuous models for count data, sometimes because they are not aware of the difference, or think they would be too difficult.
  • Not that much more difficult than a linear regression
  • In our analysis examples we will focus on data that we assume are generated by a process that can be understood as a Poisson distribution.
  • There are other types of models for count data and we’ll cover some of them (Negative Binomial, Zero-inflated distributions)

Poisson distribution with \(\lambda = 3.5\)

  • A probability distribution of the number of times an event happened
  • Lower bound at 0.
  • Distribution is not necessarily symmetric (for low rates)
  • One parameter: \(\lambda\) (Greek lambda) which is the rate of an event

Also Poisson distributions

  • Rate parameter \(\lambda\) is the expected number of events / frequency in fixed window (day, hour, km\(^2\) etc)
  • It’s variance is also \(\lambda\)
  • As the average rate increases so does the variance.

Poisson distribution

  • If crimes per day correspond to \(y \sim \text{Poisson}($\lambda=$3)\), then the expected (average) number of crimes per day is 3.
  • The variance is also equal to \(\lambda\), so \(\mathrm{Var}(y) = 3\).
  • Therefore, standard deviation is \(\sqrt{3} \approx 1.73\).

Poisson distribution

In a Poison model

\[ \begin{aligned} y &\sim \text{Poisson}(\lambda_i) \end{aligned} \]

we assume that \(y\) is distributed following a Poisson process.

When the rate changes as a function of changes in predictor variables, we require a logarithmic link function to map the rate \(\lambda\) to a linear function of a set of predictors:

\[ \log(\lambda_i) = \beta_0 + \beta_1 \times x_i \]

Similar to binary logistic regression where the probability is not a linear function of the predictor variables but the log odds are.

Implementing a Poisson regression in \(R\)

We will be focusing on cigs: number of cigarettes smoked in a particular period of time (per day).

# Load the smoking data
smoking_df <- read_csv(
  "https://raw.githubusercontent.com/mark-andrews/ntupsychology-data/main/data-sets/smoking.csv")

# Check variables in data
glimpse(smoking_df)
Rows: 807
Columns: 10
$ educ     <dbl> 16.0, 16.0, 12.0, 13.5, 10.0, 6.0, 12.0, 15.0, 12.0, 12.0, 13…
$ cigpric  <dbl> 60.506, 57.883, 57.664, 57.883, 58.320, 59.340, 57.883, 57.88…
$ white    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1…
$ age      <dbl> 46, 40, 58, 30, 17, 86, 35, 48, 48, 31, 27, 24, 71, 44, 22, 2…
$ income   <dbl> 20000, 30000, 30000, 20000, 20000, 6500, 20000, 30000, 20000,…
$ cigs     <dbl> 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 10, 20, 30, 0, 0, 20, 0, 30, 0,…
$ restaurn <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ lincome  <dbl> 9.903487, 10.308952, 10.308952, 9.903487, 9.903487, 8.779557,…
$ agesq    <dbl> 2116, 1600, 3364, 900, 289, 7396, 1225, 2304, 2304, 961, 729,…
$ lcigpric <dbl> 4.102743, 4.058424, 4.054633, 4.058424, 4.065945, 4.083283, 4…

Implementing a Poisson regression in \(R\)

Number of each value of cigs: does any value stand out?

Implementing a Poisson regression in \(R\)

Distribution family is Poisson and link is log (default).

m_cigs_0 <- glm(cigs ~ 1,
              data = smoking_df,
              family = poisson(link = "log"))

coef(m_cigs_0)
(Intercept) 
   2.161769 

Hence, the intercept is the log number of cigarettes.

Implementing a Poisson regression in \(R\)

Coefficient of intercept only model is the same as the log of the average (mean) number of cigs.

# Average of cigs 
d_mean <- describe(data = smoking_df, avg = mean(cigs))

# Add log of the average
mutate(d_mean, log_avg = log(avg))
# A tibble: 1 × 2
    avg log_avg
  <dbl>   <dbl>
1  8.69    2.16

How can we turn the intercept estimate into “number of cigarettes”? Remember odds vs log odds.

coef(m_cigs_0)
(Intercept) 
   2.161769 

Implementing a Poisson regression in \(R\)

How does the number of cigs smoked vary as a function of age, education and income as predictors?

m_cigs <- glm(cigs ~ age + educ + lincome, data = smoking_df, family = poisson(link = "log"))

Summary of Poisson model

summary(m_cigs)
Call:
glm(formula = cigs ~ age + educ + lincome, family = poisson(link = "log"), 
    data = smoking_df)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.7978288  0.1858839   4.292 1.77e-05 ***
age         -0.0049920  0.0007308  -6.831 8.44e-12 ***
educ        -0.0458054  0.0042516 -10.774  < 2e-16 ***
lincome      0.2193037  0.0195039  11.244  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 15821  on 806  degrees of freedom
Residual deviance: 15596  on 803  degrees of freedom
AIC: 17075

Number of Fisher Scoring iterations: 6

Interpretation of model coefficients

coef(m_cigs)
 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695 
  • For the slope estimates, notice the signs and related p-values
  • Example interpretation: older individuals smoke less
  • The average log number of cigarettes is lower.
  • By how much does the average number of cigs smoked change for every unit change in age, educ, lincome?
  • Estimates are the change in the logarithm of the mean.

Interpretation of model coefficients

Extract model coefficients and store them im betas:

betas <- coef(m_cigs) 
betas
 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695 

Change in the log of the mean for each year of age.

betas['age']
         age 
-0.004992017 

Interpretation of model coefficients

Extract model coefficients and store them im betas:

betas <- coef(m_cigs) 
betas
 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695 

\(e^{\beta_\text{age}}\) is the multiplicative factor by which the mean changes for each one-unit increase in the predictor (year for age) holding other predictors constant.

exp(betas['age'])
      age 
0.9950204 

Interpretation of model coefficients

Lower and upper bound of 95% confidence interval for the logarithm of the rate cigs for age

confint.default(m_cigs, parm='age') 
          2.5 %       97.5 %
age -0.00642435 -0.003559684

What is the 95% confidence interval for the factor by which the rate of cigs smokes change with age?

exp(confint.default(m_cigs, parm='age'))
        2.5 %    97.5 %
age 0.9935962 0.9964466

If a factor of change is less than than 1 it will always decrease! (example follows)

Interpretation of model coefficients

Alternative way to get these log means is to use predict or add_predictions

Let’s use three different values for age but, lets keep other variables constant.

# New dataframe with 3 values of age (21, 22, 23) but each have an educ of 15, lincome of 10
smoking_df2 <- tibble(age = c(21, 22, 23), educ = 15, lincome = 10)
smoking_df2
# A tibble: 3 × 3
    age  educ lincome
  <dbl> <dbl>   <dbl>
1    21    15      10
2    22    15      10
3    23    15      10

Interpretation of model coefficients

Predicted log of the mean values for these three different characteristics:

# Unless specified, these will be in the units of the model so logs)
library(modelr)
add_predictions(data = smoking_df2, model = m_cigs)
# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  2.20
2    22    15      10  2.19
3    23    15      10  2.19

Interpretation of model coefficients

Predicted the mean values for these three different characteristics

# We can specify predictions to be the same unit as the data (number of cigs)
add_predictions(data = smoking_df2, model = m_cigs, type = 'response')
# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  9.02
2    22    15      10  8.97
3    23    15      10  8.93

Interpretation of coefficients

The factor by which the mean changes for a unit change in age was exp(betas['age']) = 0.9950204.

Given the predicted values

add_predictions(data = smoking_df2, model = m_cigs, type = 'response')
# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  9.02
2    22    15      10  8.97
3    23    15      10  8.93

the change between the number of cigarettes for adjacent ages 21 and 22 years

8.970664 / 9.015557 # change in age by 1 unit 
[1] 0.9950205

What’s the change between 22 and 23 years?

Predicting outcomes

What is the log mean (number of cigs) if age = 25, educ = 15, and lincome = 10?

Use the linear regression equation as usual; use indexing for the coefficients stored in betas.

betas <- coef(m_cigs) # assigning the output of the function to an object 'betas'
betas
 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695 

Predicting outcomes

What is the log mean (number of cigs) if age = 25, educ = 15, and lincome = 10?

# Saving the log average of the calculation to the object 'log_mean'
log_mean <- betas[1] + betas[2] * 25 + betas[3] * 15 + betas[4] * 10
log_mean
(Intercept) 
   2.178984 

Convert the log average number of cigs to the actual number of cigs.

Predicting outcomes

\(e\) to the power of 2.179 which is \(e^\beta\) in R is exp() is the inverse of logarithm.

# Exponential of log_mean value (2.178984) 
# Saving this calculation to an object called 'lambda' 
lambda <- exp(log_mean) 
lambda # <- average number of cigs
(Intercept) 
    8.83732 

transforms it back

# the opposite of exponential is the natural logarithm 
log(lambda)
(Intercept) 
   2.178984 

Probability of values in Poisson distributions

This value (~8.837) is the mean of the distribution of people who have the specific characteristics of age = 25, educ = 15, lincome = 10; average of \(\text{Poisson}(\lambda = 8.837)\).

Probability of values in Poisson distributions

What is the probability that a person with these characteristics smokes exactly 5 cigs?

We can calculate the probability of obtaining any given value in a Poisson distribution with a given \(\lambda\):

dpois(x = 5, lambda = lambda)
[1] 0.06522625

This is the probability of smoking exactly 5 cigs given the Poisson distribution implied by a hypothetical participant with the above characteristics.

Exercise

For a hypothetical participant with the following characteristics, age = 18, educ = 4, lincome = 7, what is the probability that they smoke 3, 6, 10 cigarettes?

  • Calculate the log average number of cigs using the regression equation.
  • Take the exponential of that average.
  • Used dpois to get the probability

Which number for x maximizes this probability value?

Model comparisons using likelihood ratio tests

Implement the following two models with \(\text{cigs}_i \sim \text{Poisson}(\lambda_i)\):

\[ \mathcal{M_1}: \text{log}(\lambda_i) = \beta_0 + \beta_1 \times \log(\text{income})\\ \mathcal{M_2}: \text{log}(\lambda_i) = \beta_0 + \beta_1 \times \log(\text{income}) + \beta_2 \times \text{education} + \beta_3 \times \text{white} \] We use white (binary), educ (as continuous variable), and the log of income (lincome).

Using anova(..., ...., test = 'Chisq') evaluate whether education and being white has a (combined) association with smoking (after controlling for income).

What is the deviance of each of the two models?

Model comparisons using likelihood ratio tests

  • Difference in the deviance which can be obtained using deviance() (the lower the deviance the better fit the model).
  • Is their difference larger than 2 times the difference in the number of parameters between the two models?
  • A difference greater than 4 usually means that one model is better than the other.
  • Null hypothesis test using a \(\chi^2\) distribution which returns the p-value of pchisq()
  • The null hypothesis here is that models are identical: if p is below 0.05, the null hypothesis is unlikely.

What do you need to understand?

  • Name of the probability model for count data.
  • Making predictions from a model in log and non-log values.
  • Interpretation of coefficients.
  • Running and interpreting nested model comparison