Objectives

  • Introduce probability model for count data.
  • Interpretation of coefficients on log and non-log scale.
  • Making predictions from a model in log and non-log counts.
  • Running comparisons for nested models and interpreting the results.

Models of count data

  • Generalised Linear Regression models for modelling counts as outcome variable.
  • A count is the number of times something has happened or a frequency.
  • Properties of counts
    • Non-negative integer (not -1, -4)
    • Does not allow decimals (1.4, -2.6)
  • Events can’t happen 1.5 times (but 1.5 times on average)
  • Examples?

Count data: Examples

  • Number of crimes committed by someone / in a year.
  • Number of times someone got sent to jail.
  • Number of times someone had an affair.
  • Number of words produced by a second grade writer.

Poisson distribution

  • People often use continuous models for count data, sometimes because they are not aware of the difference, or think they would be too difficult.
  • Sometimes that’s fine (large quantities) sometimes it is not (small quantities, i.e. when events are rare).
  • Not that much more difficult than a linear regression.
  • In our analysis examples we will focus on data that we assume are generated by a process that can be understood as a Poisson distribution.
  • There are other types of models for count data; we’ll cover some of them (Negative Binomial, Zero-inflated distributions).

Poisson distribution with \(\lambda = 3.5\)

  • A probability distribution of the number of times an event happened.
  • Lower bound at 0.
  • Distribution is not necessarily symmetric (for low rates).
  • One parameter: \(\lambda\) (Greek lambda) which is the rate of an event.

Also Poisson distributions

  • Rate parameter \(\lambda\) is the expected number of events / frequency in fixed window (day, hour, km\(^2\) etc).
  • It’s variance is also \(\lambda\).
  • As the average rate increases so does the variance.

Poisson distribution

  • If crimes per day correspond to \(y \sim \text{Poisson}(\lambda=3)\), then the expected (average) number of crimes per day is 3.
  • The variance is also equal to \(\lambda\), so \(\mathrm{Var}(y) = 3\).
  • Therefore, standard deviation is \(\sqrt{3} \approx 1.73\).

Poisson distribution

In a Poisson model

\[ \begin{aligned} y_i &\sim \text{Poisson}(\lambda_i) \end{aligned} \]

we assume that \(y\) is distributed following a Poisson process.

A logarithmic link function maps the rate \(\lambda\) to a linear function of a predictor \(x\):

\[ \log(\lambda_i) = \beta_0 + \beta_1 \times x_i \]

This allows us to model rate changes relative to changes in one (more more) predictor variable(s).

Similar to binary logistic regression: probability is not a linear function of the predictor variables but log-odds are.

Implementing a Poisson regression in \(R\)

We will be focusing on cigs: number of cigarettes smoked in a particular period of time (per day).

# Load the smoking data
smoking_df <- read_csv(
  "https://raw.githubusercontent.com/mark-andrews/ntupsychology-data/main/data-sets/smoking.csv")

# Check variables in data
glimpse(smoking_df)
Rows: 807
Columns: 10
$ educ     <dbl> 16.0, 16.0, 12.0, 13.5, 10.0, 6.0, 12.0, 15.0, 12.0, 12.0, 13…
$ cigpric  <dbl> 60.506, 57.883, 57.664, 57.883, 58.320, 59.340, 57.883, 57.88…
$ white    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1…
$ age      <dbl> 46, 40, 58, 30, 17, 86, 35, 48, 48, 31, 27, 24, 71, 44, 22, 2…
$ income   <dbl> 20000, 30000, 30000, 20000, 20000, 6500, 20000, 30000, 20000,…
$ cigs     <dbl> 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 10, 20, 30, 0, 0, 20, 0, 30, 0,…
$ restaurn <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ lincome  <dbl> 9.903487, 10.308952, 10.308952, 9.903487, 9.903487, 8.779557,…
$ agesq    <dbl> 2116, 1600, 3364, 900, 289, 7396, 1225, 2304, 2304, 961, 729,…
$ lcigpric <dbl> 4.102743, 4.058424, 4.054633, 4.058424, 4.065945, 4.083283, 4…

Implementing a Poisson regression in \(R\)

Number of each value of cigs: do any values stand out?

Implementing a Poisson regression in \(R\)

Distribution family is Poisson and link is log (default).

# Poisson model with intercept only
m_cigs_0 <- glm(cigs ~ 1,
              data = smoking_df,
              family = poisson(link = "log"))

# Return estimate for model parameters
coef(m_cigs_0)
(Intercept) 
   2.161769 

The intercept is the log number of cigarettes.

Implementing a Poisson regression in \(R\)

coef(m_cigs_0) # Intercept coefficient on log scale (log number of cigs)
(Intercept) 
   2.161769 

Coefficient of intercept-only model is the same as the log of the average (mean) number of cigs.

# Average of cigs 
d_mean <- describe(data = smoking_df, avg = mean(cigs))

# Add log of the average
mutate(d_mean, log_avg = log(avg))
# A tibble: 1 × 2
    avg log_avg
  <dbl>   <dbl>
1  8.69    2.16

How can we turn the intercept estimate into “number of cigarettes”? Remember odds vs log odds.

Implementing a Poisson regression in \(R\)

How does the number of cigs smoked vary as a function of age, education and income as predictors?

m_cigs <- glm(cigs ~ age + educ + lincome, data = smoking_df, family = poisson(link = "log"))

Summary of Poisson model

summary(m_cigs)
Call:
glm(formula = cigs ~ age + educ + lincome, family = poisson(link = "log"), 
    data = smoking_df)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.7978288  0.1858839   4.292 1.77e-05 ***
age         -0.0049920  0.0007308  -6.831 8.44e-12 ***
educ        -0.0458054  0.0042516 -10.774  < 2e-16 ***
lincome      0.2193037  0.0195039  11.244  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 15821  on 806  degrees of freedom
Residual deviance: 15596  on 803  degrees of freedom
AIC: 17075

Number of Fisher Scoring iterations: 6

Interpretation of model coefficients

coef(m_cigs)
 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695 
  • For the slope estimates, notice the signs and related p-values
  • Example interpretation: older individuals smoke less
  • The average log number of cigarettes is lower.
  • By how much does the average number of cigs smoked change for every unit change in age, educ, lincome?
  • Estimates are the change in the logarithm of the mean.

Interpretation of model coefficients

Extract model coefficients and store them in betas:

betas <- coef(m_cigs) 
betas
 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695 

Change in the log mean of cigaretts per day for each year of age.

betas['age']
         age 
-0.004992017 

Interpretation of model coefficients

The log number of cigs for an 18 year old (after controlling for income and education) is

log_mean <- betas["(Intercept)"] + betas['age'] * 18
log_mean
(Intercept) 
  0.7079725 

Predicting outcomes

What is the log mean (number of cigs) if age = 25, educ = 15, and lincome = 10?

betas <- coef(m_cigs) # assigning the output of the function to an object 'betas'
betas
 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695 

Use the linear regression equation as usual; use indexing for the coefficients stored in betas.

Predicting outcomes

What is the log mean (number of cigs) if age = 25, educ = 15, and lincome = 10?

# Saving the log average of the calculation to the object 'log_mean'
log_mean <- betas[1] + betas[2] * 25 + betas[3] * 15 + betas[4] * 10
log_mean
(Intercept) 
   2.178984 

Convert the log average number of cigs to the actual number of cigs.

Predicting outcomes

\(e\) to the power of 2.179 which is \(e^\beta\) in R is exp() is the inverse of logarithm.

# Exponential of log_mean value (2.178984) 
# Saving this calculation to an object called 'lambda' 
lambda <- exp(log_mean) 
lambda # <- average number of cigs
(Intercept) 
    8.83732 

transforms it back

# the opposite of exponential is the natural logarithm 
log(lambda)
(Intercept) 
   2.178984 

Probability of values in Poisson distributions

This value (~8.837) is the mean of the distribution of people who have the specific characteristics of age = 25, educ = 15, lincome = 10; average of \(\text{Poisson}(\lambda = 8.837)\).

Probability of values in Poisson distributions

What is the probability that a person with these characteristics smokes exactly 5 cigs?

We can calculate the probability of obtaining any given value in a Poisson distribution with a given \(\lambda\):

dpois(x = 5, lambda = lambda)
[1] 0.06522625

This is the probability of smoking exactly 5 cigs given the Poisson distribution implied by a hypothetical participant with the above characteristics.

Exercise

For a hypothetical participant with the following characteristics, age = 18, educ = 4, lincome = 7, what is the probability that they smoke 3, 6, 10 cigarettes?

  • Calculate the log average number of cigs using the regression equation.
  • Take the exponential of that average.
  • Used dpois to get the probability

Which number for x maximizes this probability value?

Interpretation of model coefficients

betas
 (Intercept)          age         educ      lincome 
 0.797828756 -0.004992017 -0.045805445  0.219303695 

\(e^{\beta_\text{age}}\) is the multiplicative factor by which the mean changes for each one-unit increase in the predictor (year for age) holding other predictors constant.

exp(betas['age'])
      age 
0.9950204 

If the factor of change is less than than 1 it decreases! (example follows)

What’s the factor which the mean of cigs changes for each one-unit language in lincome?

Interpretation of model coefficients

Alternative way to get these log means is to use predict or add_predictions

Let’s use three different values for age but, lets keep other variables constant.

# New dataframe with 3 values of age (21, 22, 23) but each have an educ of 15, lincome of 10
smoking_df2 <- tibble(age = c(21, 22, 23), educ = 15, lincome = 10)
smoking_df2
# A tibble: 3 × 3
    age  educ lincome
  <dbl> <dbl>   <dbl>
1    21    15      10
2    22    15      10
3    23    15      10

Interpretation of model coefficients

Predicted log of the mean values for these three different characteristics:

# Unless specified, these will be in the units of the model so logs)
library(modelr)
add_predictions(data = smoking_df2, model = m_cigs)
# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  2.20
2    22    15      10  2.19
3    23    15      10  2.19

Interpretation of model coefficients

Predicted the mean values for these three different characteristics

# We can specify predictions to be the same unit as the data (number of cigs)
add_predictions(data = smoking_df2, model = m_cigs, type = 'response')
# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  9.02
2    22    15      10  8.97
3    23    15      10  8.93

Interpretation of coefficients

The factor by which the mean changes for a unit change in age was exp(betas['age']) = 0.9950204.

Given the predicted values

add_predictions(data = smoking_df2, model = m_cigs, type = 'response')
# A tibble: 3 × 4
    age  educ lincome  pred
  <dbl> <dbl>   <dbl> <dbl>
1    21    15      10  9.02
2    22    15      10  8.97
3    23    15      10  8.93

the change between the number of cigarettes for adjacent ages 21 and 22 years

8.970664 / 9.015557 # change in age by 1 unit 
[1] 0.9950205

What’s the change between 22 and 23 years?

Model comparisons using likelihood ratio tests

Implement the following two models with \(\text{cigs}_i \sim \text{Poisson}(\lambda_i)\):

\[ \mathcal{M_1}: \text{log}(\lambda_i) = \beta_0 + \beta_1 \times \log(\text{income}_i)\\ \mathcal{M_2}: \text{log}(\lambda_i) = \beta_0 + \beta_1 \times \log(\text{income}_i) + \beta_2 \times \text{education}_i + \beta_3 \times \text{white}_i \] We use white (binary), educ (as continuous variable), and the log of income (lincome).

Using anova(..., ...., test = 'Chisq') evaluate whether education and being white has a (combined) association with smoking (after controlling for income).

Model comparisons using likelihood ratio tests

For the same models \(\mathcal{M}_1\) and \(\mathcal{M}_2\):

  • What are the deviances of each of the two models (use the deviance function)?
  • What are the number of parameters for each model?
  • Can you confirm the p-value obtained for the \(\chi^2\) statistic (difference in deviances) using pchisq()?

Hint: You need the difference in deviance and the difference in number of parameters (as we did for binary logistic regression). Make sure the difference is positive (the deviance / number of parameters of the more complex model can’t be smaller than 0).

Interpretation of p-value i.e. \(Pr(>\chi^2)\): Probability to observe our difference in the deviances (or something more extreme) if both models were to make identical predictions. If p is below 0.05, the models are unlikely to be identical.

Deviance and log-likelihood

For binary logistic regression, we have seen that the deviance \(\mathcal{D}\) and the likelihood \(\mathcal{L}\) have the following correspondence:

\[ \mathcal{D} = -2 \times \log(\mathcal{L}) \]

This isn’t true for Poisson regression: The deviance of model m1 is

deviance(m1)
[1] 15740.23

but for the log likelihood we see that

-2 * logLik(m1) 
'log Lik.' 17210.8 (df=2)

Deviance and log-likelihood

However, for differences \(\Delta\) in \(\mathcal{D}\) and \(\mathcal{L}\) the following relation still holds:

\[ \Delta\mathcal{D} = -2 \times \Delta\log(\mathcal{L}) \]

This can be shown for our two models (code for models not included): The difference between the deviance

deviance(m1) - deviance(m2)
[1] 98.51167

is the same as as the negative double for the difference in log likelihoods

-2 * (logLik(m1) - logLik(m2))
'log Lik.' 98.51167 (df=2)