M7222 Generalized linear models

Seminar 6 – Poisson regression

This seminar focuses on the generalized linear model (GLM) with a Poisson distribution, a fundamental tool for modeling count data. The analysis is conducted using the icecream dataset and compares several regression approaches.

We begin with the classical linear model and its variant with a logarithmic transformation, highlighting their limitations when applied to count data. Next, we introduce Poisson regression models with different link functions, which provide a more appropriate framework for such data. Finally, we consider a binomial model as an alternative approach. All models are implemented using the glm function in .


First, load packages.

Code
Load libraries
library(dplyr)
library(tidyr)

## ggplot libraries
library(ggplot2)

## colors
col1 <- 'deepskyblue'
col2 <- 'tomato'

Theoretical framework

We start with Poisson distribution properties and then shift to the GLM with Poisson distribution.

Poisson Distribution

The Poisson distribution has probability mass function

\[ f(y; \lambda) = \frac{\lambda^y}{y!} e^{-\lambda} = \exp \left\{ -\lambda + y \log \lambda - \log y! \right\}. \]

It belongs to the exponential family, because \(f(y; \lambda)\) can be written in the form \[ f(y; \lambda) = \exp \left\{ \frac{\theta y - b(\theta)}{\varphi} + c(y, \varphi) \right\}, \] where \(\theta = \log (\lambda), \quad b(\theta) = e^{\theta}, \quad \varphi = 1\).

From standard properties of exponential family distributions, we obtain \[ \mathbb{E}Y = b'(\theta) = e^{\theta} = \lambda, \] \[ V(\mathbb{E}Y) = b'' \left( b'^{-1}(\mathbb{E}Y) \right) = \lambda, \] \[ \text{Var}(Y) = \varphi \cdot V(\mathbb{E}Y) = \lambda. \]

Canonical link for Poisson GLM

The canonical link function is given by \[ g(\mu) = \left( b' \right)^{-1}(\mu), \] and since \(b'(\theta) = e^{\theta} = \mu\), we obtain \[ g(\mu) = \log(\mu). \]

TipMotivation for Poisson Regression

Although linear models may provide a reasonable fit, they are not well suited for count data. In particular, they may produce non-integer or even negative predictions and can overestimate values at extreme covariate levels. In contrast, the Poisson model naturally respects the discrete and non-negative nature of the data.

Poisson Regression Model

We consider regression models for count data, with a particular focus on generalized linear models (GLMs). Let \(Y_i\) denote the outcome for the \(i\)-th observation with associated explanatory variables \(\boldsymbol{x}_i\).

A natural starting point for modeling count data is the Poisson regression model. We assume \[ Y_i \sim \text{Poisson}(\lambda_i), \] where \(\lambda_i = \mathbb{E}Y_i\) is the expected number of events. Within the GLM framework, the model is specified as

\[ g(\lambda_i) = \beta_0 + \boldsymbol\beta^\top \boldsymbol x_i, \]

where \(g(\cdot)\) is a link function, \(\beta_0\) is the intercept, and \(\boldsymbol\beta\) is the vector of regression parameters describing the effect of covariates \(\boldsymbol x_i\).

For simplicity, we consider the case of a single covariate \(x_i\) in the remainder of the theoretical framework.

For the canonical (\(\log\)) link, this becomes \[ \log \lambda_i = \beta_0 + \beta_1 x_i \quad \Rightarrow \quad \lambda_i = \mathrm e^{\beta_0 + \beta_1 x_i}. \]

ImportantComparison with Linear Models

Two common alternatives are the classical linear model and a linear model with log-transformed response: \[ \text{GLM + log-link:} \quad \log(\mathbb E Y_i) = \beta_0 + \beta_1 x_i \implies \mathbb E Y_i = \mathrm e^{\beta_0 + \beta_1 x_i}, \] \[ \text{LM + log-transformation:} \quad \mathbb E \log(Y_i) = \beta_0 + \beta_1 x_i \implies \mathbb E Y_i \neq \mathrm e^{\beta_0 + \beta_1 x_i}. \]

The latter does not directly model \(\mathbb{E}Y_i\), which complicates interpretation.

1 Implementation in

1.1 Fitting the GLM

To fit the regression models introduced above, we use glm() function. Assume we have data frame called data with two columns, outcome y and explanatory variable x.

Classical linear model

Code
model <- lm(y ~ x, data = data)

Alternatively, the same model can be expressed as a Gaussian GLM:

Code
model <- glm(y ~ x, data = data)
# or
model <- glm(y ~ x, data = data, family = gaussian(link = "identity"))

Linear model with log-transformed outcome

Code
model <- lm(log(y) ~ x, data = data)

Gaussian GLM with log link

Code
model <- glm(y ~ x, data = data, family = gaussian(link = "log"))

Poisson regression with log link

Code
model <- glm(y ~ x, data = data, family = poisson(link = "log"))

Poisson GLM with square root link

Code
model <- glm(y ~ x, data = data, family = poisson(link = "sqrt"))

Binomial model

For binomial regression, the response must be specified as a combination of successes and failures. If the dataset data contains both variables, we write:

Code
model <- glm(cbind(success, failure) ~ x, data = data, family = binomial(link = "logit"))

If only the number of successes is available and the total number of trials is known (denoted Saturation), we use:

Code
model <- glm(cbind(success, Saturation - success) ~ x, data = data, family = binomial(link = "logit"))

1.2 Predictions from GLM

After fitting a model, we can obtain predictions using the predict() function in . In the context of generalized linear models, it is important to distinguish between different types of predictions, as they correspond to different scales of the model.

Let \(\eta_i = \beta_0 + \boldsymbol{\beta}^\top \boldsymbol{x}_i\) denote the linear predictor and let \(\mu_i = \mathbb{E}(Y_i)\) denote the expected response.

  • Linear predictor (type = "link"):
    This returns the values on the scale of the linear predictor, \[ \hat{\eta}_i = \hat{\beta}_0 + \hat{\boldsymbol{\beta}}^\top \boldsymbol{x}_i. \] This corresponds directly to the fitted regression equation before applying the link function.
  • Response scale (type = "response"): This returns predictions on the original data scale, \[ \hat{\mu}_i = g^{-1}(\hat{\eta}_i), \] where \(g^{-1}(\cdot)\) is the inverse link function. This is typically the most interpretable output, since it corresponds to the expected value of \(Y_i\).

Predicted values on the response scale for the specified covariate value x = ...

Code
predict(model, newdata = data.frame(x = ...), type = "response")

Default behaviour

If no type is specified, predict() typically returns values on the response scale for GLMs. For example, in a Poisson regression with log link, we have \[ \hat{\eta}_i = \log \hat{\lambda}_i, \quad \Rightarrow \quad \hat{\lambda}_i = \exp(\hat{\eta}_i), \] so predictions on the response scale correspond to estimated expected counts.

Note

In practice, choosing between these outputs depends on the goal: the linear predictor is useful for understanding the fitted model structure, while the response scale is used for interpretation and prediction of actual outcomes.

2 Analysis of ice cream data

In this section, you will analyze the data describing ice cream sales.

2.1 Exploratory data analysis

CautionExercise 1 (Exploratory analysis)

Work with ice cream sales data available below.

Code
icecream <- data.frame(
    temp = c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1, 18.5, 19.4, 22.1, 22.6, 23.4, 25.1),
    units = c(185, 215, 332, 325, 408, 421, 406, 412, 522, 445, 544, 614)
)
Caution(a)

Plot the data.

Code
icecream %>% 
  ggplot(...)

Caution(b)

Think of a suitable model for modeling the dependence of units sold on temperature.

2.2 Linear model

CautionExercise 2 (Linear model)

Model the ice cream data using classical linear model. Investigate the model fit and decide, whether this approach is a suitable choice for this kind of data.

Code
model_lm <- lm(...)
Caution(a)

Plot the predictions from the linear model.

Caution(b)

Interpret the estimated coefficients.

Code
coef(model_lm)
(Intercept)        temp 
 -159.47415    30.08786 
  • \(\widehat\beta_0\)
  • \(\widehat\beta_1\)
Caution(c)

See a suitable goodness of fit measure.

  • \(R^2 = {}\) 0.9168189
  • \(R^2_{adj} = {}\) 0.9085008
Caution(d)

Evaluate diagnostic plots.

Code
plot(...)

2.3 Log-transformation of outcome

CautionExercise 3

Now create a model that predicts the units sold for any temperature, even outside the range of available data. In particular, be interested in how your models will behave in the more extreme cases when it is freezing outside, say the temperature dropped to \(0^\circ\)C and the prediction for a very hot summer’s day at \(35^\circ\)C.

  • See the intercept from the classic linear model and interpret it.
  • Think of a suitable data transformation. Ideally, ensure that the transformed data has only positive values.
Caution(a)

Fit the model using classic LM. Interpret the intercept again.

Code
model_log <- lm(log(...) ~ ...)
Caution(b)

Fit the model using GLM and logarithmic link. Interpret the coefficients and plot the fitted curve.

Code
mod_gauss_log <- glm(...)

2.4 Poisson regression

CautionExercise 4 (GLM with Poisson distribution)

Although the previous model makes more sense, it appears that is over-predicts sales at lower and higher temperatures. The assumed model distribution also generates real numbers, but sales statistics are units and hence always whole numbers. Although the average number of units sold is likely to be a real number any draw from the model distribution should be a whole number.

  • Fit a GLM using Poisson distribution with logarithmic link.
  • Will the model predict negative sale values? Why or why not?
Code
glm_model_poiss_log <- glm(...)
Caution(a)

Plot predictions from the model.

Caution(b)

How many ice creams do you expect to sell at \(0^\circ\)C? How many ice creams do you expect to sell at \(35^\circ\)C?

Code
predict(...)
Caution(c)

How many more ice creams do you expect to sell with every increasing degree of temperature?

Code
exp(...)
(Intercept)        temp 
  94.049472    1.078526 

2.5 Binomial model

CautionExercise 5 (GLM with Binomial distribution)

Perhaps the exponential growth in the previous model looks a little too optimistic. Presume that the ice cream sales market will be saturated at around 800.

  • Think of a suitable GLM model for sales data now that you have been given a limit that saturates the market.
  • Note that dividing the sales statistics by 800 would give you a proxy for the probability of selling all ice creams.
  • Will a binomial GLM predict negative sale values? Why or why not?
Code
Saturation <- 800

glm_binom_logit = glm(...)

# Interpretation?
exp(coef(glm_binom_logit)[2])
    temp 
1.173148 
Caution(a)

Plot predictions from the model.

Caution(b)

How many ice creams do you expect to sell at \(0^\circ\)C? How many ice creams do you expect to sell at \(35^\circ\)C?

Code
predict(...)
Caution(c)

Plot all the predictions into one plot.

  • Comment on the visual differences in fits.
  • Note down differences between models.