M7222 Generalized linear models

Seminar 6 – Poisson regression

This seminar focuses on the generalized linear model (GLM) with a Poisson distribution, a fundamental tool for modeling count data. The analysis is conducted using the icecream dataset and compares several regression approaches.

We begin with the classical linear model and its variant with a logarithmic transformation, highlighting their limitations when applied to count data. Next, we introduce Poisson regression models with different link functions, which provide a more appropriate framework for such data. Finally, we consider a binomial model as an alternative approach. All models are implemented using the glm function in .

First, load packages.

Code

Load libraries

library(dplyr)
library(tidyr)

## ggplot libraries
library(ggplot2)

## colors
col1 <- 'deepskyblue'
col2 <- 'tomato'

Theoretical framework

We start with Poisson distribution properties and then shift to the GLM with Poisson distribution.

Poisson Distribution

The Poisson distribution has probability mass function

\[ f(y; \lambda) = \frac{\lambda^y}{y!} e^{-\lambda} = \exp \left\{ -\lambda + y \log \lambda - \log y! \right\}. \]

It belongs to the exponential family, because \(f(y; \lambda)\) can be written in the form \[ f(y; \lambda) = \exp \left\{ \frac{\theta y - b(\theta)}{\varphi} + c(y, \varphi) \right\}, \] where \(\theta = \log (\lambda), \quad b(\theta) = e^{\theta}, \quad \varphi = 1\).

From standard properties of exponential family distributions, we obtain \[ \mathbb{E}Y = b'(\theta) = e^{\theta} = \lambda, \] \[ V(\mathbb{E}Y) = b'' \left( b'^{-1}(\mathbb{E}Y) \right) = \lambda, \] \[ \text{Var}(Y) = \varphi \cdot V(\mathbb{E}Y) = \lambda. \]

Canonical link for Poisson GLM

The canonical link function is given by \[ g(\mu) = \left( b' \right)^{-1}(\mu), \] and since \(b'(\theta) = e^{\theta} = \mu\), we obtain \[ g(\mu) = \log(\mu). \]

Motivation for Poisson Regression

Although linear models may provide a reasonable fit, they are not well suited for count data. In particular, they may produce non-integer or even negative predictions and can overestimate values at extreme covariate levels. In contrast, the Poisson model naturally respects the discrete and non-negative nature of the data.

Poisson Regression Model

We consider regression models for count data, with a particular focus on generalized linear models (GLMs). Let \(Y_i\) denote the outcome for the \(i\)-th observation with associated explanatory variables \(\boldsymbol{x}_i\).

A natural starting point for modeling count data is the Poisson regression model. We assume \[ Y_i \sim \text{Poisson}(\lambda_i), \] where \(\lambda_i = \mathbb{E}Y_i\) is the expected number of events. Within the GLM framework, the model is specified as

\[ g(\lambda_i) = \beta_0 + \boldsymbol\beta^\top \boldsymbol x_i, \]

where \(g(\cdot)\) is a link function, \(\beta_0\) is the intercept, and \(\boldsymbol\beta\) is the vector of regression parameters describing the effect of covariates \(\boldsymbol x_i\).

For simplicity, we consider the case of a single covariate \(x_i\) in the remainder of the theoretical framework.

For the canonical (\(\log\)) link, this becomes \[ \log \lambda_i = \beta_0 + \beta_1 x_i \quad \Rightarrow \quad \lambda_i = \mathrm e^{\beta_0 + \beta_1 x_i}. \]

Comparison with Linear Models

Two common alternatives are the classical linear model and a linear model with log-transformed response: \[ \text{GLM + log-link:} \quad \log(\mathbb E Y_i) = \beta_0 + \beta_1 x_i \implies \mathbb E Y_i = \mathrm e^{\beta_0 + \beta_1 x_i}, \] \[ \text{LM + log-transformation:} \quad \mathbb E \log(Y_i) = \beta_0 + \beta_1 x_i \implies \mathbb E Y_i \neq \mathrm e^{\beta_0 + \beta_1 x_i}. \]

The latter does not directly model \(\mathbb{E}Y_i\), which complicates interpretation.

Poisson Model with Log Link

We assume \[ Y_i \sim \text{Poisson}(\lambda_i), \quad \log \lambda_i = \beta_0 + \beta_1 x_i, \] which implies \[ \lambda_i = \mathrm e^{\beta_0 + \beta_1 x_i}. \]

Interpretation of parameters:

\[ x_i = 0 \implies \lambda_i = \mathbb E Y_i = \mathrm e^{\beta_0}, \]

\[ x_i \uparrow 1 \implies \lambda_i \text{ is multiplied by } \mathrm e^{\beta_1}. \]

More formally, if we assume two individuals with covariate values \(x\) and \(x+1\), we get \[ \begin{align} \log(\mu_1) &= \beta_0 + \beta_1 (x + 1), \\ \log(\mu_2) &= \beta_0 + \beta_1 x \end{align} \] and then \[ \log(\mu_1) - \log(\mu_2) = \log \left( \frac{\mu_1}{\mu_2} \right) = \beta_1 \quad \Rightarrow \quad \mu_1 = e^{\beta_1} \cdot \mu_2, \] which can be rewritten as

\[ \mathbb{E}Y_1 = e^{\beta_1} \cdot \mathbb{E}Y_2. \]

Alternative Link: Square Root

An alternative specification uses the square root link: \[ \sqrt \lambda_i = \beta_0 + \beta_1 x_i \implies \lambda_i = \{\beta_0 + \beta_1 x_i\}^2. \] In this case, \[ x_i = 0 \implies \lambda_i = \mathbb E Y_i = {\beta_0}^2. \]

Binomial Model with Logit Link

As an alternative perspective, we may model the probability of success for the \(i\)-th observation. Let \[ Y_i \sim \text{Binomial}(n, p_i), \] where \(p_i\) is the probability of success (selling an ice cream). The model is

\[ \mathrm{logit}(p_i) = \log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_i. \]

In practice, the response is specified as two columns:

the number of successes
and failures.

Interpretation of Coefficients

Interpretation of \(\beta_0\)

For \(x_i = 0\), we obtain \[ \log\left(\frac{\widehat p_i}{1-\widehat p_i}\right) = \widehat \beta_0 \quad \Rightarrow \quad \widehat p_i = \frac{\exp(\widehat \beta_0)}{1 + \exp(\widehat \beta_0)} = \frac{1}{1 + \exp(-\widehat \beta_0)} = \text{plogis}(\widehat \beta_0). \]

Graph of plogis function

Interpretation of \(\beta_1\)

For the slope parameter, consider \[ \begin{align} \log\left(\frac{p_1}{1 - p_1}\right) &= \beta_0 + \beta_1 (x + 1),\\ \log\left(\frac{p_2}{1 - p_2}\right) &= \beta_0 + \beta_1 x. \end{align} \]

Subtracting gives \[ \log\left(\frac{p_1}{1 - p_1}\right) - \log\left(\frac{p_2}{1 - p_2}\right) = \beta_1, \] which implies

\[ \text{OR} = \frac{\frac{p_1}{1 - p_1}}{\frac{p_2}{1 - p_2}} = e^{\beta_1}. \]

Thus, the odds of selling an ice cream increase by a factor of \(e^{\beta_1}\) for each one-degree increase in temperature.

1 Implementation in

1.1 Fitting the GLM

To fit the regression models introduced above, we use glm() function. Assume we have data frame called data with two columns, outcome y and explanatory variable x.

Classical linear model

Code

model <- lm(y ~ x, data = data)

Alternatively, the same model can be expressed as a Gaussian GLM:

Code

model <- glm(y ~ x, data = data)
# or
model <- glm(y ~ x, data = data, family = gaussian(link = "identity"))

Linear model with log-transformed outcome

Code

model <- lm(log(y) ~ x, data = data)

Gaussian GLM with log link

Code

model <- glm(y ~ x, data = data, family = gaussian(link = "log"))

Poisson regression with log link

Code

model <- glm(y ~ x, data = data, family = poisson(link = "log"))

Poisson GLM with square root link

Code

model <- glm(y ~ x, data = data, family = poisson(link = "sqrt"))

Binomial model

For binomial regression, the response must be specified as a combination of successes and failures. If the dataset data contains both variables, we write:

Code

model <- glm(cbind(success, failure) ~ x, data = data, family = binomial(link = "logit"))

If only the number of successes is available and the total number of trials is known (denoted Saturation), we use:

Code

model <- glm(cbind(success, Saturation - success) ~ x, data = data, family = binomial(link = "logit"))

1.2 Predictions from GLM

After fitting a model, we can obtain predictions using the predict() function in . In the context of generalized linear models, it is important to distinguish between different types of predictions, as they correspond to different scales of the model.

Let \(\eta_i = \beta_0 + \boldsymbol{\beta}^\top \boldsymbol{x}_i\) denote the linear predictor and let \(\mu_i = \mathbb{E}(Y_i)\) denote the expected response.

Linear predictor (type = "link"):
This returns the values on the scale of the linear predictor, \[ \hat{\eta}_i = \hat{\beta}_0 + \hat{\boldsymbol{\beta}}^\top \boldsymbol{x}_i. \] This corresponds directly to the fitted regression equation before applying the link function.
Response scale (type = "response"): This returns predictions on the original data scale, \[ \hat{\mu}_i = g^{-1}(\hat{\eta}_i), \] where \(g^{-1}(\cdot)\) is the inverse link function. This is typically the most interpretable output, since it corresponds to the expected value of \(Y_i\).

Predicted values on the response scale for the specified covariate value x = ...

Code

predict(model, newdata = data.frame(x = ...), type = "response")

Default behaviour

If no type is specified, predict() typically returns values on the response scale for GLMs. For example, in a Poisson regression with log link, we have \[ \hat{\eta}_i = \log \hat{\lambda}_i, \quad \Rightarrow \quad \hat{\lambda}_i = \exp(\hat{\eta}_i), \] so predictions on the response scale correspond to estimated expected counts.

Note

In practice, choosing between these outputs depends on the goal: the linear predictor is useful for understanding the fitted model structure, while the response scale is used for interpretation and prediction of actual outcomes.

2 Analysis of ice cream data

In this section, you will analyze the data describing ice cream sales.

2.1 Exploratory data analysis

Exercise 1 (Exploratory analysis)

Work with ice cream sales data available below.

Code

icecream <- data.frame(
    temp = c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1, 18.5, 19.4, 22.1, 22.6, 23.4, 25.1),
    units = c(185, 215, 332, 325, 408, 421, 406, 412, 522, 445, 544, 614)
)

(a)

Plot the data.

Code

icecream %>% 
  ggplot(...)

(b)

Think of a suitable model for modeling the dependence of units sold on temperature.

2.2 Linear model

Exercise 2 (Linear model)

Model the ice cream data using classical linear model. Investigate the model fit and decide, whether this approach is a suitable choice for this kind of data.

Code

model_lm <- lm(...)

(a)

Plot the predictions from the linear model.

(b)

Interpret the estimated coefficients.

Code

coef(model_lm)

(Intercept)        temp 
 -159.47415    30.08786

\(\widehat\beta_0\) …
\(\widehat\beta_1\) …

(c)

See a suitable goodness of fit measure.

\(R^2 = {}\) 0.9168189
\(R^2_{adj} = {}\) 0.9085008

(d)

Evaluate diagnostic plots.

Code

plot(...)

2.3 Log-transformation of outcome

Exercise 3

Now create a model that predicts the units sold for any temperature, even outside the range of available data. In particular, be interested in how your models will behave in the more extreme cases when it is freezing outside, say the temperature dropped to \(0^\circ\)C and the prediction for a very hot summer’s day at \(35^\circ\)C.

See the intercept from the classic linear model and interpret it.
Think of a suitable data transformation. Ideally, ensure that the transformed data has only positive values.

(a)

Fit the model using classic LM. Interpret the intercept again.

Code

model_log <- lm(log(...) ~ ...)

(b)

Fit the model using GLM and logarithmic link. Interpret the coefficients and plot the fitted curve.

Code

mod_gauss_log <- glm(...)

2.4 Poisson regression

Exercise 4 (GLM with Poisson distribution)

Although the previous model makes more sense, it appears that is over-predicts sales at lower and higher temperatures. The assumed model distribution also generates real numbers, but sales statistics are units and hence always whole numbers. Although the average number of units sold is likely to be a real number any draw from the model distribution should be a whole number.

Fit a GLM using Poisson distribution with logarithmic link.
Will the model predict negative sale values? Why or why not?

Code

glm_model_poiss_log <- glm(...)

(a)

Plot predictions from the model.

(b)

How many ice creams do you expect to sell at \(0^\circ\)C? How many ice creams do you expect to sell at \(35^\circ\)C?

Code

predict(...)

(c)

How many more ice creams do you expect to sell with every increasing degree of temperature?

Code

exp(...)

(Intercept)        temp 
  94.049472    1.078526

2.5 Binomial model

Exercise 5 (GLM with Binomial distribution)

Perhaps the exponential growth in the previous model looks a little too optimistic. Presume that the ice cream sales market will be saturated at around 800.

Think of a suitable GLM model for sales data now that you have been given a limit that saturates the market.
Note that dividing the sales statistics by 800 would give you a proxy for the probability of selling all ice creams.
Will a binomial GLM predict negative sale values? Why or why not?

Code

Saturation <- 800

glm_binom_logit = glm(...)

# Interpretation?
exp(coef(glm_binom_logit)[2])

    temp 
1.173148

(a)

Plot predictions from the model.

(b)

How many ice creams do you expect to sell at \(0^\circ\)C? How many ice creams do you expect to sell at \(35^\circ\)C?

Code

predict(...)

(c)

Plot all the predictions into one plot.

Comment on the visual differences in fits.
Note down differences between models.