M7222 Generalized linear models
Seminar 6 – Poisson regression
This seminar focuses on the generalized linear model (GLM) with a Poisson distribution, a fundamental tool for modeling count data. The analysis is conducted using the
icecreamdataset and compares several regression approaches.
We begin with the classical linear model and its variant with a logarithmic transformation, highlighting their limitations when applied to count data. Next, we introduce Poisson regression models with different link functions, which provide a more appropriate framework for such data. Finally, we consider a binomial model as an alternative approach. All models are implemented using the
glmfunction in.
First, load packages.
Theoretical framework
We start with Poisson distribution properties and then shift to the GLM with Poisson distribution.
Poisson Distribution
The Poisson distribution has probability mass function
\[ f(y; \lambda) = \frac{\lambda^y}{y!} e^{-\lambda} = \exp \left\{ -\lambda + y \log \lambda - \log y! \right\}. \]
It belongs to the exponential family, because \(f(y; \lambda)\) can be written in the form \[ f(y; \lambda) = \exp \left\{ \frac{\theta y - b(\theta)}{\varphi} + c(y, \varphi) \right\}, \] where \(\theta = \log (\lambda), \quad b(\theta) = e^{\theta}, \quad \varphi = 1\).
From standard properties of exponential family distributions, we obtain \[ \mathbb{E}Y = b'(\theta) = e^{\theta} = \lambda, \] \[ V(\mathbb{E}Y) = b'' \left( b'^{-1}(\mathbb{E}Y) \right) = \lambda, \] \[ \text{Var}(Y) = \varphi \cdot V(\mathbb{E}Y) = \lambda. \]
Canonical link for Poisson GLM
The canonical link function is given by \[ g(\mu) = \left( b' \right)^{-1}(\mu), \] and since \(b'(\theta) = e^{\theta} = \mu\), we obtain \[ g(\mu) = \log(\mu). \]
Although linear models may provide a reasonable fit, they are not well suited for count data. In particular, they may produce non-integer or even negative predictions and can overestimate values at extreme covariate levels. In contrast, the Poisson model naturally respects the discrete and non-negative nature of the data.
Poisson Regression Model
We consider regression models for count data, with a particular focus on generalized linear models (GLMs). Let \(Y_i\) denote the outcome for the \(i\)-th observation with associated explanatory variables \(\boldsymbol{x}_i\).
A natural starting point for modeling count data is the Poisson regression model. We assume \[ Y_i \sim \text{Poisson}(\lambda_i), \] where \(\lambda_i = \mathbb{E}Y_i\) is the expected number of events. Within the GLM framework, the model is specified as
\[ g(\lambda_i) = \beta_0 + \boldsymbol\beta^\top \boldsymbol x_i, \]
where \(g(\cdot)\) is a link function, \(\beta_0\) is the intercept, and \(\boldsymbol\beta\) is the vector of regression parameters describing the effect of covariates \(\boldsymbol x_i\).
For simplicity, we consider the case of a single covariate \(x_i\) in the remainder of the theoretical framework.
For the canonical (\(\log\)) link, this becomes \[ \log \lambda_i = \beta_0 + \beta_1 x_i \quad \Rightarrow \quad \lambda_i = \mathrm e^{\beta_0 + \beta_1 x_i}. \]
Two common alternatives are the classical linear model and a linear model with log-transformed response: \[ \text{GLM + log-link:} \quad \log(\mathbb E Y_i) = \beta_0 + \beta_1 x_i \implies \mathbb E Y_i = \mathrm e^{\beta_0 + \beta_1 x_i}, \] \[ \text{LM + log-transformation:} \quad \mathbb E \log(Y_i) = \beta_0 + \beta_1 x_i \implies \mathbb E Y_i \neq \mathrm e^{\beta_0 + \beta_1 x_i}. \]
The latter does not directly model \(\mathbb{E}Y_i\), which complicates interpretation.
Poisson Model with Log Link
We assume \[ Y_i \sim \text{Poisson}(\lambda_i), \quad \log \lambda_i = \beta_0 + \beta_1 x_i, \] which implies \[ \lambda_i = \mathrm e^{\beta_0 + \beta_1 x_i}. \]
\[ x_i = 0 \implies \lambda_i = \mathbb E Y_i = \mathrm e^{\beta_0}, \]
\[ x_i \uparrow 1 \implies \lambda_i \text{ is multiplied by } \mathrm e^{\beta_1}. \]
More formally, if we assume two individuals with covariate values \(x\) and \(x+1\), we get \[ \begin{align} \log(\mu_1) &= \beta_0 + \beta_1 (x + 1), \\ \log(\mu_2) &= \beta_0 + \beta_1 x \end{align} \] and then \[ \log(\mu_1) - \log(\mu_2) = \log \left( \frac{\mu_1}{\mu_2} \right) = \beta_1 \quad \Rightarrow \quad \mu_1 = e^{\beta_1} \cdot \mu_2, \] which can be rewritten as
\[ \mathbb{E}Y_1 = e^{\beta_1} \cdot \mathbb{E}Y_2. \]
Alternative Link: Square Root
An alternative specification uses the square root link: \[ \sqrt \lambda_i = \beta_0 + \beta_1 x_i \implies \lambda_i = \{\beta_0 + \beta_1 x_i\}^2. \] In this case, \[ x_i = 0 \implies \lambda_i = \mathbb E Y_i = {\beta_0}^2. \]
Binomial Model with Logit Link
As an alternative perspective, we may model the probability of success for the \(i\)-th observation. Let \[ Y_i \sim \text{Binomial}(n, p_i), \] where \(p_i\) is the probability of success (selling an ice cream). The model is
\[ \mathrm{logit}(p_i) = \log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_i. \]
In practice, the response is specified as two columns:
- the number of successes
- and failures.
Interpretation of Coefficients
Interpretation of \(\beta_0\)
For \(x_i = 0\), we obtain \[ \log\left(\frac{\widehat p_i}{1-\widehat p_i}\right) = \widehat \beta_0 \quad \Rightarrow \quad \widehat p_i = \frac{\exp(\widehat \beta_0)}{1 + \exp(\widehat \beta_0)} = \frac{1}{1 + \exp(-\widehat \beta_0)} = \text{plogis}(\widehat \beta_0). \]
Interpretation of \(\beta_1\)
For the slope parameter, consider \[ \begin{align} \log\left(\frac{p_1}{1 - p_1}\right) &= \beta_0 + \beta_1 (x + 1),\\ \log\left(\frac{p_2}{1 - p_2}\right) &= \beta_0 + \beta_1 x. \end{align} \]
Subtracting gives \[ \log\left(\frac{p_1}{1 - p_1}\right) - \log\left(\frac{p_2}{1 - p_2}\right) = \beta_1, \] which implies
\[ \text{OR} = \frac{\frac{p_1}{1 - p_1}}{\frac{p_2}{1 - p_2}} = e^{\beta_1}. \]
Thus, the odds of selling an ice cream increase by a factor of \(e^{\beta_1}\) for each one-degree increase in temperature.
1 Implementation in
1.1 Fitting the GLM
To fit the regression models introduced above, we use glm() function. Assume we have data frame called data with two columns, outcome y and explanatory variable x.
Classical linear model
Code
model <- lm(y ~ x, data = data)Alternatively, the same model can be expressed as a Gaussian GLM:
Linear model with log-transformed outcome
Gaussian GLM with log link
Poisson regression with log link
Poisson GLM with square root link
Binomial model
For binomial regression, the response must be specified as a combination of successes and failures. If the dataset data contains both variables, we write:
If only the number of successes is available and the total number of trials is known (denoted Saturation), we use:
1.2 Predictions from GLM
After fitting a model, we can obtain predictions using the predict() function in . In the context of generalized linear models, it is important to distinguish between different types of predictions, as they correspond to different scales of the model.
Let \(\eta_i = \beta_0 + \boldsymbol{\beta}^\top \boldsymbol{x}_i\) denote the linear predictor and let \(\mu_i = \mathbb{E}(Y_i)\) denote the expected response.
-
Linear predictor (
type = "link"):
This returns the values on the scale of the linear predictor, \[ \hat{\eta}_i = \hat{\beta}_0 + \hat{\boldsymbol{\beta}}^\top \boldsymbol{x}_i. \] This corresponds directly to the fitted regression equation before applying the link function. -
Response scale (
type = "response"): This returns predictions on the original data scale, \[ \hat{\mu}_i = g^{-1}(\hat{\eta}_i), \] where \(g^{-1}(\cdot)\) is the inverse link function. This is typically the most interpretable output, since it corresponds to the expected value of \(Y_i\).
Predicted values on the response scale for the specified covariate value
x = ...
Code
predict(model, newdata = data.frame(x = ...), type = "response")Default behaviour
If no type is specified, predict() typically returns values on the response scale for GLMs. For example, in a Poisson regression with log link, we have \[
\hat{\eta}_i = \log \hat{\lambda}_i, \quad \Rightarrow \quad \hat{\lambda}_i = \exp(\hat{\eta}_i),
\] so predictions on the response scale correspond to estimated expected counts.
In practice, choosing between these outputs depends on the goal: the linear predictor is useful for understanding the fitted model structure, while the response scale is used for interpretation and prediction of actual outcomes.
2 Analysis of ice cream data
In this section, you will analyze the data describing ice cream sales.
2.1 Exploratory data analysis
Code
icecream <- data.frame(
temp = c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1, 18.5, 19.4, 22.1, 22.6, 23.4, 25.1),
units = c(185, 215, 332, 325, 408, 421, 406, 412, 522, 445, 544, 614)
)2.2 Linear model
Code
model_lm <- lm(...)Code
coef(model_lm)(Intercept) temp
-159.47415 30.08786
- \(\widehat\beta_0\) …
- \(\widehat\beta_1\) …
- \(R^2 = {}\) 0.9168189
- \(R^2_{adj} = {}\) 0.9085008
Code
plot(...)2.3 Log-transformation of outcome
Code
mod_gauss_log <- glm(...)2.4 Poisson regression
Code
glm_model_poiss_log <- glm(...)Code
predict(...)Code
exp(...)(Intercept) temp
94.049472 1.078526
2.5 Binomial model
temp
1.173148
Code
predict(...)