Notes 11

Poisson Counts | Part 1: Introduction

Setup

Loading packages

library(tidyverse)

## Warning: package 'ggplot2' was built under R version 4.3.3

library(GLMsData)
library(broom) # for tidy, augment

## Warning: package 'broom' was built under R version 4.3.3

library(statmod) # for qresid()

## Warning: package 'statmod' was built under R version 4.3.3

Introduction

This unit focuses on modeling counts. Examples from Section 10.1:

number of alpha particles emitted from a source of radiation in a given time
the number of cases of leukemia reported per year in a certain jurisdiction
the number of ﬂaws per meter of electrical cable

Assumptions:

Individual events being counted are independent
No clear upper limit for number of events, or upper limit is very much greater than any of the actual counts.

Poisson Distribution

Most common distribution for count data.

Possible values: 0, 1, 2, 3, … to infinity

Parameter: \(\mu\), the expected count (mean). Possible values for \(\mu\): \(\mu\) > 0 (including decimals)

Noisy miner data example

Section 8.13

data(nminer)
head(nminer, 4)

##   Miners Eucs Area Grazed Shrubs Bulokes Timber Minerab
## 1      0    2   22      0      1     120     16       0
## 2      0   10   11      0      1      67     25       0
## 3      1   16   51      0      1      85     13       3
## 4      1   20   22      0      1      45     12       2

Response: number of noisy miners (Minerab)
Explanatory variable: number of eucalyptus trees (Eucs)

Plot of data. Note use of jittering and transparency (alpha) to show when there are multiple observations with the same data values.

ggplot(
  data = nminer, 
  mapping = aes(x = Eucs, y = Minerab)
) + 
  geom_jitter(width = .3, height = .3, alpha = .3)

Poisson regression model

Random component \(y\sim Poisson(\mu)\), where
- \(y\) are the observed noisy miner counts
- \(\mu\) are the expected counts
\(\log(\mu) = \beta_0 + \beta_1 x\), where
- Which link function is used? log function (Most common for Poisson models)
- \(x\): number of Eucalyptus trees

Purpose of link function

The purpose of the link function is to match the range of

the linear predictor \(\beta_0 + \beta_1 x\). Range: negative infinity to positive infinity
with \(\log(\mu)\). \(\mu > 0\) so \(\log(\mu)\) has range: negative infinity to positive infinity

Plot:

tibble(
  mu = seq(from = .1, to = 10, by = .1), 
  log_mu = log(mu)
) |> 
  ggplot(
    mapping = aes(x = mu, y = log_mu)
  ) + 
  geom_line() + 
  labs(y = "log(mu)")

Fitting the Poisson regression model

We use glm() again! The code is similar to the binomial case, except there is no weight argument needed.

fit <- glm(Minerab ~ Eucs, data = nminer, family=poisson(link="log"))

Maximum likelihood estimates (MLEs) of beta coefficients:

tidy(fit)

## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.876    0.283      -3.10 1.95e- 3
## 2 Eucs           0.114    0.0124      9.17 4.77e-20

Predicted counts are calculated by solving the systematic component \(\log(\mu) = \beta_0 + \beta_1 x\) for \(\mu\).

This gives

\[\hat{\mu} = \exp(\hat{\beta}_0 + \hat{\beta}_1 x)\]

So we can calculate predicted counts directly from the beta estimates and the x values as

exp(tidy(fit)$estimate[1] + tidy(fit)$estimate[2] * nminer$Eucs) |> 
  head(5)

## [1] 0.5229608 1.3016101 2.5792281 4.0690744 3.6307318

or from using the augment() function, which shows them as .fitted

augment(fit, type.predict = "response") |> 
  head(5)

## # A tibble: 5 × 8
##   Minerab  Eucs .fitted .resid   .hat .sigma .cooksd .std.resid
##     <int> <int>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>      <dbl>
## 1       0     2   0.523 -1.02  0.0354   1.49 0.00994     -1.04 
## 2       0    10   1.30  -1.61  0.0398   1.47 0.0281      -1.65 
## 3       3    16   2.58   0.255 0.0406   1.50 0.00151      0.261
## 4       2    20   4.07  -1.14  0.0491   1.49 0.0285      -1.17 
## 5       8    19   3.63   1.98  0.0454   1.45 0.131        2.02

Plotting predicted counts with observed counts

ggplot(
  data = augment(fit, type.predict = "response"), 
  mapping = aes(x = Eucs, y = Minerab)
) + 
  geom_jitter(width = .3, height = .3, alpha = .5) + 
  geom_line(aes(y = .fitted)) + 
  labs(
    x = "Number of Eucalytpus trees", 
    y = "Number of Noisy Miner birds"
  )

Why the “kinks” in the line on the right? The line is connecting points on a curve that are far apart.

Interpretation of slope

This comes from writing our equation \(\hat{\mu} = \exp(\hat{\beta}_0 + \hat{\beta}_1 x)\) as

\[\hat{\mu} = \exp(\hat{\beta}_0) \exp(\hat{\beta}_1)^x\]

General interpretation of slope: As x increases by 1, the expected number of counts changes by a factor of \(\exp(\beta_1)\).

Examples for `nminer` data

Interpret effect of increasing Eucalyptus trees by one:

As number of Eucalyptus trees increases by 1, the expected count of Noisy Miner birds increases by a factor of

exp(.114)

## [1] 1.120752

How about expressing #1 as a percent?

As number of Eucalyptus trees increases by 1, the expected count of Noisy Miner birds increases by

(exp(.114) - 1) * 100

## [1] 12.07521

Interpret effect of decreasing Eucalyptus trees by two:

As number of Eucalyptus trees decreases by 2, the expected count of Noisy Miner birds decreases by

exp(-2*.114)

## [1] 0.7961243

How about expressing #3 as a percent?

As number of Eucalypltus trees decreases by 2, the expected count of Noisy Miner birds decreases by

(exp(-2*.114) - 1) * 100

## [1] -20.38757

Notes 11

Poisson Counts | Part 1: Introduction

Notes 11

Setup

Introduction

Poisson Distribution

Noisy miner data example

Poisson regression model

Purpose of link function

Fitting the Poisson regression model

Interpretation of slope

Examples for nminer data

Examples for `nminer` data