Notes 11

Poisson Counts | Part 1: Introduction

Setup

Loading packages

library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.3.3
library(GLMsData)
library(broom) # for tidy, augment
## Warning: package 'broom' was built under R version 4.3.3
library(statmod) # for qresid()
## Warning: package 'statmod' was built under R version 4.3.3

Introduction

This unit focuses on modeling counts. Examples from Section 10.1:

  • number of alpha particles emitted from a source of radiation in a given time

  • the number of cases of leukemia reported per year in a certain jurisdiction

  • the number of flaws per meter of electrical cable

Assumptions:

  • Individual events being counted are independent

  • No clear upper limit for number of events, or upper limit is very much greater than any of the actual counts.

Poisson Distribution

Most common distribution for count data.

Possible values: 0, 1, 2, 3, … to infinity

Parameter: \(\mu\), the expected count (mean). Possible values for \(\mu\): \(\mu\) > 0 (including decimals)

Noisy miner data example

Section 8.13

data(nminer)
head(nminer, 4)
##   Miners Eucs Area Grazed Shrubs Bulokes Timber Minerab
## 1      0    2   22      0      1     120     16       0
## 2      0   10   11      0      1      67     25       0
## 3      1   16   51      0      1      85     13       3
## 4      1   20   22      0      1      45     12       2
  • Response: number of noisy miners (Minerab)

  • Explanatory variable: number of eucalyptus trees (Eucs)

Plot of data. Note use of jittering and transparency (alpha) to show when there are multiple observations with the same data values.

ggplot(
  data = nminer, 
  mapping = aes(x = Eucs, y = Minerab)
) + 
  geom_jitter(width = .3, height = .3, alpha = .3)

Poisson regression model

  1. Random component \(y\sim Poisson(\mu)\), where

    • \(y\) are the observed noisy miner counts

    • \(\mu\) are the expected counts

  2. \(\log(\mu) = \beta_0 + \beta_1 x\), where

    • Which link function is used? log function (Most common for Poisson models)

    • \(x\): number of Eucalyptus trees

Fitting the Poisson regression model

We use glm() again! The code is similar to the binomial case, except there is no weight argument needed.

fit <- glm(Minerab ~ Eucs, data = nminer, family=poisson(link="log"))

Maximum likelihood estimates (MLEs) of beta coefficients:

tidy(fit)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.876    0.283      -3.10 1.95e- 3
## 2 Eucs           0.114    0.0124      9.17 4.77e-20

Predicted counts are calculated by solving the systematic component \(\log(\mu) = \beta_0 + \beta_1 x\) for \(\mu\).

This gives

\[\hat{\mu} = \exp(\hat{\beta}_0 + \hat{\beta}_1 x)\]

So we can calculate predicted counts directly from the beta estimates and the x values as

exp(tidy(fit)$estimate[1] + tidy(fit)$estimate[2] * nminer$Eucs) |> 
  head(5)
## [1] 0.5229608 1.3016101 2.5792281 4.0690744 3.6307318

or from using the augment() function, which shows them as .fitted

augment(fit, type.predict = "response") |> 
  head(5)
## # A tibble: 5 × 8
##   Minerab  Eucs .fitted .resid   .hat .sigma .cooksd .std.resid
##     <int> <int>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>      <dbl>
## 1       0     2   0.523 -1.02  0.0354   1.49 0.00994     -1.04 
## 2       0    10   1.30  -1.61  0.0398   1.47 0.0281      -1.65 
## 3       3    16   2.58   0.255 0.0406   1.50 0.00151      0.261
## 4       2    20   4.07  -1.14  0.0491   1.49 0.0285      -1.17 
## 5       8    19   3.63   1.98  0.0454   1.45 0.131        2.02

Plotting predicted counts with observed counts

ggplot(
  data = augment(fit, type.predict = "response"), 
  mapping = aes(x = Eucs, y = Minerab)
) + 
  geom_jitter(width = .3, height = .3, alpha = .5) + 
  geom_line(aes(y = .fitted)) + 
  labs(
    x = "Number of Eucalytpus trees", 
    y = "Number of Noisy Miner birds"
  )

Why the “kinks” in the line on the right? The line is connecting points on a curve that are far apart.

Interpretation of slope

This comes from writing our equation \(\hat{\mu} = \exp(\hat{\beta}_0 + \hat{\beta}_1 x)\) as

\[\hat{\mu} = \exp(\hat{\beta}_0) \exp(\hat{\beta}_1)^x\]

General interpretation of slope: As x increases by 1, the expected number of counts changes by a factor of \(\exp(\beta_1)\).

Examples for nminer data

  1. Interpret effect of increasing Eucalyptus trees by one:

As number of Eucalyptus trees increases by 1, the expected count of Noisy Miner birds increases by a factor of

exp(.114)
## [1] 1.120752
  1. How about expressing #1 as a percent?

As number of Eucalyptus trees increases by 1, the expected count of Noisy Miner birds increases by

(exp(.114) - 1) * 100
## [1] 12.07521
  1. Interpret effect of decreasing Eucalyptus trees by two:

As number of Eucalyptus trees decreases by 2, the expected count of Noisy Miner birds decreases by

exp(-2*.114)
## [1] 0.7961243
  1. How about expressing #3 as a percent?

As number of Eucalypltus trees decreases by 2, the expected count of Noisy Miner birds decreases by

(exp(-2*.114) - 1) * 100
## [1] -20.38757