Notes 11
Poisson Counts | Part 1: Introduction
Setup
Loading packages
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'broom' was built under R version 4.3.3
## Warning: package 'statmod' was built under R version 4.3.3
Introduction
This unit focuses on modeling counts. Examples from Section 10.1:
number of alpha particles emitted from a source of radiation in a given time
the number of cases of leukemia reported per year in a certain jurisdiction
the number of flaws per meter of electrical cable
Assumptions:
Individual events being counted are independent
No clear upper limit for number of events, or upper limit is very much greater than any of the actual counts.
Poisson Distribution
Most common distribution for count data.
Possible values: 0, 1, 2, 3, … to infinity
Parameter: \(\mu\), the expected count (mean). Possible values for \(\mu\): \(\mu\) > 0 (including decimals)
Noisy miner data example
Section 8.13
## Miners Eucs Area Grazed Shrubs Bulokes Timber Minerab
## 1 0 2 22 0 1 120 16 0
## 2 0 10 11 0 1 67 25 0
## 3 1 16 51 0 1 85 13 3
## 4 1 20 22 0 1 45 12 2
Response: number of noisy miners (
Minerab
)Explanatory variable: number of eucalyptus trees (
Eucs
)
Plot of data. Note use of jittering and transparency
(alpha
) to show when there are multiple observations with
the same data values.
ggplot(
data = nminer,
mapping = aes(x = Eucs, y = Minerab)
) +
geom_jitter(width = .3, height = .3, alpha = .3)
Poisson regression model
Random component \(y\sim Poisson(\mu)\), where
\(y\) are the observed noisy miner counts
\(\mu\) are the expected counts
\(\log(\mu) = \beta_0 + \beta_1 x\), where
Which link function is used? log function (Most common for Poisson models)
\(x\): number of Eucalyptus trees
Purpose of link function
The purpose of the link function is to match the range of
the linear predictor \(\beta_0 + \beta_1 x\). Range: negative infinity to positive infinity
with \(\log(\mu)\). \(\mu > 0\) so \(\log(\mu)\) has range: negative infinity to positive infinity
Plot:
tibble(
mu = seq(from = .1, to = 10, by = .1),
log_mu = log(mu)
) |>
ggplot(
mapping = aes(x = mu, y = log_mu)
) +
geom_line() +
labs(y = "log(mu)")
Fitting the Poisson regression model
We use glm()
again! The code is similar to the binomial
case, except there is no weight
argument needed.
Maximum likelihood estimates (MLEs) of beta coefficients:
## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.876 0.283 -3.10 1.95e- 3
## 2 Eucs 0.114 0.0124 9.17 4.77e-20
Predicted counts are calculated by solving the systematic component \(\log(\mu) = \beta_0 + \beta_1 x\) for \(\mu\).
This gives
\[\hat{\mu} = \exp(\hat{\beta}_0 + \hat{\beta}_1 x)\]
So we can calculate predicted counts directly from the beta estimates and the x values as
## [1] 0.5229608 1.3016101 2.5792281 4.0690744 3.6307318
or from using the augment()
function, which shows them
as .fitted
## # A tibble: 5 × 8
## Minerab Eucs .fitted .resid .hat .sigma .cooksd .std.resid
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 2 0.523 -1.02 0.0354 1.49 0.00994 -1.04
## 2 0 10 1.30 -1.61 0.0398 1.47 0.0281 -1.65
## 3 3 16 2.58 0.255 0.0406 1.50 0.00151 0.261
## 4 2 20 4.07 -1.14 0.0491 1.49 0.0285 -1.17
## 5 8 19 3.63 1.98 0.0454 1.45 0.131 2.02
Plotting predicted counts with observed counts
ggplot(
data = augment(fit, type.predict = "response"),
mapping = aes(x = Eucs, y = Minerab)
) +
geom_jitter(width = .3, height = .3, alpha = .5) +
geom_line(aes(y = .fitted)) +
labs(
x = "Number of Eucalytpus trees",
y = "Number of Noisy Miner birds"
)
Why the “kinks” in the line on the right? The line is connecting points on a curve that are far apart.
Interpretation of slope
This comes from writing our equation \(\hat{\mu} = \exp(\hat{\beta}_0 + \hat{\beta}_1 x)\) as
\[\hat{\mu} = \exp(\hat{\beta}_0) \exp(\hat{\beta}_1)^x\]
General interpretation of slope: As x increases by 1, the expected number of counts changes by a factor of \(\exp(\beta_1)\).
Examples for nminer
data
- Interpret effect of increasing Eucalyptus trees by one:
As number of Eucalyptus trees increases by 1, the expected count of Noisy Miner birds increases by a factor of
## [1] 1.120752
- How about expressing #1 as a percent?
As number of Eucalyptus trees increases by 1, the expected count of Noisy Miner birds increases by
## [1] 12.07521
- Interpret effect of decreasing Eucalyptus trees by two:
As number of Eucalyptus trees decreases by 2, the expected count of Noisy Miner birds decreases by
## [1] 0.7961243
- How about expressing #3 as a percent?
As number of Eucalypltus trees decreases by 2, the expected count of Noisy Miner birds decreases by
## [1] -20.38757