Notes 14

Poisson Counts | Part 4: Standard Errors, Hypothesis Tests for Beta Coefficients

Setup

Loading packages

library(tidyverse)
library(GLMsData)
library(broom) # for tidy, augment, glance
library(statmod) # for qresid()

Noisy miner data context

data(nminer)
head(nminer)
##   Miners Eucs Area Grazed Shrubs Bulokes Timber Minerab
## 1      0    2   22      0      1     120     16       0
## 2      0   10   11      0      1      67     25       0
## 3      1   16   51      0      1      85     13       3
## 4      1   20   22      0      1      45     12       2
## 5      1   19    4      0      1     160     14       8
## 6      1   18   61      0      1      75      6       1

Revisit the Poisson model for y = Minerab in terms of x = Eucs, with a log link:

  • \(Minerab \sim Poisson(\mu)\), where

  • \(\log(\mu) = \beta_0 + \beta_1 Eucs\)

Fit the model

fit <- glm(Minerab ~ Eucs, family = poisson(link = "log"), data = nminer)

Obtain maximum likelihood estimates (MLEs) for the beta coefficients

tidy(fit)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.876    0.283      -3.10 1.95e- 3
## 2 Eucs           0.114    0.0124      9.17 4.77e-20

So far, we’ve focused only on the estimate column of this table. In these notes, we turn to the other columns:

  • std.error (standard error)

  • statistic (test statistic)

  • p.value

We’ll focus on the row for Eucs, which gives the estimate, standard error, test statistic, and p-value for \(\beta_1\).
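
For reference, base R reports the same four quantities; summary() returns them as a matrix rather than a tibble.

# base R equivalent of the tidy() table: columns Estimate, Std. Error,
# z value, and Pr(>|z|)
summary(fit)$coefficients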

std.error (Standard error)

The standard error of an estimate, say \(\hat{\beta}_1\), is an estimate of the standard deviation of the sampling distribution of \(\hat{\beta}_1\).

Sampling distribution of \(\hat{\beta}_1\):

  • We consider taking repeated samples from the population and fitting the model to each sample.

  • The distribution of the \(\hat{\beta}_1\) values we get for different samples is the sampling distribution of \(\hat{\beta}_1\). (A small simulation after this list makes this concrete.)
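
Here is a small illustrative sketch (not part of the original analysis): pretend the fitted model is the population, simulate new Minerab counts at the observed Eucs values, refit the model, and watch \(\hat{\beta}_1\) vary from sample to sample.

# Illustration: treat the fitted model as the "population", simulate
# repeated samples from it, and refit the model each time
sim_beta1hat <- replicate(100, {
  sim_y <- rpois(nrow(nminer), lambda = fitted(fit))  # new Poisson responses
  coef(glm(sim_y ~ Eucs, family = poisson(link = "log"), data = nminer))["Eucs"]
})
sd(sim_beta1hat)  # spread of the simulated sampling distribution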

Bootstrap method

“Taking repeated samples from the population” is a theoretical concept and hard to grasp since we don’t know the population. But a good substitute for this in practice is taking repeated samples with replacement from the sample. This is called the bootstrap method.

Taking samples with and without replacement. Consider taking a sample of 10 balls from a bingo cage containing 100 ping pong balls numbered 1 to 100. To take the sample, select one ball at a time. (A quick sample() illustration follows the list.)

  • With replacement: You put the ball back in the cage before you select the next ball. (In other words, the ball can be selected multiple times.)

  • Without replacement: Once you select a ball, don’t put it back in the cage. (The ball can only be selected at most once.)
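
In R, the sample() function does both; a quick illustration of the bingo example (the seed is arbitrary):

set.seed(1)
sample(1:100, size = 10, replace = TRUE)   # with replacement: repeats possible
sample(1:100, size = 10, replace = FALSE)  # without replacement: all distinct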

Let’s take 100 bootstrap samples from the nminer dataset. (Usually you’d take thousands of samples but let’s keep the computation time low.)

The workhorse here is the sample_n() function: we take samples from nminer, each the same size as the original dataset, with replacement.

set.seed(24601)
N <- 100
boot_samples <- tibble()
for(i in 1:N){
  boot_samples <- bind_rows(
    boot_samples, 
    tibble(
      n = i,  # index of the bootstrap sample
      sample_n(nminer, size = nrow(nminer), replace = TRUE)
    ) |> 
      select(n, Minerab, Eucs)
  )
}
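
(Side note: in current versions of dplyr, sample_n() is superseded by slice_sample(); slice_sample(nminer, n = nrow(nminer), replace = TRUE) does the same thing.)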

Plots of the first 6 bootstrap samples, along with the fitted mean lines. Note that the fitted lines (and the \(\hat{\beta}_0\) and \(\hat{\beta}_1\) values) vary according to which observations are in the sample.

boot_samples |> 
  filter(n <= 6) |> 
  ggplot(
    mapping = aes(x = Eucs, y = Minerab)
  ) + 
  geom_point() + 
  geom_smooth(
    method = "glm", method.args = list(family = poisson(link = "log")), se = FALSE
  ) + 
  facet_wrap(~n)

The following code fits the model to each of the 100 samples and extracts the \(\hat{\beta}_1\) values.

beta1hat <- rep(0, N)
for(i in 1:N){
  beta1hat[i] <- glm(Minerab ~ Eucs, family = poisson(link = "log"), 
                     data = boot_samples |> filter(n == i)) |> 
    tidy() |> 
    filter(term == "Eucs") |> 
    pull(estimate)
}
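
An equivalent, loop-free version using purrr (loaded with the tidyverse); purely a stylistic alternative that produces the same values:

# same computation as the loop above, written with map_dbl()
beta1hat2 <- map_dbl(1:N, function(i) {
  m <- glm(Minerab ~ Eucs, family = poisson(link = "log"), 
           data = boot_samples |> filter(n == i))
  coef(m)[["Eucs"]]
})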

Histogram of \(\hat{\beta}_1\) values. This is a representation of the sampling distribution of \(\hat{\beta}_1\).

tibble(beta1hat = beta1hat) |> 
  ggplot(
    mapping = aes(x = beta1hat)
  ) + 
  geom_histogram(bins = 30)

Bootstrap estimate of the standard error.

sd(beta1hat)
## [1] 0.02140763

Compare with the tidy() std.error value of 0.0124. These are not all that close, but they are in the same ballpark. (With only 100 bootstrap samples, some discrepancy is expected.)

Fisher information and standard errors

The standard errors given by tidy() come from the log-likelihood function – specifically, from the negative second derivative of the log-likelihood function, evaluated at the MLE, which is called the Fisher information.

This measures the “sharpness” of the peak of the log-likelihood function (See Fig 4.4 in Section 4.5.2).

A sharper peak means a better estimate, with more information and, therefore, less variation. The standard error is 1 divided by the square root of this information: \(SE = 1/\sqrt{\text{information}}\). (See Section 4.5.3)
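
We can check this in R: for a glm, vcov() returns the estimated variance-covariance matrix of the coefficient estimates (the inverse of the estimated Fisher information matrix), so the standard errors are the square roots of its diagonal entries.

# standard errors from the inverse Fisher information;
# these should match the std.error column of tidy(fit): 0.283 and 0.0124
sqrt(diag(vcov(fit)))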

MLEs follow a normal distribution

Before we discuss the test statistic and the p-value of \(\hat{\beta}_1\), it helps to know the distribution of \(\hat{\beta}_1\).

See Section 4.9.2: Properties of maximum likelihood estimates (like \(\hat{\beta}_1\)). #5 says: “MLEs are asymptotically normally distributed” with

  • mean = true parameter value (\(\beta_1\))

  • standard deviation calculated from the Fisher information, as described above. (A quick visual check follows this list.)
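
As a visual check (reusing the beta1hat values from the bootstrap above), we can overlay the normal curve implied by the tidy() table on the bootstrap histogram. Since the bootstrap standard error (0.0214) was somewhat larger than 0.0124, expect the histogram to be a bit wider than the curve.

tibble(beta1hat = beta1hat) |> 
  ggplot(
    mapping = aes(x = beta1hat)
  ) + 
  geom_histogram(mapping = aes(y = after_stat(density)), bins = 30) + 
  stat_function(
    fun = dnorm, 
    args = list(mean = tidy(fit)$estimate[2], sd = tidy(fit)$std.error[2])
  )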

Let’s return to the other quantities given in the tidy() table.

tidy(fit)
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.876    0.283      -3.10 1.95e- 3
## 2 Eucs           0.114    0.0124      9.17 4.77e-20

statistic (Test statistic)

The test statistic is calculated as the estimate divided by the std.error. Let’s verify this for the test statistic for the coefficient of Eucs.

.114/.0124
## [1] 9.193548
tidy(fit)$estimate[2] / tidy(fit)$std.error[2]
## [1] 9.169092

(The first calculation uses the rounded values displayed in the table, hence the small difference.)

Hypothesis test for \(\beta_1\)

The test statistic and p-value are calculated within the hypothesis test between two competing hypotheses:

  • Null hypothesis: \(H_0: \beta_1 = 0\) (no relationship between Minerab and Eucs)

  • Alternative hypothesis: \(H_a: \beta_1 \neq 0\) (some relationship between Minerab and Eucs)

This test measures how compatible the data and the null hypothesis are. (The test statistic and p-value are computed under the null hypothesis \(H_0: \beta_1 = 0\).)

Test statistic is a z score

Recall the formula for z scores used in normal calculations:

\[z = \frac{value - mean}{sd}\]

Interpretation of the z-score: how many standard deviations the value is above or below the mean.

Focusing on the (normal) distribution of \(\hat{\beta}_1\) under \(H_0: \beta_1 = 0\), the corresponding z-score formula for \(\hat{\beta}_1\) gives the test statistic:

  • value = \(\hat{\beta}_1\)

  • mean = \(\beta_1\) (= 0 under \(H_0\))

  • sd estimated by the standard error.

\[ z = \frac{\hat{\beta}_1 - 0}{SE_{\hat{\beta}_1}}\]

Our test statistic is \(z = 9.17\).

Interpretation of the test statistic: \(\hat{\beta}_1\) is 9.17 standard deviations above what we would expect if \(H_0\) were true.

Assuming \(H_0: \beta_1 = 0\), should we expect a test statistic like this (based on what we know about the normal distribution)? No! Being 9 standard deviations away from the mean is a ton for a normal distribution. (We said earlier that more than 2 or 3 constitutes an outlier.)

The test statistic \(z\) follows a normal distribution with mean 0 and standard deviation 1 (the standard normal distribution).
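
To see just how extreme that is, compare the two-sided tail probabilities of the standard normal at 2, 3, and 9.17 standard deviations:

# P(|Z| > k) for k = 2, 3, 9.17: roughly 0.046, 0.0027, and 5e-20
2 * pnorm(-c(2, 3, 9.17))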

p.value (p-value)

The p-value converts the test statistic into a probability. Which probability? Assuming \(H_0\) is true, it is the probability of getting a test statistic farther from zero than the one observed.

Picture (for the test statistic \(z\), the p-value is the shaded area in the two tails beyond \(\pm z\)):

As noted above, the test statistic \(z\) follows the standard normal distribution, so we can calculate the p-value from the observed test statistic with the pnorm() function.

# calculate p-value
pnorm(-9.17, mean = 0, sd = 1) * 2
## [1] 4.73019e-20
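
This matches the p.value of 4.77e-20 from tidy() up to rounding; using the unrounded test statistic (a quick check) reproduces the tidy() value:

# same calculation without rounding the test statistic first
z <- tidy(fit)$statistic[2]
2 * pnorm(-abs(z))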

Conclusion: the smaller the p-value, the more incompatible the data are with the null hypothesis (\(\beta_1 = 0\)).

In this case, the p-value is very small, so the data are not compatible with \(\beta_1 = 0\): there is strong evidence of a relationship between Minerab and Eucs.