Notes 14
Poisson Counts | Part 4: Standard Errors, Hypothesis Tests for Beta Coefficients
Setup
Loading packages
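The load chunk itself isn’t shown in this extract; a minimal sketch of it (dplyr is my addition, needed for tibble(), bind_rows(), and sample_n() used below):

library(ggplot2)  # plotting
library(dplyr)    # tibble(), bind_rows(), sample_n() (assumed; not shown in the original)
library(broom)    # tidy()
library(statmod)  # GLM utilities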
Noisy miner data context
##   Miners Eucs Area Grazed Shrubs Bulokes Timber Minerab
## 1      0    2   22      0      1     120     16       0
## 2      0   10   11      0      1      67     25       0
## 3      1   16   51      0      1      85     13       3
## 4      1   20   22      0      1      45     12       2
## 5      1   19    4      0      1     160     14       8
## 6      1   18   61      0      1      75      6       1
Revisit the Poisson model for y = Minerab in terms of x = Eucs, with log link:

\(Minerab \sim Poisson(\mu)\), where \(\log(\mu) = \beta_0 + \beta_1 Eucs\)
Fit the model
Obtain maximum likelihood estimates (MLEs) for the beta coefficients.
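The fitting code itself isn’t shown in this extract; a minimal sketch (the object name fit is my choice):

fit <- glm(Minerab ~ Eucs, family = poisson(link = "log"), data = nminer)
tidy(fit)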
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.876    0.283      -3.10 1.95e- 3
## 2 Eucs           0.114    0.0124      9.17 4.77e-20
So far, we’ve only been focusing on the estimate column of this table. In these notes, we focus on the other columns:

- std.error (standard error)
- statistic (test statistic)
- p.value (p-value)

We’ll focus on the row for Eucs, which gives the estimate, standard error, test statistic, and p-value for \(\beta_1\).
std.error (Standard error)
The standard error of an estimate, say \(\hat{\beta}_1\), is an estimate of the standard deviation of the sampling distribution of \(\hat{\beta}_1\).
Sampling distribution of \(\hat{\beta}_1\):
we consider taking repeated samples from the population and fitting the model for each sample.
The distribution of the \(\hat{\beta}_1\) values we get for different samples is the sampling distribution of \(\hat{\beta}_1\).
Bootstrap method
“Taking repeated samples from the population” is a theoretical concept and hard to grasp since we don’t know the population. But a good substitute for this in practice is taking repeated samples with replacement from the sample. This is called the bootstrap method.
Taking samples with and without replacement: consider taking a sample of 10 balls from a bingo cage containing 100 ping pong balls numbered from 1 to 100. To take the sample, select one ball at a time.
With replacement: You put the ball back in the cage before you select the next ball. (In other words, the ball can be selected multiple times.)
Without replacement: Once you select a ball, don’t put it back in the cage. (The ball can only be selected at most once.)
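In R, this distinction is the replace argument of sample(); a small illustration (not from the original notes):

balls <- 1:100                             # the 100 numbered ping pong balls
sample(balls, size = 10, replace = TRUE)   # with replacement: a ball can appear more than once
sample(balls, size = 10, replace = FALSE)  # without replacement: all 10 balls distinct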
Let’s take 100 bootstrap samples from the nminer dataset. (Usually you’d take thousands of samples, but let’s keep the computation time low.)

The workhorse here is the sample_n() function: we are taking samples from nminer, the same size as the original dataset, with replacement.
set.seed(24601)
N <- 100

boot_samples <- tibble()
for(i in 1:N){
  # Append bootstrap sample i, labeled with its sample number n
  boot_samples <- bind_rows(
    boot_samples,
    tibble(
      n = i,
      sample_n(nminer, size = nrow(nminer), replace = TRUE)
    )
  ) |>
    select(n, Minerab, Eucs)
}
Plots of the first 6 bootstrap samples, along with the fitted mean lines. Note that the fitted lines (and the \(\hat{\beta}_0\) and \(\hat{\beta}_1\) values) vary according to which observations are in the sample.
boot_samples |>
  filter(n <= 6) |>
  ggplot(
    mapping = aes(x = Eucs, y = Minerab)
  ) +
  geom_point() +
  geom_smooth(
    method = "glm", method.args = list(family = poisson(link = "log")), se = FALSE
  ) +
  facet_wrap(~n)
This fits the model for each of the 100 samples and extracts the \(\hat{\beta}_1\) values.
beta1hat <- rep(0, N)
for(i in 1:N){
  beta1hat[i] <- glm(Minerab ~ Eucs, family = poisson(link = "log"),
                     data = boot_samples |> filter(n == i)) |>
    tidy() |>
    filter(term == "Eucs") |>
    pull(estimate)
}
Histogram of \(\hat{\beta}_1\) values. This is a representation of the sampling distribution of \(\hat{\beta}_1\).
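The plotting code isn’t shown in this extract; a minimal sketch:

tibble(beta1hat = beta1hat) |>
  ggplot(mapping = aes(x = beta1hat)) +
  geom_histogram()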
Bootstrap estimate of the standard error.
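Presumably computed as the sample standard deviation of the bootstrapped \(\hat{\beta}_1\) values:

sd(beta1hat)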
## [1] 0.02140763
Compare with the tidy() std.error value of 0.0124. These are not all that close but in the same ballpark.
Fisher information and standard errors
The standard errors given by tidy() come from the log-likelihood function: specifically, the negative second derivative of the log-likelihood function, which is called the Fisher information.
This measures the “sharpness” of the peak of the log-likelihood function (see Fig 4.4 in Section 4.5.2).
A sharper peak means a better estimate with more information and, therefore, less variation. The standard error comes from the inverse of this information: the variance of \(\hat{\beta}_1\) is estimated by 1/information, so the standard error is 1 divided by the square root of the information. (See Section 4.5.3.)
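In symbols (notation mine: \(\log L\) is the log-likelihood and \(\mathcal{I}\) the Fisher information, evaluated at the MLE):

\[\mathcal{I}(\hat{\beta}_1) = -\frac{d^2}{d\beta_1^2} \log L(\beta_1) \bigg|_{\beta_1 = \hat{\beta}_1}, \qquad SE_{\hat{\beta}_1} = \frac{1}{\sqrt{\mathcal{I}(\hat{\beta}_1)}}\]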
MLEs follow a normal distribution
Before we discuss the test statistic and the p-value of \(\hat{\beta}_1\), it helps to know the distribution of \(\hat{\beta}_1\).
See Section 4.9.2: Properties of maximum likelihood estimates (like \(\hat{\beta}_1\)). Property #5 says: “MLEs are asymptotically normally distributed” with

- mean = true parameter value (\(\beta_1\))
- standard deviation calculated from the Fisher information as described above
Let’s return to the other quantities given in the tidy() table.
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)   -0.876    0.283      -3.10 1.95e- 3
## 2 Eucs           0.114    0.0124      9.17 4.77e-20
statistic (Test statistic)
The test statistic is calculated as the estimate divided by the std.error. Let’s verify this for the test statistic for the coefficient of Eucs.
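A sketch of the check (assuming the fitted model object is named fit, as in the sketch above; the first computation uses the rounded table values, the second the full-precision values stored in the fit):

# Rounded values from the printed table
0.114 / 0.0124

# Full-precision values from tidy()
eucs_row <- tidy(fit) |> filter(term == "Eucs")
eucs_row$estimate / eucs_row$std.error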
## [1] 9.193548
## [1] 9.169092
Hypothesis test for \(\beta_1\)
The test statistic and p-value are calculated within the hypothesis test between two competing hypotheses:
- Null hypothesis: \(H_0: \beta_1 = 0\) (no relationship between Minerab and Eucs)
- Alternative hypothesis: \(H_a: \beta_1 \neq 0\) (some relationship between Minerab and Eucs)

This test measures how compatible the data and the null hypothesis are. (The test statistic and p-value are based on the null hypothesis \(H_0: \beta_1 = 0\).)
Test statistic is a z score
Recall the formula for z scores used in normal calculations:
\[z = \frac{value - mean}{sd}\]
Interpretation of z-score: how many standard deviations the value is above or below the mean.
Focusing on the (normal) distribution of \(\hat{\beta}_1\) under \(H_0: \beta_1 = 0\), the corresponding z-score formula for \(\hat{\beta}_1\) calculates the test statistic:

- value = \(\hat{\beta}_1\)
- mean = \(\beta_1\) (= 0 under \(H_0\))
- sd estimated by the standard error
\[ z = \frac{\hat{\beta}_1 - 0}{SE_{\hat{\beta}_1}}\]
Our test statistic is z = 9.17.

Interpretation of the test statistic: our \(\hat{\beta}_1\) is 9.17 standard deviations above what we expect if \(H_0\) is true.

Assuming \(H_0: \beta_1 = 0\), should we expect this test statistic (based on what we know about the normal distribution)? No! Being 9 standard deviations away from the mean is a ton for the normal distribution (we said earlier that more than 2 or 3 constitutes an outlier).

The test statistic \(z\) follows a normal distribution with mean 0 and standard deviation 1 (the standard normal distribution).
p.value (p-value)
The p-value converts the test statistic into a probability. Which probability? Assuming \(H_0\) is true, it is the probability of getting a test statistic farther from zero than the one observed.
Picture (for the test statistic \(z\), the p-value is the shaded area in the two tails of the standard normal curve).
The test statistic \(z\) follows a normal distribution with mean = 0 and standard deviation = 1 (the standard normal distribution), so we can calculate the p-value from the observed test statistic with the pnorm() function.
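A sketch of the two-sided calculation, using the full-precision test statistic from above:

2 * pnorm(9.169092, lower.tail = FALSE)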
## [1] 4.73019e-20
Conclusion: The smaller the p-value, the more incompatible the null hypothesis (\(\beta_1=0\)) and the data.
In this case, the p-value is very small, so the data is not compatible with \(\beta_1 = 0\).