This walkthrough builds intuition for:
glm(y ~ x, family = binomial)
Using the flow:
Linear predictor → plogis → likelihood (dbinom) → log-likelihood (sum(log)) → maximize (MLE)
With a binary predictor & response.
You surveyed 20 ponds for a frog species and detected frogs in 13 of them.
You hypothesize that pond size affects frog presence, so you categorize the ponds:
x = 0 \(\rightarrow\) small pond
x = 1 \(\rightarrow\) large pond
Of the 20 ponds you visited, you labeled 10 small and 10 large.
y <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0) # frog presence (1) or absence (0)
x <- c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1) # pond size: large (1) or small (0)
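A quick cross-tab of the two vectors confirms the setup: frogs were found in 7 of the 10 small ponds and 6 of the 10 large ponds.

table(pond_size = x, frogs = y)
##          frogs
## pond_size 0 1
##         0 3 7
##         1 4 6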
\(\eta = b_0 + b_1x\)
\(\eta\) ("eta") is the linear predictor; it appears as the variable eta in the code below.
This step:

- uses only the predictors
- produces log-odds
- is not yet a fitted model
- relies on coefficients you choose yourself
b0 <- -0.5 # chosen by hand; this feels arbitrary, but THIS IS WHAT THE GRID SEARCH DOES: it tries a sequence of intercepts and slopes and keeps the best pair
b1 <- 1.0
eta <- b0 + b1*x
plot(eta, main="Log odds, b0: -0.5, b1: 1.0", col="#c77cff", ylab = expression(eta), xlab = "Index")
Now convert those log-odds into probabilities:
allprobs <- plogis(eta)
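As a sanity check, qlogis() (the logit) is the inverse of plogis(), so converting the probabilities back recovers the log-odds exactly:

all.equal(qlogis(allprobs), eta)  # TRUE: probabilities map back to the log-odds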
- small ponds (x = 0) get the probability plogis(b0)
- large ponds (x = 1) get the probability plogis(b0 + b1)
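Concretely, with the coefficients chosen above:

plogis(b0)       # P(frog | small pond): plogis(-0.5), about 0.38
plogis(b0 + b1)  # P(frog | large pond): plogis(0.5), about 0.62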
plot(allprobs, main="Log-odds converted to probabilities", ylab="Probabilities", xlab = "Index", col="#00bfc4")
Likelihood is:

- a measure of how well the model fits the data
- bigger values mean a better fit
- dependent on three things:
  - the model
  - the parameters
  - the data
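Before applying this to all 20 ponds at once, here is what dbinom() does for one observation. With size = 1 it is the Bernoulli pmf: it returns p when y = 1 and 1 - p when y = 0 (the 0.6 below is an arbitrary probability, just for illustration):

dbinom(1, size = 1, prob = 0.6)  # y = 1 -> likelihood is p, i.e. 0.6
dbinom(0, size = 1, prob = 0.6)  # y = 0 -> likelihood is 1 - p, i.e. 0.4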
Given the predicted probabilities in step 2, how likely is each observation?
all_likelihoods <- dbinom(y, size=1, prob = allprobs)
plot(all_likelihoods, main="Likelihoods for each observation of y given x", ylab = "Likelihood", xlab = "Index", col = "#f8766d")
How well do these coefficients explain the frog data?
sum_log <- sum(dbinom(y, 1, allprobs, log = TRUE))
By itself, sum_log is just a number; it becomes useful when you compare it across candidate coefficients. Compute it for different choices of b0 and b1 and keep the least negative (that is, largest) value, which marks the best fit. This is where grid searches and heatmaps come in handy: they replace arbitrary guessing with a systematic search.
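For example, here is the hand-picked pair above against a second pair (chosen arbitrarily, just to show the comparison):

ll_guess1 <- sum(dbinom(y, 1, plogis(-0.5 + 1.0 * x), log = TRUE))  # our chosen pair
ll_guess2 <- sum(dbinom(y, 1, plogis( 0.5 - 0.5 * x), log = TRUE))  # another guess
c(ll_guess1, ll_guess2)  # the less negative value is the better fit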
To estimate coefficients, we would:

- try many (b0, b1) combinations
- compute the log-likelihood each time
- keep the pair with the largest value

This is maximum likelihood estimation.
(Grid search does this manually; glm() software does it automatically.)
# I used CHAT to create this function. I did not make it from scratch.
maximize_loglik_frogs <- function(y, x, b0_seq, b1_seq) {
  grid <- expand.grid(b0 = b0_seq, b1 = b1_seq)  # every (b0, b1) combination
  grid$loglik <- NA
  for (i in seq_len(nrow(grid))) {
    b0 <- grid$b0[i]
    b1 <- grid$b1[i]
    p <- plogis(b0 + b1 * x)                            # steps 1-2: linear predictor -> probability
    grid$loglik[i] <- sum(dbinom(y, 1, p, log = TRUE))  # steps 3-4: log-likelihood
  }
  best <- grid[which.max(grid$loglik), ]                # step 5: keep the maximum
  list(best = best, grid = grid)
}
out <- maximize_loglik_frogs(y, x, seq(-6,6,0.1), seq(-6,6,0.1))
out$best
##       b0   b1    loglik
## 6845 0.8 -0.4 -12.84116
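For contrast, here is a sketch of what an automatic optimizer does instead of brute force. It uses optim() with its default Nelder-Mead search; note that glm() itself does not call optim() (it uses iteratively reweighted least squares), so this is only to make "maximize automatically" concrete:

# minimize the negative log-likelihood over (b0, b1), starting from (0, 0)
negloglik <- function(b) -sum(dbinom(y, 1, plogis(b[1] + b[2] * x), log = TRUE))
optim(c(0, 0), negloglik)$par  # lands near the grid winner (0.8, -0.4)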
library(ggplot2)

out$grid$rel_loglik <- out$grid$loglik - max(out$grid$loglik)  # 0 = best fit

ggplot(out$grid, aes(b0, b1, fill = rel_loglik)) +
  geom_raster() +
  geom_point(data = out$best, aes(b0, b1),
             inherit.aes = FALSE, size = 3) +  # mark the maximum
  theme_minimal() +
  labs(title = "Relative log-likelihood (0 = best)",
       x = "Intercept (b0)", y = "Slope (b1)", fill = "Rel log-lik") +
  scale_fill_viridis_c()
Steps 1-5 can all be performed with a single call to glm() in R, so we have essentially just rebuilt what it does, step by step. Easy peasy.
glm_frogs <- glm(y~x, family = binomial)
glm_frogs
##
## Call: glm(formula = y ~ x, family = binomial)
##
## Coefficients:
## (Intercept) x
## 0.8473 -0.4418
##
## Degrees of Freedom: 19 Total (i.e. Null); 18 Residual
## Null Deviance: 25.9
## Residual Deviance: 25.68 AIC: 29.68
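The estimates line up with the grid search: glm() finds (0.8473, -0.4418), and the grid's winner (0.8, -0.4) is as close as the 0.1 step size allows.

coef(glm_frogs)             # exact MLE: 0.8473, -0.4418
out$best[, c("b0", "b1")]   # grid winner: 0.8, -0.4 (limited by the grid step)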