March 19, 2019
Generalised linear models (GLMs) have three basic components:
A GLM models the expected value of the response variable on an unbounded transformed scale rather than on its original scale (e.g. bounded between 0 and 1 with binomial data), and uses the transformed scale to test the effect of the predictor function. The two scales are coupled by a link function, which allows back-transformation to the original scale while preserving the distributional requirements. For example, with a binomial response variable (e.g. dead/alive), the link function ensures that the model predictions and their confidence intervals lie between 0 and 1 when we back-transform them onto the original scale of the response variable.
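The back-transformation can be sketched in a few lines of R, assuming the logit link. The estimate `eta` and standard error `se` below are made-up values on the link scale, chosen only for illustration; `qlogis()` and `plogis()` are base R's logit and inverse logit.

```r
## Hypothetical estimate and standard error on the logit (link) scale
eta <- 2.5
se  <- 1.2

## 95% Wald interval built on the unbounded link scale
ci_link <- eta + c(-1, 1) * qnorm(0.975) * se

## Back-transform with the inverse logit: results are confined to (0, 1)
p_hat <- plogis(eta)
ci_p  <- plogis(ci_link)

## The logit maps the probability back onto the link scale
all.equal(qlogis(p_hat), eta)
```

Because the interval is constructed on the link scale and only then back-transformed, both limits stay inside the unit interval. An interval computed directly on the probability scale carries no such guarantee.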
The glm function is very similar to the familiar lm function but has a family argument to specify the distribution of the response variable and the associated model errors. Along with the distribution, we specify a link function, which ensures that distributional requirements are satisfied during back-transformation of the model predictions onto the original scale.

    ## Binomial GLM (logistic regression) for binary data
    glm(y ~ x, data = ..., family = binomial(link = "logit"))

    ## Poisson GLM for count data
    glm(y ~ x, data = ..., family = poisson(link = "log"))
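As a rough illustration of how the family and link are used after fitting, the sketch below fits a binomial GLM to simulated data and back-transforms predictions by hand. The data frame `dat`, the true coefficients (0.5 and 1.2) and the seed are assumptions made up for this example, not values from the text.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

## Simulated binary data (hypothetical): P(y = 1) rises with x
dat <- data.frame(x = runif(200, -2, 2))
dat$y <- rbinom(200, size = 1, prob = plogis(0.5 + 1.2 * dat$x))

fit <- glm(y ~ x, data = dat, family = binomial(link = "logit"))

## Predict on the link scale, then back-transform with the inverse link
new  <- data.frame(x = c(-2, 0, 2))
pred <- predict(fit, newdata = new, type = "link", se.fit = TRUE)
inv  <- family(fit)$linkinv

new$fit   <- inv(pred$fit)
new$lower <- inv(pred$fit - qnorm(0.975) * pred$se.fit)
new$upper <- inv(pred$fit + qnorm(0.975) * pred$se.fit)
new
```

Taking the inverse link from `family(fit)$linkinv` rather than hard-coding `plogis()` means the same lines would work unchanged for a Poisson model with a log link.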
One needs to distinguish between apparent and real overdispersion. Apparent overdispersion can result from:
A ratio of the residual deviance over the residual degrees of freedom that is much higher than 1 indicates overdispersion.
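This ratio is easy to compute from a fitted model. The sketch below uses simulated counts that are deliberately more variable than a Poisson distribution allows; the negative binomial generator, its parameters and the seed are assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

## Simulated counts (hypothetical) with extra-Poisson variation:
## negative binomial with mean mu and variance mu + mu^2 / 2
x  <- runif(300, 0, 2)
mu <- exp(0.5 + 0.8 * x)
y  <- rnbinom(300, mu = mu, size = 2)

## Fit a Poisson GLM that ignores the extra variation
fit <- glm(y ~ x, family = poisson(link = "log"))

## Residual deviance over residual degrees of freedom
ratio <- deviance(fit) / df.residual(fit)
ratio
```

Both quantities also appear in the last lines of `summary(fit)`, so the ratio can be read off the printed output as well.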
    ## Fit a quasi-likelihood GLM
    glm(y ~ x, data = ..., family = quasipoisson(link = "log"))

    library(MASS)
    ## Fit a negative binomial GLM
    glm.nb(y ~ x, data = ...)
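To see what the quasi-likelihood fit changes, the sketch below compares a Poisson and a quasi-Poisson GLM on the same simulated, overdispersed counts. The simulated data and the seed are assumptions made for this example.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

## Simulated overdispersed counts (hypothetical)
x <- runif(300, 0, 2)
y <- rnbinom(300, mu = exp(0.5 + 0.8 * x), size = 2)

fit_pois  <- glm(y ~ x, family = poisson(link = "log"))
fit_quasi <- glm(y ~ x, family = quasipoisson(link = "log"))

## Same point estimates under both families
all.equal(coef(fit_pois), coef(fit_quasi))

## Standard errors are scaled by the square root of the estimated dispersion
phi      <- summary(fit_quasi)$dispersion
se_pois  <- coef(summary(fit_pois))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]
all.equal(se_quasi, se_pois * sqrt(phi))
```

The quasi-Poisson fit therefore leaves the estimated effects untouched and only widens their uncertainty, whereas a negative binomial model changes the assumed distribution itself.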
Modelling binomial (binary) data is commonly referred to as logistic regression. We are dealing with binary or binomial data when the response variable can take only two levels, such as dead/alive, absent/present, male/female, or parasitised/unparasitised. The binary nature of the data matches the binomial distribution, which describes the discrete probability distribution of the number of successes in a sequence of success/failure experiments.
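A small sketch of the binomial distribution in base R may help here. The number of trials and the success probability are hypothetical values picked for illustration.

```r
## Hypothetical experiment: 10 independent trials, success probability 0.3
n <- 10
p <- 0.3

## Probability of each possible number of successes, 0 to n
probs <- dbinom(0:n, size = n, prob = p)

## The probabilities over all outcomes sum to 1 (up to rounding error)
total <- sum(probs)

## The expected number of successes equals n * p
expected <- sum(0:n * probs)

## A single trial (size = 1) gives the binary case, e.g. dead/alive
rbinom(5, size = 1, prob = p)
```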
The glm function accepts the binomial response variable in one of three forms: