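For my fifth blog post I will be talking about generalized linear models. A generalized linear model (GLM) has three components: a distribution for the response from the exponential family, a linear predictor (a linear combination of the explanatory variables), and a link function that connects the mean of the response to the linear predictor.
To demonstrate GLMs I will be using the Boston housing dataset from the MASS package.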
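Written out with the linear predictor made explicit, a GLM for a response $Y_i$ with predictors $x_{i1}, \dots, x_{ip}$ looks like this (standard textbook notation, not taken from a specific source):

```latex
\begin{aligned}
Y_i &\sim \text{an exponential-family distribution with mean } \mu_i = \mathbb{E}[Y_i \mid x_i], \\
\eta_i &= \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \\
g(\mu_i) &= \eta_i .
\end{aligned}
```

With a gaussian distribution and the identity link $g(\mu) = \mu$, this reduces to ordinary linear regression, $\mu_i = \eta_i$.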
library(MASS)   # the Boston data ships with the MASS package
head(Boston)    # print the first six rows
## crim zn indus chas nox rm age dis rad tax ptratio black lstat
## 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
## 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
## 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
## 4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
## 5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
## 6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
## medv
## 1 24.0
## 2 21.6
## 3 34.7
## 4 33.4
## 5 36.2
## 6 28.7
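Before fitting, it can help to check the size of the data and the ranges of the variables used below. This is a minimal sketch; the variable descriptions in the comments are paraphrased from the MASS help page `?Boston`.

```r
library(MASS)

# Boston has 506 rows (census tracts) and 14 columns
dim(Boston)

# Summaries of the four variables used in the model below:
#   age     - proportion of owner-occupied units built before 1940
#   medv    - median value of owner-occupied homes (in $1000s)
#   ptratio - pupil-teacher ratio by town
#   rm      - average number of rooms per dwelling
summary(Boston[, c("age", "medv", "ptratio", "rm")])
```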
To fit the model, the glm() function is used. Here age is the response and medv, ptratio, and rm are the predictors.
model <- glm(age ~ medv + ptratio + rm, family = gaussian, data = Boston)
model
##
## Call: glm(formula = age ~ medv + ptratio + rm, family = gaussian, data = Boston)
##
## Coefficients:
## (Intercept) medv ptratio rm
## 59.845 -1.098 1.230 1.713
##
## Degrees of Freedom: 505 Total (i.e. Null); 502 Residual
## Null Deviance: 400100
## Residual Deviance: 340300 AIC: 4740
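For the gaussian family the two deviances have a simple interpretation: the residual deviance is the residual sum of squares, and the null deviance is the total sum of squares of age around its mean. The sketch below checks this on the same model; `rss` and `tss` are just illustrative names, and `summary()` is added to show standard errors and p-values that the printed model omits.

```r
library(MASS)

model <- glm(age ~ medv + ptratio + rm, family = gaussian, data = Boston)

# For the gaussian family the residual deviance is the residual sum of squares
rss <- sum(residuals(model)^2)
all.equal(deviance(model), rss)

# ...and the null deviance is the total sum of squares of age around its mean
tss <- sum((Boston$age - mean(Boston$age))^2)
all.equal(model$null.deviance, tss)

# summary() adds standard errors, t values, and p-values for each coefficient
summary(model)
```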
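Here the call to glm has three parts: the formula, the family, and the data. The formula, age ~ medv + ptratio + rm, puts the response on the left of the ~ and the predictors on the right. The family is the type of distribution used for the response; in this case gaussian is used. The link function is not a separate argument of glm but part of the family: each family has a default link, and gaussian’s default link is “identity”, which makes this model equivalent to ordinary linear regression. The data used is the Boston dataset.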
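Since gaussian’s default link is “identity”, the link can also be written out inside the family. The sketch below (the names `model_explicit` and `model_lm` are just for illustration) spells out the link, reads it back from the family object, and compares the coefficients with `lm()`, on the assumption that a gaussian GLM with identity link is the same as least-squares regression.

```r
library(MASS)

# Spelling out the link gives the same model as family = gaussian,
# because "identity" is the gaussian family's default link
model_explicit <- glm(age ~ medv + ptratio + rm,
                      family = gaussian(link = "identity"), data = Boston)

# The link is stored on the family object
gaussian()$link

# A gaussian GLM with identity link is ordinary least squares,
# so lm() should give the same coefficients
model_lm <- lm(age ~ medv + ptratio + rm, data = Boston)
all.equal(coef(model_explicit), coef(model_lm))
```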
plot(model)
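When you plot the model you get four diagnostic plots: Residuals vs Fitted, a normal Q-Q plot of the residuals, Scale-Location, and Residuals vs Leverage. Three of them examine the residuals against the fitted values or the leverage, and the Q-Q plot checks whether the residuals are approximately normally distributed.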
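By default the four plots are drawn one after another. A minimal sketch for viewing them together is to set a 2 x 2 layout with `par(mfrow = ...)` first; the second part draws a similar Q-Q plot by hand from the standardized residuals, which for a gaussian model should match the one `plot()` produces.

```r
library(MASS)

model <- glm(age ~ medv + ptratio + rm, family = gaussian, data = Boston)

# Show all four default diagnostic plots on one page (2 rows x 2 columns)
old_par <- par(mfrow = c(2, 2))
plot(model)
par(old_par)  # restore the previous layout

# A Q-Q plot drawn by hand from the standardized residuals
std_res <- rstandard(model)
qqnorm(std_res, main = "Q-Q plot of standardized residuals")
qqline(std_res)
```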