Generalized, linear, and generalized least squares models (LM, GLM, GLS)

Linear regression (LM)

Linear regression is used to predict the value of an outcome variable Y based on one or more input predictor variables X. The aim is to establish a linear relationship (a mathematical formula) between the predictor variable(s) and the response variable, so that, we can use this formula to estimate the value of the response Y, when only the predictors (Xs) values are known.

The general mathematical equation for a linear regression is

y = ax + b

  • y is the response variable
  • x is the predictor variable
  • a and b are constants which are called the coefficients
plot(Ozone~Wind, airquality)

model1 <- lm(Ozone~Wind, airquality)
par(mfrow = c(2, 2))
plot(model1)

coef(model1)
## (Intercept)        Wind 
##   96.872895   -5.550923
Oz1 <- coef(model1)[1] + coef(model1)[2] * 19
Oz2 <- coef(model1)[1] + coef(model1)[2] * 20

Oz1
## (Intercept) 
##    -8.59464
Oz2
## (Intercept) 
##   -14.14556

Generalized Linear Models (GLM)

Generalized linear models provide a common approach to a broad range of response modeling problems. Normal, Poisson, and binomial responses are the most commonly used, but other distributions can be used as well. Apart from specifying the response, GLMs also need a link function to be set which allows further flexibility in the modeling. The GLM can be fitted using a common procedure and a mechanism for hypothesis testing is available. Diagnostics using deviance residuals provide a way to check that chosen models are adequate.




glm



model2 <- glm(Ozone~Wind, airquality, family = poisson)
coef(model2)
## (Intercept)        Wind 
##   5.0795877  -0.1488753
Oz1.glm <- exp(coef(model2)[1] + coef(model2)[2] * 19)
Oz2.glm <- exp(coef(model2)[1] + coef(model2)[2] * 20)

Oz2.glm / Oz1.glm
## (Intercept) 
##   0.8616765
exp(coef(model2)[2])
##      Wind 
## 0.8616765

Generalized Least Squares (GLS)

The generalized least squares (GLS) estimator of the coefficients of a linear regression is a generalization of the ordinary least squares (OLS) estimator. It is used to deal with situations in which the OLS estimator is not BLUE (best linear unbiased estimator) because one of the main assumptions of the Gauss-Markov theorem, namely that of homoskedasticity and absence of serial correlation, is violated. In such situations, provided that the other assumptions of the Gauss-Markov theorem are satisfied, the GLS estimator is BLUE.

library(nlme)
#model3 <- gls(Ozone~Wind, airquality)

# Check for na's
summary(airquality$Ozone)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00   18.00   31.50   42.13   63.25  168.00      37
# suppress na's
model3 <- gls(Ozone~Wind, airquality, na.action = na.exclude)

head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6
airquality$Date <- as.Date(paste(1973, airquality$Month, airquality$Day, sep = "-"))

library(lattice)

xyplot(Ozone~Date, airquality)

#model4 <- gls(Ozone~Wind*Date, airquality, na.action = na.exclude)
#plot(ACF(model4))

air2=subset(airquality, complete.cases(Ozone))

model5 <- gls(Ozone~Wind*Date, air2)
plot(ACF(model5, form=~Date), alpha=0.05) # alpha plots significant line

# Update correlation
model6 <- update(model5, correlation=corAR1())

library(MuMIn)

# Compare model
AICc(model5, model6)
##        df    AICc
## model5  5 1099.40
## model6  6 1095.21
summary(model6)
## Generalized least squares fit by REML
##   Model: Ozone ~ Wind * Date 
##   Data: air2 
##        AIC     BIC    logLik
##   1094.439 1110.75 -541.2197
## 
## Correlation Structure: AR(1)
##  Formula: ~1 
##  Parameter estimate(s):
##       Phi 
## 0.2811007 
## 
## Coefficients:
##                  Value Std.Error   t-value p-value
## (Intercept) -310.82785 224.46119 -1.384773  0.1689
## Wind          26.34403  18.92699  1.391876  0.1667
## Date           0.30485   0.17265  1.765721  0.0802
## Wind:Date     -0.02370   0.01462 -1.621035  0.1078
## 
##  Correlation: 
##           (Intr) Wind   Date  
## Wind      -0.909              
## Date      -0.999  0.910       
## Wind:Date  0.907 -0.999 -0.909
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.5165120 -0.7094215 -0.1900599  0.6459231  3.3907429 
## 
## Residual standard error: 26.69287 
## Degrees of freedom: 116 total; 112 residual