Course texts:

  • Generalized Additive Models: An Introduction with R by Simon N. Wood
  • The R Book by Michael Crawley

Introduction

Linear Regression, GLMs and GAMs with R

Linear Regression, GLMs and GAMs with R demonstrates how to use R to extend the basic assumptions and constraints of linear regression in order to specify, estimate, and interpret generalized linear models (GLMs) and generalized additive models (GAMs). The course demonstrates the estimation of GLMs and GAMs by working through a series of practical examples from the book Generalized Additive Models: An Introduction with R by Simon N. Wood (Wood 2017).

Linear statistical models have a univariate response modeled as a linear function of predictor variables and a zero mean random error term. The assumption of linearity is a critical (and limiting) characteristic.

Generalized linear models (GLMs) relax this assumption of linearity. Through a link function g, they permit the expected value of the response to be a smooth, monotonic (and possibly non-linear) function of the linear predictor: g(E(y)) = Xβ. GLMs also relax the assumption that the response is normally distributed, allowing distributions from the exponential family (e.g. normal, Poisson, binomial, gamma).
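
As a hypothetical illustration (simulated data, not drawn from the course materials), a Poisson GLM with its default log link fits counts whose log-mean is linear in x:

set.seed(1)                               # simulate hypothetical count data
x <- runif(100)
y <- rpois(100, lambda = exp(1 + 2*x))    # true model: log E(y) = 1 + 2x
pois.mod <- glm(y ~ x, family = poisson)  # link g = log, so g(E(y)) = Xb
summary(pois.mod)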

Generalized additive models (GAMs) are extensions of GLMs. GAMs allow some or all of the linear terms to be replaced by smooth, non-parametric functions of the predictors, estimated from the data. Non-parametric smoothers such as lowess (locally weighted scatterplot smoothing) fit a smooth curve to data using localized subsets of the data.
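
A minimal sketch, assuming the mgcv package (Wood's GAM package, used throughout the book) is installed; its gamSim() function simulates a standard four-term test data set:

library(mgcv)                               # Simon Wood's GAM package
set.seed(2)
dat <- gamSim(1, n = 200, verbose = FALSE)  # simulated four-term test data
gam.mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)  # s() marks smooth terms
summary(gam.mod)
plot(gam.mod, pages = 1)                    # plot the four estimated smooths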

This course provides an overview of modeling GLMs and GAMs using R. GLMs, and especially GAMs, have evolved into standard statistical methodologies of considerable flexibility. The course addresses recent approaches to modeling, estimating and interpreting GAMs. The focus of the course is on modeling and interpreting GLMs and especially GAMs with R. Use of the freely available R software illustrates the practicalities of linear, generalized linear, and generalized additive models.

What you’ll learn

  • Understand the assumptions of ordinary least squares (OLS) linear regression.
  • Specify, estimate and interpret linear (regression) models using R.
  • Understand how the assumptions of OLS regression are modified (relaxed) in order to specify, estimate and interpret generalized linear models (GLMs).
  • Specify, estimate and interpret GLMs using R.
  • Understand the mechanics and limitations of specifying, estimating and interpreting generalized additive models (GAMs).

Who this course is for:

  • This course would be useful for anyone involved in estimating linear models, including graduate students and working professionals in quantitative modeling and data analysis.
  • The focus, and the majority of the content, is generalized additive modeling. Anyone who wishes to learn how to specify, estimate and interpret GAMs would especially benefit from this course.

How Old is the Universe?

  • The Big Bang model implies that the universe expands uniformly according to Hubble’s law:1

y = βx

  • where y is the relative velocity of any two galaxies separated by distance x, and β is “Hubble’s constant.”

  • The reciprocal 1/β gives the approximate age of the universe, but β is unknown and must be estimated from observations of x and y.

See Wood (2017), pp. 1–9.
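
The data print below comes from the hubble data set; here is a sketch of the presumed loading code, assuming the gamair package that accompanies Wood (2017):

library(gamair)  # companion data package for Wood (2017)
data(hubble)     # y: velocity (km/s), x: distance (Mpc), for 24 galaxies
hubble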

##      Galaxy    y     x
## 1   NGC0300  133  2.00
## 2   NGC0925  664  9.16
## 3  NGC1326A 1794 16.14
## 4   NGC1365 1594 17.95
## 5   NGC1425 1473 21.88
## 6   NGC2403  278  3.22
## 7   NGC2541  714 11.22
## 8   NGC2090  882 11.75
## 9   NGC3031   80  3.63
## 10  NGC3198  772 13.80
## 11  NGC3351  642 10.00
## 12  NGC3368  768 10.52
## 13  NGC3621  609  6.64
## 14  NGC4321 1433 15.21
## 15  NGC4414  619 17.70
## 16 NGC4496A 1424 14.86
## 17  NGC4548 1384 16.22
## 18  NGC4535 1444 15.78
## 19  NGC4536 1423 14.93
## 20  NGC4639 1403 21.98
## 21  NGC4725 1103 12.36
## 22   IC4182  318  4.49
## 23  NGC5253  232  3.15
## 24  NGC7331  999 14.72
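
The summary that follows is presumably the no-intercept fit of Hubble's law:

hub.mod <- lm(y ~ x - 1, data = hubble)  # "- 1" drops the intercept, as Hubble's law requires
summary(hub.mod)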
## 
## Call:
## lm(formula = y ~ x - 1, data = hubble)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -736.5 -132.5  -19.0  172.2  558.0 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x   76.581      3.965   19.32 1.03e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 258.9 on 23 degrees of freedom
## Multiple R-squared:  0.9419, Adjusted R-squared:  0.9394 
## F-statistic: 373.1 on 1 and 23 DF,  p-value: 1.032e-15

For reference, the mean of y is 924.4 km/s.
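
A quick check (assuming the hubble data frame is loaded as above):

mean(hubble$y)
## [1] 924.375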

Plot the residuals against the fitted values

plot(fitted(hub.mod), residuals(hub.mod), xlab = "fitted values", ylab = "residuals")

Omit the offending points (observations 3 and 15, which have the largest residuals) and produce a new residual plot.

hub.mod1 <- lm(y ~ x - 1, data = hubble[-c(3,15),])
summary(hub.mod1)
## 
## Call:
## lm(formula = y ~ x - 1, data = hubble[-c(3, 15), ])
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -304.3 -141.9  -26.5  138.3  269.8 
## 
## Coefficients:
##   Estimate Std. Error t value Pr(>|t|)    
## x    77.67       2.97   26.15   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 180.5 on 21 degrees of freedom
## Multiple R-squared:  0.9702, Adjusted R-squared:  0.9688 
## F-statistic: 683.8 on 1 and 21 DF,  p-value: < 2.2e-16
plot(fitted(hub.mod1), residuals(hub.mod1), xlab = "fitted values", ylab = "residuals")

Estimate Hubble’s Constant

hubble.const <- c(coef(hub.mod), coef(hub.mod1))/3.09e19  # km/s/Mpc -> 1/s (1 Mpc = 3.09e19 km)
age <- 1/hubble.const  # age of the universe, in seconds
age
##            x            x 
## 4.034934e+17 3.978221e+17
age/(60^2*24*365)  # convert seconds to years
##           x           x 
## 12794692825 12614854757
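
Both fits therefore put the age of the universe at roughly 12.6 to 12.8 billion years.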

Adding a distributional assumption

Testing hypotheses about β

A Hubble constant of 163,000,000 (in the same units as the estimate above) corresponds to a universe only about 6000 years old; the test below asks whether the data are consistent with that value.

cs.hubble <- 163000000  # hypothesized beta: a universe ~6000 years old
t.stat <- (coef(hub.mod1) - cs.hubble)/summary(hub.mod1)$coefficients[2]  # [2] is the std. error
pt(t.stat, df = 21)*2  # two-sided p-value on 21 degrees of freedom
##             x 
## 3.906388e-150

Confidence intervals

sigb <- summary(hub.mod1)$coefficients[2]  # standard error of beta-hat
h.ci <- coef(hub.mod1) + qt(c(0.025, 0.975), df = 21)*sigb  # 95% CI for beta
h.ci
## [1] 71.49588 83.84995
h.ci <- h.ci*60^2*24*365.25/3.09e19  # convert beta to units of 1/years
sort(1/h.ci)  # 95% CI for the age of the universe, in years
## [1] 11677548698 13695361072
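
That is, a 95% confidence interval for the age of the universe of roughly 11.7 to 13.7 billion years. As a side note (not part of the original analysis), the interval for β can also be obtained directly with R's built-in confint():

confint(hub.mod1, level = 0.95)  # same t-based 95% CI for beta as computed above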

References

Crawley, Michael J. 2013. The R Book. Chichester, UK: John Wiley & Sons.
Liberman, Akiva M. 2005. “How Much More Likely? The Implications of Odds Ratios for Probabilities.” American Journal of Evaluation 26 (2): 253–66. https://doi.org/10.1177/1098214005275825.
Wood, Simon N. 2017. Generalized Additive Models: An Introduction with R. London; New York: Taylor & Francis Group.

  1. For an easy way of writing mathematical equations in R Markdown, see YouTube: https://www.youtube.com/watch?v=4I3PCDME5U8