Splines are more flexible than polynomials, but the idea is rather similar. Here we will explore cubic splines.
require(splines)
## Loading required package: splines
require(ISLR)
## Loading required package: ISLR
## Warning: package 'ISLR' was built under R version 3.5.1
attach(Wage)
fit = lm(wage~bs(age,knots = c(25,40,60)), data = Wage)
agelims = range(age)
age.grid = seq(from=agelims[1],to=agelims[2])
plot(age, wage, col = "darkgrey")
lines(age.grid, predict(fit,list(age=age.grid)),col="darkgreen",lwd=2)
abline(v=c(25,40,60),lty = 2, col= "darkgreen")
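To see what sits behind this fit, the following sketch inspects the basis matrix that bs() builds. A cubic spline with K interior knots and no intercept column has K + 3 basis functions, so the three knots above give six columns. Instead of choosing knots by hand, bs() also accepts a df argument and then places the knots at quantiles of the data. The sketch assumes the ISLR package is installed; the object names basis and basis_df are illustrative and not part of the original analysis.

```r
library(splines)
library(ISLR)  # provides the Wage data set

# Basis matrix for the cubic spline with knots at ages 25, 40 and 60.
basis <- bs(Wage$age, knots = c(25, 40, 60))
dim(basis)  # one row per observation, K + 3 = 6 columns

# Alternative: ask for 6 degrees of freedom and let bs() choose 3 knots.
basis_df <- bs(Wage$age, df = 6)
attr(basis_df, "knots")  # knots placed at quantiles of age
```

Specifying df is convenient when there is no subject-matter reason to prefer particular knot locations.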
Smoothing splines do not require knot selection, but they do have a smoothing parameter, which can conveniently be specified via the effective degrees of freedom, ‘df’.
fit1=smooth.spline(age,wage,df=16)
agelims = range(age)
age.grid = seq(from=as.numeric(agelims[1]),to=as.numeric(agelims[2]))
plot(age, wage, col = "darkgrey")
lines(age.grid, predict(fit,list(age=age.grid)),col="darkgreen",lwd=2)
abline(v=c(25,40,60),lty = 2, col= "darkgreen")
lines(fit1, col = "red", lwd = 2)
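The link between the df argument and the underlying penalty can be checked directly. This sketch, which assumes the ISLR package is installed, refits the smoothing spline at a few df values (4, 16 and 30 are arbitrary illustrative choices) and reports the penalty lambda that smooth.spline() settled on for each. A larger df should correspond to a smaller lambda, that is, a wigglier curve.

```r
library(ISLR)  # provides the Wage data set

dfs <- c(4, 16, 30)  # illustrative values, not tuned

# For each requested df, fit a smoothing spline and keep its penalty lambda.
lambdas <- sapply(dfs, function(d) {
  smooth.spline(Wage$age, Wage$wage, df = d)$lambda
})

data.frame(df = dfs, lambda = lambdas)
```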
Alternatively, we can use leave-one-out (LOO) cross-validation to select the smoothing parameter automatically.
fit2 = smooth.spline(age, wage, cv = TRUE)
## Warning in smooth.spline(age, wage, cv = TRUE): cross-validation with non-
## unique 'x' values seems doubtful
plot(age, wage, col = "darkgrey")
lines(age.grid, predict(fit,list(age=age.grid)),col="darkgreen",lwd=2)
abline(v=c(25,40,60),lty = 2, col= "darkgreen")
lines(fit2, col = "blue", lwd = 3)
fit2
## Call:
## smooth.spline(x = age, y = wage, cv = TRUE)
##
## Smoothing Parameter spar= 0.6988943 lambda= 0.02792303 (12 iterations)
## Equivalent Degrees of Freedom (Df): 6.794596
## Penalized Criterion (RSS): 75215.9
## PRESS(l.o.o. CV): 1593.383
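The selected values can also be pulled out of the fitted object rather than read off the printout. The sketch below assumes the ISLR package is installed. It additionally refits with generalized cross-validation (cv = FALSE, the default of smooth.spline()) for comparison; that second fit is a swapped-in alternative, not something the original analysis ran. The names fit_loo and fit_gcv are illustrative.

```r
library(ISLR)  # provides the Wage data set

# Leave-one-out CV; warns because many workers share the same age.
fit_loo <- smooth.spline(Wage$age, Wage$wage, cv = TRUE)

# Generalized cross-validation, the default criterion.
fit_gcv <- smooth.spline(Wage$age, Wage$wage, cv = FALSE)

# Effective degrees of freedom and penalty chosen by each criterion.
c(loo = fit_loo$df, gcv = fit_gcv$df)
c(loo = fit_loo$lambda, gcv = fit_gcv$lambda)
```

If the two criteria choose similar degrees of freedom, that is some reassurance that the warning about non-unique ages is not driving the result.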
So far we have focused on fitting models with mostly single nonlinear terms. The gam package makes it easier to work with multiple nonlinear terms. In addition, it knows how to plot these functions and their standard errors.
#install.packages("gam")
library(gam)
## Warning: package 'gam' was built under R version 3.5.1
## Loading required package: foreach
## Loaded gam 1.16
# s() is a special function in gam that specifies a smoothing spline term
gam1 = gam(wage~s(age,df = 4) + s(year, df = 4) + education, data = Wage)
par(mfrow=c(1,3))
plot(gam1, se = TRUE)
gam2 = gam(I(wage>250)~s(age,df = 4)+s(year, df = 4)+education, data = Wage, family = "binomial")
plot(gam2)
gam2a = gam(I(wage>250)~s(age,df=4)+year+education, data = Wage, family = binomial)
anova(gam2,gam2a, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: I(wage > 250) ~ s(age, df = 4) + s(year, df = 4) + education
## Model 2: I(wage > 250) ~ s(age, df = 4) + year + education
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1      2987     602.87
## 2      2990     603.78 -3 -0.90498   0.8242
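The large p-value (0.8242) gives no evidence that a smooth term in year improves on a linear one. A natural follow-up, sketched here as an assumption rather than a step from the original analysis, is to add a third model that drops year altogether and compare all three in order of increasing complexity. The name gam2b is hypothetical, and the sketch assumes the ISLR and gam packages are installed.

```r
library(ISLR)  # provides the Wage data set
library(gam)

# Model without year (hypothetical addition for this comparison).
gam2b <- gam(I(wage > 250) ~ s(age, df = 4) + education,
             data = Wage, family = binomial)

# Model with a linear term in year.
gam2a <- gam(I(wage > 250) ~ s(age, df = 4) + year + education,
             data = Wage, family = binomial)

# Model with a smooth term in year.
gam2 <- gam(I(wage > 250) ~ s(age, df = 4) + s(year, df = 4) + education,
            data = Wage, family = binomial)

# Sequential analysis of deviance, simplest model first.
tab <- anova(gam2b, gam2a, gam2, test = "Chisq")
tab
```

Each row after the first tests whether the extra complexity of that model is justified relative to the one before it.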
The plotting function from the gam package also works well for models fit with lm and glm.
par(mfrow=c(1,3))
# ns() generates a natural spline basis
lm1 = lm(wage~ns(age, df=4)+ns(year,df=4)+education, data = Wage)
plot.Gam(lm1, se = TRUE)
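The same call can be sketched for a glm fit. The example below is an assumption-laden illustration, not part of the original analysis: it reuses the binary response I(wage > 250) from the earlier GAM, the name glm1 is hypothetical, and the ISLR and gam packages are assumed to be installed. Fitting may produce warnings because high earners are rare in this data.

```r
library(splines)
library(ISLR)  # provides the Wage data set
library(gam)   # provides plot.Gam

# Logistic regression with natural spline terms, fit by glm rather than gam.
glm1 <- glm(I(wage > 250) ~ ns(age, df = 4) + ns(year, df = 4) + education,
            data = Wage, family = binomial)

# One panel per term, with standard-error bands, on the logit scale.
par(mfrow = c(1, 3))
plot.Gam(glm1, se = TRUE)
```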