DATA 605 Assignment 12: Bias-Variance Tradeoff

auto <- read.table('data/auto-mpg.data')
names(auto) <- c('displacement', 'horsepower', 'weight', 'acceleration', 'mpg')

Polynomial fits of mpg on the remaining four variables (displacement, horsepower, weight, and acceleration) are created for degrees 1 through 8, and each fit is scored with 5-fold cross-validation:

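library(boot) # provides cv.glm()

bias_var <- data.frame(N = rep(NA, 8), Error = rep(NA, 8))

set.seed(46) # set seed for replicable results

for (n in 1:8) {
  # note: poly() is applied to the sum of the four predictors, so each fit is
  # a degree-n polynomial in that single combined variable
  polyfit <- glm(mpg ~ poly(displacement + horsepower + weight + acceleration, n), data = auto)
  bias_var$N[n] <- n
  # delta[1] is the raw 5-fold cross-validation estimate of prediction error
  bias_var$Error[n] <- cv.glm(auto, polyfit, K = 5)$delta[1]
}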
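
As a rough illustration of what the cross-validation error in the loop above measures, the sketch below computes a 5-fold estimate by hand: each fold is held out in turn, the model is fit on the remaining rows, and the squared prediction errors on the held-out rows are pooled. This is not the code used for this assignment. It assumes the built-in mtcars data set as a stand-in for auto, uses a single predictor (wt), and the names folds, cv_mse, and manual_cv are hypothetical.

```r
# Stand-in data: built-in mtcars (an assumption; the report uses data/auto-mpg.data)
set.seed(46)
k <- 5

# Randomly assign each row to one of k folds of near-equal size
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

# Hypothetical helper: k-fold CV mean squared error for a given polynomial degree
cv_mse <- function(degree) {
  fold_sse <- sapply(1:k, function(i) {
    train <- mtcars[folds != i, ]
    test  <- mtcars[folds == i, ]
    fit   <- lm(mpg ~ poly(wt, degree), data = train)
    # Sum of squared errors on the held-out fold
    sum((test$mpg - predict(fit, newdata = test))^2)
  })
  # Pooling the squared errors over all rows equals the size-weighted
  # average of the per-fold mean squared errors
  sum(fold_sse) / nrow(mtcars)
}

manual_cv <- data.frame(N = 1:4, Error = sapply(1:4, cv_mse))
print(manual_cv)
```

Pooling the errors this way should correspond to the size-weighted fold average that cv.glm reports in delta[1] under its default squared-error cost, although the fold assignments, and therefore the exact values, will differ.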

The plot below shows the mean cross-validation error against the degree of the polynomial fit to the data. Despite a slight, unexpected bump at N = 6, the characteristic U-shaped curve is visible. The lowest error occurs at N = 2.
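
A plot of this kind, along with the degree that minimizes the error, could be produced roughly as follows. This is a minimal sketch under stated assumptions rather than the code behind the figure: the data frame sim is simulated (a quadratic signal plus noise) so the example runs without the assignment's data file, and the names sim, cv_curve, and best are hypothetical.

```r
library(boot)  # provides cv.glm()

set.seed(46)
# Simulated stand-in data (hypothetical): quadratic signal plus Gaussian noise
sim <- data.frame(x = runif(200, -2, 2))
sim$y <- 1 + 2 * sim$x - 1.5 * sim$x^2 + rnorm(200)

# 5-fold CV error for polynomial degrees 1 through 8
cv_curve <- data.frame(N = 1:8, Error = NA_real_)
for (n in cv_curve$N) {
  fit <- glm(y ~ poly(x, n), data = sim)
  cv_curve$Error[n] <- cv.glm(sim, fit, K = 5)$delta[1]
}

# Degree with the lowest estimated error
best <- cv_curve$N[which.min(cv_curve$Error)]

# Plot CV error against degree and mark the minimum with a dashed line
plot(cv_curve$N, cv_curve$Error, type = "b",
     xlab = "Polynomial degree (N)", ylab = "5-fold CV error (MSE)")
abline(v = best, lty = 2)
```

Using which.min rather than reading the minimum off the plot keeps the reported degree tied to the computed errors, which helps when neighboring degrees have nearly equal error.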

Dan Smilowitz

November 5, 2016