Using the stats and boot libraries in R, perform a cross-validation experiment to observe the bias-variance tradeoff. You’ll use the auto data set from previous assignments. This data set has 392 observations across 5 variables. We want to fit polynomial models of various degrees using the glm function in R and then measure the cross-validation error with the cv.glm function.
#load the auto data set, naming the columns for displacement, horsepower, weight, acceleration, and MPG
auto.data <- read.table('auto-mpg.data', col.names = c('DP', 'HP', 'WT', 'ACC', 'MPG'))
View(auto.data)
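A quick, optional sanity check before modeling is to confirm the 392 observations and 5 columns mentioned above:
str(auto.data)    # compact summary; should report 392 obs. of 5 variables
dim(auto.data)    # number of rows and columns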
Now that the data is loaded, we will start fitting polynomial models of various degrees to predict MPG as a function of the other four variables.
require(stats)
require(boot)
## Loading required package: boot
set.seed(100)
degree <- 1:8
cv.err5 <- vector()
#fit a degree-i polynomial model for each degree and record the 5-fold CV error
for (i in degree) {
  #note: poly() is applied to the sum DP + HP + WT + ACC, so each model is a
  #degree-i polynomial in that single combined predictor
  glm.fit <- glm(MPG ~ poly(DP + HP + WT + ACC, i), data = auto.data)
  #delta[1] is the raw cross-validation estimate of prediction error
  cv.err5[i] <- cv.glm(auto.data, glm.fit, K = 5)$delta[1]
}
#I initially tried setting up 8 different variables and running them one by one, but realized it could be done with a loop
#now to plot
plot(degree, cv.err5, type = 'b', xlab = 'Polynomial degree', ylab = '5-fold CV error')
Your output should show the characteristic U-shape illustrating the tradeoff between bias and variance.
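As a small follow-up, the degree with the lowest estimated test error can be read off directly from cv.err5:
best.degree <- which.min(cv.err5)    # degree with the smallest 5-fold CV error
best.degree
cv.err5[best.degree]                 # the corresponding CV error estimate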
I’m not sure if I missed something or if this assignment is really as straightforward as it seemed, hopefully the latter.
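One optional extension, sketched here with the same model formula, is to repeat the loop with 10-fold cross-validation (cv.glm also returns a bias-adjusted estimate in delta[2]) and overlay the two error curves to see how sensitive the U-shape is to the choice of K.
set.seed(100)
cv.err10 <- vector()
for (i in degree) {
  glm.fit <- glm(MPG ~ poly(DP + HP + WT + ACC, i), data = auto.data)
  cv.err10[i] <- cv.glm(auto.data, glm.fit, K = 10)$delta[1]
}
#plot both curves on a common scale; the dashed line is the 10-fold estimate
plot(degree, cv.err5, type = 'b', ylim = range(c(cv.err5, cv.err10)),
     xlab = 'Polynomial degree', ylab = 'CV error')
lines(degree, cv.err10, type = 'b', lty = 2)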