Instruction 1:

Using the stats and boot libraries in R, perform a cross-validation experiment to observe the bias-variance tradeoff. You'll use the auto dataset from previous assignments, which has 392 observations across 5 variables. We want to fit polynomial models of various degrees using the glm function in R and then measure the cross-validation error using the cv.glm function. Fit various polynomial models to compute mpg as a function of the other four variables (acceleration, weight, horsepower, and displacement) using the glm function.

options(warn = -1)   # suppress warning messages in the knitted output
library(knitr)       # kable() for table display
library(stats)       # glm()
library(boot)        # cv.glm()



autompg <- read.table("https://raw.githubusercontent.com/mascotinme/GitHub/master/MSDA%20605/auto-mpg.txt", col.names = c("displacement", "hp", "weight", "acceleration", "mpg"))



kable(head(autompg)) # A glimpse of the dataset
 displacement    hp   weight   acceleration   mpg
          307   130     3504           12.0    18
          350   165     3693           11.5    15
          318   150     3436           11.0    18
          304   150     3433           12.0    16
          302   140     3449           10.5    17
          429   198     4341           10.0    15
str(autompg) # The data structure
## 'data.frame':    392 obs. of  5 variables:
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ hp          : num  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : num  3504 3693 3436 3433 3449 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
kable(summary(autompg)) # descriptive statistics of the variables
 displacement         hp             weight       acceleration        mpg
 Min.   : 68.0   Min.   : 46.0   Min.   :1613   Min.   : 8.00   Min.   : 9.00
 1st Qu.:105.0   1st Qu.: 75.0   1st Qu.:2225   1st Qu.:13.78   1st Qu.:17.00
 Median :151.0   Median : 93.5   Median :2804   Median :15.50   Median :22.75
 Mean   :194.4   Mean   :104.5   Mean   :2978   Mean   :15.54   Mean   :23.45
 3rd Qu.:275.8   3rd Qu.:126.0   3rd Qu.:3615   3rd Qu.:17.02   3rd Qu.:29.00
 Max.   :455.0   Max.   :230.0   Max.   :5140   Max.   :24.80   Max.   :46.60

Instruction 2:

glm.fit = glm(mpg ~ poly(disp + hp + wt + acc, 2), data = auto)

cv.err5[2] = cv.glm(auto, glm.fit, K = 5)$delta[1]

will fit a 2nd-degree polynomial between mpg and the remaining four variables and perform 5-fold cross-validation. The result is stored in the cv.err5 array. cv.glm returns the estimated cross-validation error and its adjusted value in a component called delta; see the help page for cv.glm for more information.
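To see what cv.glm returns before running the full loop, here is a minimal sketch, assuming the autompg data frame loaded above (the object names fit2 and cv.out are illustrative, not part of the assignment):

library(boot)

set.seed(1)
fit2 <- glm(mpg ~ poly(displacement + hp + weight + acceleration, 2), data = autompg)
cv.out <- cv.glm(autompg, fit2, K = 5)   # 5-fold cross-validation

cv.out$delta      # two values: the raw K-fold CV error and a bias-adjusted version
cv.out$delta[1]   # the raw estimate is the one stored for each degree below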

Once you have fit the various polynomials from degree 1 to 8, you can plot the cross-validation error as a function of degree:

degree = 1:8

plot(degree, cv.err5, type = 'b')

training <- autompg        # cv.glm does its own fold splitting,
crossvalidation <- autompg # so both names refer to the same full data set
set.seed(8)

cv.err <- numeric(8)  # 5-fold CV error for each polynomial degree

for (n in 1:8) {
  # Fit a degree-n polynomial of the summed predictors, as in the instructions
  fit <- glm(mpg ~ poly(displacement + hp + weight + acceleration, n), data = training)

  # Store the raw (unadjusted) 5-fold cross-validation error estimate
  cv.err[n] <- cv.glm(crossvalidation, fit, K = 5)$delta[1]
}


cv.err
## [1] 18.40003 16.95578 17.44431 17.05143 17.17140 17.44844 17.00235 16.97621

Perform the polynomial fits and then plot the resulting 5-fold cross-validation curve. Your output should show the characteristic U-shape illustrating the tradeoff between bias and variance.

degree <- 1:8
plot(degree, cv.err, type = 'b', xlab = "Polynomial Degree",
     ylab = "Cross-Validation Error",
     main = "Plot of Tradeoff Between Bias and Variance")
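As a quick follow-up for reading the curve, here is a sketch that uses the cv.err vector computed above to mark the degree with the smallest estimated error on the plot (the name best is illustrative):

best <- which.min(cv.err)                           # degree with the lowest 5-fold CV error
cv.err[best]
points(best, cv.err[best], pch = 19, col = "red")   # highlight that point on the existing plot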