k-fold CV:

The data set is divided into k subsets of roughly equal size, called folds. Each fold in turn serves as the validation set while the remaining k-1 folds form the training set: fit the model on the k-1 training folds and predict on the held-out fold. Repeat this k times so that every fold is used once as the validation set.

The test-error rate is the average of all k errors. (Some tools report the minimum instead.)
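To make the procedure concrete, here is a minimal hand-rolled sketch of 10-fold CV in R. It uses the Auto data from ISLR, which is loaded formally further below; the object names (folds, fold.mse) are just chosen for this illustration.

suppressMessages(library(ISLR))

set.seed(1)
k = 10
n = nrow(Auto)
folds = sample(rep(1:k, length.out = n))   # randomly assign each observation to a fold

fold.mse = rep(0, k)
for (j in 1:k) {
        fit = glm(mpg ~ horsepower, data = Auto[folds != j, ])   # train on the other k-1 folds
        pred = predict(fit, newdata = Auto[folds == j, ])        # predict on the held-out fold
        fold.mse[j] = mean((Auto$mpg[folds == j] - pred)^2)
}

mean(fold.mse)   # k-fold CV estimate of the test MSE

cv.glm() from the boot package, used below, does this bookkeeping for us.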

Advantages: k-fold CV addresses drawbacks of both the validation-set method and LOOCV.

(1) Unlike the validation-set method, there is no randomness in which observations serve for training versus validation, because over the k iterations every observation is used for both. So there is less variability overall than with the validation-set method. (It does not mean LOOCV has no variability; see disadvantage no. 1.) A small simulation after the disadvantages list below illustrates this.

(2) Each validation fold is larger than the single observation used in LOOCV, so each iteration's error is computed from more observations, giving less variability in the test-error estimate.

(3) Less bias than the validation-set method, because with 5 or more folds the training set is larger. The larger training set reduces bias, so the test error is not over-estimated as much as with the validation-set method.

(4) Not as computationally expensive as LOOCV, since the model only needs to be fit k times, and k is usually 5 or 10.

Disadvantages:

(1) Somewhat higher bias than LOOCV, due to the smaller training set in each iteration (but lower bias than the validation-set method, which uses an even smaller training set).
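To illustrate advantage (1), the sketch below (my own illustration, not from the ISLR lab) repeats both the validation-set approach and 10-fold CV ten times with different random seeds; the validation-set estimates typically spread out noticeably more.

suppressMessages(library(ISLR))
suppressMessages(library(boot))

vs.mse = rep(0, 10)
kcv.mse = rep(0, 10)
for (s in 1:10) {
        set.seed(s)
        # validation-set approach: one random 50/50 split
        train = sample(nrow(Auto), nrow(Auto)/2)
        fit = glm(mpg ~ horsepower, data=Auto, subset=train)
        vs.mse[s] = mean((Auto$mpg - predict(fit, Auto))[-train]^2)
        # 10-fold CV on the full data set
        kcv.mse[s] = cv.glm(Auto, glm(mpg ~ horsepower, data=Auto), K=10)$delta[1]
}

sd(vs.mse)    # spread of the validation-set estimates
sd(kcv.mse)   # spread of the 10-fold CV estimates (typically smaller)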

suppressMessages(library(ISLR))

data("Auto")
dim(Auto)
## [1] 392   9
library(boot)
## 
## Attaching package: 'boot'
## The following object is masked from 'package:survival':
## 
##     aml
## The following object is masked from 'package:lattice':
## 
##     melanoma
set.seed(17)

glm.fit = glm(mpg ~ horsepower, data=Auto)

cv.error = cv.glm(Auto, glm.fit, K=10)

cv.error$delta
## [1] 24.20520 24.19133

Unlike with LOOCV, the two cv-error numbers differ slightly. The first number is the standard k-fold CV estimate of the test MSE; the second is a bias-corrected version.
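As a quick check (not part of the original output above), calling cv.glm() without the K argument performs LOOCV, and there the two delta components come out essentially identical.

glm.fit = glm(mpg ~ horsepower, data=Auto)
cv.glm(Auto, glm.fit)$delta   # LOOCV: both numbers are essentially the same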

set.seed(17)

cv.error = rep(0,5)

for (i in 1:5) {
        # fit a degree-i polynomial and record its 10-fold CV estimate of the test MSE
        glm.fit = glm(mpg ~ poly(horsepower,i), data=Auto)
        cv.error[i] = round(cv.glm(Auto, glm.fit, K=10)$delta[1],2)
}

cv.error
## [1] 24.21 19.19 19.31 19.34 18.88

These are the test-MSE estimates for 5 different models (polynomial degrees 1 to 5), each obtained with 10-fold CV.
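If the goal is model selection, a natural follow-up (my addition, not in the original code) is to pick the degree with the smallest estimated test MSE; note from the numbers above that most of the improvement already comes from the quadratic term.

which.min(cv.error)
## [1] 5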

Computation time for k-fold CV should be shorter than for LOOCV, especially for large data sets. The exception is least-squares regression, where the LOOCV shortcut formula makes LOOCV as cheap as a single model fit (although cv.glm() does not take advantage of that formula).
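A rough way to see the timing difference on this data set (a sketch; timings vary by machine, and since cv.glm() does not use the shortcut formula, its LOOCV really does refit the model n = 392 times):

glm.fit = glm(mpg ~ horsepower, data=Auto)
system.time(cv.glm(Auto, glm.fit))          # LOOCV: 392 model fits
system.time(cv.glm(Auto, glm.fit, K=10))    # 10-fold CV: only 10 model fits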

If you compare them, the test-MSE estimates are slightly lower with k-fold CV than with LOOCV. But k-fold CV, like any CV or resampling method, does not improve the test error; it only estimates it. In the case of k-fold CV, it tends to do a better job of estimating the test error than LOOCV.