(a). Explain how k-fold cross-validation is implemented.
k-fold cross-validation works by randomly dividing the set of observations into k roughly equal-sized parts or “folds.” The process follows these steps:
1. Partition the data: randomly split the original data set into k roughly equal-sized subsets (folds).
2. Model fitting and validation: for each of the k folds, treat the other k − 1 folds as the training set and the remaining fold as the validation (test) set; fit the model on the training set, evaluate it on the validation fold, and record the prediction error.
3. Averaging: after every observation has served in the validation set exactly once, average the k recorded errors. This average is the overall cross-validation estimate of test error.

This procedure gives a more stable and less variable estimate of test error than methods that rely on a single validation split.
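As a rough illustration of the procedure, here is a minimal sketch in Python using scikit-learn's `KFold`. The synthetic data, the choice of a linear regression model, and k = 5 are all assumptions made only for the example.

```python
# Minimal sketch of k-fold cross-validation
# (assumed setup: synthetic data, a linear model, and k = 5).
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                                   # 100 observations, 3 predictors
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_errors = []
for train_idx, val_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])  # fit on the k - 1 training folds
    preds = model.predict(X[val_idx])                           # predict on the held-out fold
    fold_errors.append(mean_squared_error(y[val_idx], preds))   # record this fold's error

cv_error = np.mean(fold_errors)                                 # average of the k fold errors
print(f"5-fold CV estimate of test MSE: {cv_error:.3f}")
```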
(b). What are the advantages and disadvantages of k-fold cross-validation relative to (i) the validation set approach and (ii) LOOCV?

Relative to the validation set approach:
Advantages:
- Less variance in the estimate: because k-fold CV averages over k different training/validation splits, it provides a more accurate and less variable estimate of the test error than the single split used in the validation set approach.
- Better use of data: every observation is used for both training and validation (just never in the same fold), so the available data are used more fully.
Disadvantages:
- More computationally intensive: k-fold CV requires fitting the model k times, compared with the single model fit of the validation set approach.
Relative to LOOCV:

Advantages:
- Lower computational cost: LOOCV requires fitting the model n times (where n is the number of observations), which can be expensive for large datasets; k-fold CV typically uses k = 5 or k = 10, requiring far fewer model fits (illustrated in the sketch after this list).
- Lower variance in the estimate: although LOOCV has low bias, its error estimate can have high variance because the n training sets are nearly identical, so the fitted models and their errors are highly correlated. k-fold CV strikes a better balance between bias and variance.
Disadvantages:
- Slightly more bias: each k-fold training set contains only about (k − 1)/k of the observations, whereas LOOCV trains on nearly the full dataset at every iteration, so k-fold CV tends to overestimate the test error somewhat more than LOOCV (especially for small k such as k = 5).
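To make the computational point concrete, the sketch below compares the number of model fits needed by LOOCV and 10-fold CV using scikit-learn's `cross_val_score`; the synthetic data and the linear model are assumptions for the illustration.

```python
# Sketch comparing the number of model fits for LOOCV vs. 10-fold CV
# (synthetic regression data and a linear model are assumed).
import numpy as np
from sklearn.model_selection import cross_val_score, LeaveOneOut, KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.3, size=n)

model = LinearRegression()

# LOOCV: n model fits (one per observation)
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")

# 10-fold CV: only 10 model fits
kf_scores = cross_val_score(model, X, y,
                            cv=KFold(n_splits=10, shuffle=True, random_state=1),
                            scoring="neg_mean_squared_error")

print(f"LOOCV:   {len(loo_scores)} fits, estimated MSE = {-loo_scores.mean():.3f}")
print(f"10-fold: {len(kf_scores)} fits, estimated MSE = {-kf_scores.mean():.3f}")
```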
To estimate the standard deviation of our prediction for Y at a specific value of the predictor X, we can use the bootstrap resampling method, which allows us to assess the variability in our prediction due to sampling randomness.
This involves the following steps:

1. Draw B bootstrap samples from the original data by sampling n observations with replacement.
2. Fit the statistical learning method to each bootstrap sample, obtaining fitted models \(\hat{f}_1, \dots, \hat{f}_B\).
3. Compute the prediction \(\hat{f}_b(X)\) from each fitted model at the value of X of interest.
4. Estimate the standard deviation of the prediction as the sample standard deviation of these B bootstrap predictions:
\[ \widehat{\text{SD}}[\hat{f}(X)] = \sqrt{ \frac{1}{B - 1} \sum_{b=1}^{B} \left( \hat{f}_b(X) - \bar{\hat{f}}(X) \right)^2 } \]
Here, \(\bar{\hat{f}}(X)\) is the average of the bootstrap predictions:
\[ \bar{\hat{f}}(X) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(X) \]
This quantity estimates the standard deviation of our prediction, capturing how much it varies if we were to collect new training data.
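A minimal sketch of this bootstrap procedure in Python follows; the synthetic data, the linear model, the prediction point x0, and the choice of B = 1000 resamples are assumptions made only for the example.

```python
# Sketch of the bootstrap estimate of SD of a prediction at a single point x0
# (synthetic data, a linear model, and B = 1000 resamples are assumed).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 100
X = rng.normal(size=(n, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=1.0, size=n)
x0 = np.array([[0.5]])                               # value of the predictor at which we predict

B = 1000
boot_preds = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)                 # sample n rows with replacement
    model = LinearRegression().fit(X[idx], y[idx])   # refit the model on the bootstrap sample
    boot_preds[b] = model.predict(x0)[0]             # prediction f_hat_b(x0)

# Sample standard deviation across bootstrap predictions (ddof=1 matches the 1/(B - 1) formula)
sd_hat = boot_preds.std(ddof=1)
print(f"Bootstrap estimate of the SD of the prediction at x0: {sd_hat:.3f}")
```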