For the Leave-One-Out Cross-Validation MSE (\(LOOCV_{(n)}\)) in least squares regression, the following shortcut formula holds:

\[\begin{align*} LOOCV_{(n)} &= \frac{1}{n} \sum_{i = 1}^{n} \left( \frac{y_i - \hat{y}_i}{1 - h_i} \right)^2 \\ \text{where:} \quad H &= X(X^TX)^{-1}X^T && \text{(} X \text{ is the design matrix)} \\ h_i &= [H]_{ii} && \text{(diagonal elements of the hat matrix } H \text{, i.e. the leverages)} \end{align*}\]
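
In R, the leverages \(h_i\) can be obtained without forming \(H\) explicitly. A minimal sketch on a throwaway lm() fit (the toy data and object names here are purely illustrative):

# Sketch: two equivalent ways to get the leverages h_i of a least squares fit
fit <- lm(y ~ x, data = data.frame(x = 1:10, y = rnorm(10)))
h_direct <- hatvalues(fit)                            # diagonal of the hat matrix H
X <- model.matrix(fit)                                # design matrix
h_manual <- diag(X %*% solve(t(X) %*% X) %*% t(X))    # H = X (X'X)^{-1} X'
all.equal(unname(h_direct), h_manual)                 # TRUE (up to floating point)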

An example dataset, data (\(n\) = 20,000 rows), to which I fit the model \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2\):

x <- rnorm(20000)
y <- x - 2*x^2 + rnorm(20000)
data <- data.frame(x = x, y = y)
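
As a quick sanity check (a sketch; the object name fit_check is mine), the fitted quadratic should recover coefficients close to the true values \(0\), \(1\), and \(-2\) used to simulate y:

fit_check <- glm(y ~ x + I(x^2), data = data)
coef(fit_check)   # expect roughly (Intercept) ~ 0, x ~ 1, I(x^2) ~ -2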

The LOOCV() function, which implements the shortcut formula:

LOOCV <- function(data, formula) {
  # Fit by least squares (glm() with the default gaussian family)
  model <- glm(as.formula(formula), data = data)
  # Shortcut formula: mean squared leave-one-out residual,
  # with leverages h_i taken from boot::glm.diag()
  mean(((data$y - predict(model))/(1 - boot::glm.diag(model)$h))^2)
}
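
An equivalent version that avoids the boot dependency, using stats::hatvalues() for the leverages and fitted() for the in-sample predictions (a sketch, not the function used for the timings below):

LOOCV_base <- function(data, formula) {
  model <- glm(as.formula(formula), data = data)   # default family = gaussian, i.e. least squares
  h <- hatvalues(model)                            # leverages: diagonal of the hat matrix
  mean(((data$y - fitted(model))/(1 - h))^2)       # shortcut formula
}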

LOOCV MSE estimate using my LOOCV() function:

LOOCV(data, "y ~ x + I(x^2)")
## [1] 1.013968
## 2.13 sec elapsed

LOOCV MSE estimate using the boot::cv.glm() function, which does not use the shortcut formula and instead refits the model \(n\) times:

glm_1 <- glm(y ~ x + I(x^2), data = data)
(glm_1_LOOCV <- boot::cv.glm(data, glm_1)$delta[1])
## [1] 1.013968
## 1106.97 sec elapsed
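
Both estimates agree to floating-point precision, as expected, since cv.glm() computes the same exact LOOCV quantity by brute force. A quick check (a sketch, reusing the objects defined above):

loocv_shortcut <- LOOCV(data, "y ~ x + I(x^2)")
all.equal(loocv_shortcut, glm_1_LOOCV)   # expect TRUE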