This files gives an example of running a bootstrapped regression, and returns a histogram of possible slope values. It uses the dataset cars included as default in R.
Load the data and get the number of observations:
data(cars)
n1 = dim(cars)[1]
Set the number of iterations for bootstrapping, and make a blank vector to store the results:
nbit = 1000
myslope = rep(NA, nbit)
Blank matrix for prediction errors
myprederr = matrix(NA, nrow = nbit, ncol = n1)
Now run the bootstrap as a loop. A couple of things to notice here:
for (i in 1:nbit) {
cars2 = cars[sample(n1, replace = TRUE), ]
fit = lm(dist ~ speed, cars2)
myslope[i] = coef(fit)[2]
myprederr[i, ] = (fitted(fit) - cars$dist)^2
}
Once run, we can plot out the results as a histogram
hist(myslope)
Or as a nicer visualization, we can make a density function, and add a smoothed kernel density on top:
hist(myslope, freq = FALSE, breaks = 20, col = "darkgreen")
lines(density(myslope), col = 2, lwd = 4)
Finally, calculate and print the RMSE
mse = apply(myprederr, 2, mean)
rmse = sqrt(mse)
print(rmse)
## [1] 46.25 38.59 44.11 28.94 33.52 38.32 31.59 27.44 22.91 33.43 25.16
## [12] 35.11 30.61 28.39 25.90 27.49 22.24 22.17 21.23 26.29 21.28 26.81
## [23] 41.34 30.14 26.51 23.56 23.34 21.06 23.50 20.92 21.37 21.00 25.38
## [34] 38.54 45.96 22.26 21.16 32.81 23.48 21.91 22.38 24.75 29.70 31.64
## [45] 23.17 35.86 53.18 54.56 80.22 47.56
The median RMSE value of the 1000 runs is 27.1237