This files gives an example of running a bootstrapped regression, and returns a histogram of possible slope values. It uses the dataset cars included as default in R.
Load the data and get the number of observations:
data(cars)
n1 = dim(cars)[1]
Set the number of iterations for bootstrapping, and make a blank vector to store the results:
nbit = 1000
myslope = rep(NA, nbit)
Blank matrix for prediction errors
myprederr = matrix(NA, nrow = nbit, ncol = n1)
Now run the bootstrap as a loop. A couple of things to notice here:
for (i in 1:nbit) {
cars2 = cars[sample(n1, replace = TRUE), ]
fit = lm(dist ~ speed, cars2)
myslope[i] = coef(fit)[2]
myprederr[i, ] = (fitted(fit) - cars$dist)^2
}
Once run, we can plot out the results as a histogram
hist(myslope)
Or as a nicer visualization, we can make a density function, and add a smoothed kernel density on top:
hist(myslope, freq = FALSE, breaks = 20, col = "orange")
lines(density(myslope), col = 2, lwd = 4)
Finally, calculate and print the RMSE
mse = apply(myprederr, 2, mean)
rmse = sqrt(mse)
print(rmse)
## [1] 46.56 38.34 44.04 28.60 33.18 39.39 32.90 27.51 22.60 33.70 25.01
## [12] 34.85 30.74 28.27 26.03 27.13 22.51 22.46 21.30 26.65 21.26 27.32
## [23] 41.92 30.22 26.46 22.81 24.11 21.38 23.34 20.78 22.76 20.47 24.05
## [34] 38.72 46.37 21.72 21.11 31.78 23.88 21.84 21.76 24.35 29.18 31.25
## [45] 23.43 34.15 53.42 53.92 79.75 46.63
The median RMSE value of the 1000 runs is 27.2261