GEOG 6000 Spatial Statistics and Geographical Data Analysis

Bootstrap example

This files gives an example of running a bootstrapped regression, and returns a histogram of possible slope values. It uses the dataset cars included as default in R.

Load the data and get the number of observations:

data(cars)
n1 = dim(cars)[1]

Set the number of iterations for bootstrapping, and make a blank vector to store the results:

nbit = 1000
myslope = rep(NA, nbit)

Blank matrix for prediction errors

myprederr = matrix(NA, nrow = nbit, ncol = n1)

Now run the bootstrap as a loop. A couple of things to notice here:

for (i in 1:nbit) {
    cars2 = cars[sample(n1, replace = TRUE), ]
    fit = lm(dist ~ speed, cars2)
    myslope[i] = coef(fit)[2]
    myprederr[i, ] = (fitted(fit) - cars$dist)^2
}

Once run, we can plot out the results as a histogram

hist(myslope)

plot of chunk unnamed-chunk-5

Or as a nicer visualization, we can make a density function, and add a smoothed kernel density on top:

hist(myslope, freq = FALSE, breaks = 20, col = "orange")
lines(density(myslope), col = 2, lwd = 4)

plot of chunk unnamed-chunk-6

Finally, calculate and print the RMSE

mse = apply(myprederr, 2, mean)
rmse = sqrt(mse)
print(rmse)
##  [1] 46.56 38.34 44.04 28.60 33.18 39.39 32.90 27.51 22.60 33.70 25.01
## [12] 34.85 30.74 28.27 26.03 27.13 22.51 22.46 21.30 26.65 21.26 27.32
## [23] 41.92 30.22 26.46 22.81 24.11 21.38 23.34 20.78 22.76 20.47 24.05
## [34] 38.72 46.37 21.72 21.11 31.78 23.88 21.84 21.76 24.35 29.18 31.25
## [45] 23.43 34.15 53.42 53.92 79.75 46.63

The median RMSE value of the 1000 runs is 27.2261