GEOG 6000 Spatial Statistics and Geographical Data Analysis

Bootstrap example

This files gives an example of running a bootstrapped regression, and returns a histogram of possible slope values. It uses the dataset cars included as default in R.

Load the data and get the number of observations:

data(cars)
n1 = dim(cars)[1]

Set the number of iterations for bootstrapping, and make a blank vector to store the results:

nbit = 1000
myslope = rep(NA, nbit)

Blank matrix for prediction errors

myprederr = matrix(NA, nrow = nbit, ncol = n1)

Now run the bootstrap as a loop. A couple of things to notice here:

We use the sample() function, with replace set to true to resample the original dataframe as a new set of data (cars2)
We then build a linear model using this new data set
Store the slope value and the prediction errors (obs-est)

for (i in 1:nbit) {
    cars2 = cars[sample(n1, replace = TRUE), ]
    fit = lm(dist ~ speed, cars2)
    myslope[i] = coef(fit)[2]
    myprederr[i, ] = (fitted(fit) - cars$dist)^2
}

Once run, we can plot out the results as a histogram

hist(myslope)

plot of chunk unnamed-chunk-5

Or as a nicer visualization, we can make a density function, and add a smoothed kernel density on top:

hist(myslope, freq = FALSE, breaks = 20, col = "darkgreen")
lines(density(myslope), col = 2, lwd = 4)

plot of chunk unnamed-chunk-6

Finally, calculate and print the RMSE

mse = apply(myprederr, 2, mean)
rmse = sqrt(mse)
print(rmse)

##  [1] 46.25 38.59 44.11 28.94 33.52 38.32 31.59 27.44 22.91 33.43 25.16
## [12] 35.11 30.61 28.39 25.90 27.49 22.24 22.17 21.23 26.29 21.28 26.81
## [23] 41.34 30.14 26.51 23.56 23.34 21.06 23.50 20.92 21.37 21.00 25.38
## [34] 38.54 45.96 22.26 21.16 32.81 23.48 21.91 22.38 24.75 29.70 31.64
## [45] 23.17 35.86 53.18 54.56 80.22 47.56

The median RMSE value of the 1000 runs is 27.1237