Hoeffding's inequality

Simulation of Hoeffding's inequality.

library(ggplot2)

bin.size <- 1000

bin <- sapply(seq(bin.size), function(x) {
    sample(c(1, 0), size = 1)
})

bin.avg <- sum(bin)/bin.size

Now that we have a bin with “marbles” we can experiment with it. Take 100 times 10 marbles from the bin, and compute the average. Check that the average of the averages is close to the bin.avg.

iterations <- 100

sample.size <- 100

sample.value <- function() {
    s <- bin[sample(seq(bin.size), sample.size)]
    return(sum(s)/sample.size)
}

sample.values <- sapply(seq(iterations), function(x) {
    sample.value()
})

abs(sum(sample.values)/iterations - bin.avg)
## [1] 0.0062

This show the samples on avarage are pretty close to the value of the bin. To check the probability of this happening, we should make a histogram of the values obtained.

vals <- sapply(seq(1000), function(x) {
    sample.value()
})
ggplot(data.frame(vals = vals), aes(x = vals)) + geom_histogram(binwidth = 0.01, 
    colour = "black", fill = "white")
## Warning: position_stack requires constant width: output may be incorrect

plot of chunk histogram

We compute the bound Hoeffding gives us.

hoeffding <- function(prob, eps, sample.size) {
    2 * exp(-2 * eps^2 * sample.size)
}

prob <- 0.5  # probability of 1 in sample(c(1,0), size=1)
eps <- 0.1
(bound <- hoeffding(prob, eps, sample.size))
## [1] 0.2707

Now add the bounds to the histogram and note that as Hoeffding promises (since bound = 0.2707) there are not many results outside.

ggplot(data.frame(vals = vals), aes(x = vals)) + geom_histogram(binwidth = 0.01, 
    colour = "black", fill = "white") + geom_vline(xintercept = prob + eps) + 
    geom_vline(xintercept = prob - eps)
## Warning: position_stack requires constant width: output may be incorrect

plot of chunk hist2

Finally the check of the probability our simulation gives.

actual <- sum(vals < (prob - eps) | vals > (prob + eps))/1000

Note that indeed actual (0.043) is less than bound (0.2707).