Simulation of Hoeffding's inequality.
library(ggplot2)
bin.size <- 1000
bin <- sapply(seq(bin.size), function(x) {
sample(c(1, 0), size = 1)
})
bin.avg <- sum(bin)/bin.size
Now that we have a bin with “marbles” we can experiment with it. Take 100 times 10 marbles from the bin, and compute the average. Check that the average of the averages is close to the bin.avg.
iterations <- 100
sample.size <- 100
sample.value <- function() {
s <- bin[sample(seq(bin.size), sample.size)]
return(sum(s)/sample.size)
}
sample.values <- sapply(seq(iterations), function(x) {
sample.value()
})
abs(sum(sample.values)/iterations - bin.avg)
## [1] 0.0062
This show the samples on avarage are pretty close to the value of the bin. To check the probability of this happening, we should make a histogram of the values obtained.
vals <- sapply(seq(1000), function(x) {
sample.value()
})
ggplot(data.frame(vals = vals), aes(x = vals)) + geom_histogram(binwidth = 0.01,
colour = "black", fill = "white")
## Warning: position_stack requires constant width: output may be incorrect
We compute the bound Hoeffding gives us.
hoeffding <- function(prob, eps, sample.size) {
2 * exp(-2 * eps^2 * sample.size)
}
prob <- 0.5 # probability of 1 in sample(c(1,0), size=1)
eps <- 0.1
(bound <- hoeffding(prob, eps, sample.size))
## [1] 0.2707
Now add the bounds to the histogram and note that as Hoeffding promises (since bound = 0.2707) there are not many results outside.
ggplot(data.frame(vals = vals), aes(x = vals)) + geom_histogram(binwidth = 0.01,
colour = "black", fill = "white") + geom_vline(xintercept = prob + eps) +
geom_vline(xintercept = prob - eps)
## Warning: position_stack requires constant width: output may be incorrect
Finally the check of the probability our simulation gives.
actual <- sum(vals < (prob - eps) | vals > (prob + eps))/1000
Note that indeed actual (0.043) is less than bound (0.2707).