Thompson Sampling Wikipedia article
https://en.wikipedia.org/wiki/Thompson_sampling
Eluder Dimension and the Sample Complexity of Optimistic Exploration (NIPS paper)
https://papers.nips.cc/paper/4909-eluder-dimension-and-the-sample-complexity-of-optimistic-exploration.pdf
Upper Confidence Bound - example of Reinforcement Learning Intuition, Lecture 172
https://www.udemy.com/machinelearning/learn/lecture/6456832
Part 1 Lecture 187
https://www.udemy.com/machinelearning/learn/lecture/6027262
Part 2 Lecture 188
https://www.udemy.com/machinelearning/learn/lecture/6027270
Beta Distribution article
https://www.statisticshowto.datasciencecentral.com/beta-distribution/
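To build intuition before the code, here is a minimal R sketch (not from the lecture; the three shape-parameter pairs are arbitrary illustrative values) showing how a Beta posterior tightens around an ad's click-through rate as click/no-click counts grow:
x = seq(0, 1, length.out = 200)
# Beta(clicks + 1, non-clicks + 1) narrows as evidence accumulates
plot(x, dbeta(x, 51, 151), type = 'l', col = 'red',
     xlab = 'Click-through rate', ylab = 'Density',
     main = 'Beta posteriors tighten with more data')  # 50 clicks, 150 misses
lines(x, dbeta(x, 6, 16), col = 'blue')                # 5 clicks, 15 misses
lines(x, dbeta(x, 2, 2), col = 'grey')                 # 1 click, 1 miss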
Check the working directory with getwd() before loading the data.
dataset = read.csv('Ads_CTR_Optimisation.csv')  # one row per user, one 0/1 column per ad
UCB compared to Thompson Sampling
Lecture 183 https://www.udemy.com/machinelearning/learn/lecture/6468288
knitr::include_graphics("UCB_compare_ThompsonSampling.png")
UCB compared to Thompson Sampling
What this shows: the dataset as viewed in Python.
knitr::include_graphics("Dataset_interpretation.png")
The screenshot is from Python, but the R data frame has the same structure.
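To reproduce that view in R (assuming the CSV loaded above has 10,000 rows and one 0/1 column per ad), a quick sanity check:
dim(dataset)       # expect 10000 rows x 10 columns
head(dataset, 3)   # each cell is 1 if that user would click that ad, 0 otherwise
colMeans(dataset)  # empirical click-through rate of each ad
With the data confirmed, the lecture's random-selection baseline follows.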
N = 10000  # number of users (rounds)
d = 10     # number of ads
ads_selected = integer(0)
total_reward = 0
for (n in 1:N) {
  ad = sample(1:d, 1)                      # pick an ad uniformly at random
  ads_selected = append(ads_selected, ad)
  reward = dataset[n, ad]                  # 1 if user n clicks the ad, else 0
  total_reward = total_reward + reward
}
Let’s have a look at the total reward earned by random selection.
total_reward
## [1] 1274
As expected, random selection does poorly: 1274 clicks out of 10,000 rounds is roughly the average click-through rate across the ten ads.
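As a sanity check (not part of the lecture code), random selection should earn roughly N times the average click-through rate across all ten ads:
mean(colMeans(dataset)) * N  # expected total reward under random selection;
                             # should land near the total_reward observed above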
hist(ads_selected,
     col = 'blue',
     main = 'Histogram of ads selections - RANDOM',
     xlab = 'Ads',
     ylab = 'Number of times each ad was selected')
knitr::include_graphics("Thompson_Sampling_Slide.png")
Thompson Sampling steps from the lecture
N = 10000  # number of users (rounds)
d = 10     # number of ads
ads_selected = integer(0)
# UCB and Thompson Sampling are structurally similar but track different variables.
# Thompson Sampling keeps two counters per ad:
numbers_of_rewards_1 = integer(d)  # integer(d) makes a vector of d zeros: clicks per ad
numbers_of_rewards_0 = integer(d)  # non-clicks per ad
# Both counters are updated inside the for loop below.
total_reward = 0
for (n in 1:N) {
  ad = 0
  max_random = 0
  for (i in 1:d) {
    # Sample a plausible CTR for ad i from its Beta posterior:
    # Beta(clicks + 1, non-clicks + 1), i.e. starting from a uniform Beta(1, 1) prior
    random_beta = rbeta(n = 1,
                        shape1 = numbers_of_rewards_1[i] + 1,
                        shape2 = numbers_of_rewards_0[i] + 1)
    if (random_beta > max_random) {
      max_random = random_beta
      ad = i  # keep the ad with the highest sampled CTR so far
    }
  }
  ads_selected = append(ads_selected, ad)
  reward = dataset[n, ad]
  # Bayesian update: increment the chosen ad's click or non-click counter
  if (reward == 1) {
    numbers_of_rewards_1[ad] = numbers_of_rewards_1[ad] + 1
  } else {
    numbers_of_rewards_0[ad] = numbers_of_rewards_0[ad] + 1
  }
  total_reward = total_reward + reward
}
Let’s have a look at the total reward from Thompson Sampling, compared with the random baseline.
total_reward
## [1] 2578
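That is roughly double the random baseline. As a follow-up check (not in the lecture), the counters also give each ad's posterior mean CTR, since the mean of Beta(a, b) is a / (a + b):
posterior_means = (numbers_of_rewards_1 + 1) /
  (numbers_of_rewards_1 + numbers_of_rewards_0 + 2)
round(posterior_means, 3)
which.max(posterior_means)  # should point at the winning ad (ad 5, as the tail output below confirms)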
The links below describe the Bayesian inference that sits at the root of Thompson Sampling.
https://en.wikipedia.org/wiki/Bayesian_inference
https://brohrer.github.io/how_bayesian_inference_works.html
knitr::include_graphics("bayesian_inference.png")
How Bayesian inference works, from the links above
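In Thompson Sampling the Bayesian update is just two counter increments. A minimal worked example (illustrative values, not lecture code):
a = 1; b = 1          # uniform Beta(1, 1) prior on an ad's CTR
reward = 1            # suppose the user clicks
a = a + reward        # clicks counter
b = b + (1 - reward)  # non-clicks counter
a / (a + b)           # posterior mean CTR is now 2/3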
Let’s look at the last 1000 selections to see which ad has become the most common.
tail(ads_selected, n = 1000)
## [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [24] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [47] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [70] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [93] 5 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [116] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [139] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [162] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [185] 5 5 5 5 5 5 5 5 5 5 2 5 5 5 5 5 5 5 5 5 5 5 5
## [208] 5 5 5 8 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [231] 5 5 5 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [254] 5 5 5 5 5 5 5 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [277] 5 8 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [300] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [323] 5 5 5 5 5 5 9 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [346] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [369] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [392] 5 5 5 5 5 5 5 5 5 5 8 5 5 5 5 1 5 5 5 5 5 5 5
## [415] 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 8 5 5 5 5
## [438] 5 5 5 5 5 5 5 5 5 5 9 5 5 5 5 5 5 5 5 5 5 5 5
## [461] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [484] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [507] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [530] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [553] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [576] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 8 5 5 5 5
## [599] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [622] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [645] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [668] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [691] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [714] 5 5 5 5 5 5 8 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [737] 5 5 5 5 5 5 5 5 5 5 1 5 5 5 5 5 5 5 5 5 5 5 5
## [760] 5 1 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 10
## [783] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [806] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [829] 5 5 5 5 5 5 5 5 5 5 2 5 5 5 5 5 5 5 5 5 5 5 5
## [852] 5 1 5 5 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [875] 5 5 5 5 5 5 5 5 5 5 5 5 5 1 5 5 5 5 5 5 5 5 5
## [898] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [921] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [944] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [967] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 1
## [990] 5 5 5 5 5 5 5 5 5 5 5
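Eyeballing the tail works, but table() gives exact counts (a convenience check, not part of the lecture code):
table(tail(ads_selected, n = 1000))  # ad 5 should dominate the last 1000 picks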
hist(ads_selected,
     col = 'blue',
     main = 'Histogram of ads selections (Thompson Sampling R)',
     xlab = 'Ads',
     ylab = 'Number of times each ad was selected')
Great question: how is the Thompson Sampling heuristic able to find that the 5th advertisement is the best one more quickly than the Upper Confidence Bound heuristic?
It is hard to explain the reason theoretically; that would require research and a long mathematical proof. Intuitively, though, it could be because UCB is based on optimistic assumptions, whereas Thompson Sampling works with the relevant probabilities directly through the Bayesian approach, sampling each ad's CTR from its posterior distribution.
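For contrast, here is a sketch of the UCB selection step as taught earlier in the course (variable names are assumptions mirroring the UCB lecture, and this is only the per-round pick, not the full algorithm). Given the same counts it always picks the same ad, while Thompson Sampling's rbeta draw is random:
# UCB pick for round n: highest average reward plus confidence bonus
# (assumes every ad has already been selected at least once)
ucb_pick = function(sums_of_rewards, numbers_of_selections, n) {
  average_reward = sums_of_rewards / numbers_of_selections
  delta_i = sqrt(3 / 2 * log(n) / numbers_of_selections)
  which.max(average_reward + delta_i)  # deterministic: no sampling involved
}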
GitHub files: https://github.com/ghettocounselor
Useful PDF for common questions in the lectures:
https://github.com/ghettocounselor/Machine_Learning/blob/master/Machine-Learning-A-Z-Q-A.pdf