Thompson Sampling Wikipedia article
https://en.wikipedia.org/wiki/Thompson_sampling
Eluder Dimension and the Sample Complexity of Optimistic Exploration (NIPS paper)
https://papers.nips.cc/paper/4909-eluder-dimension-and-the-sample-complexity-of-optimistic-exploration.pdf
Upper Confidence Bound - example of Reinforcement Learning Intuition, Lecture 172
https://www.udemy.com/machinelearning/learn/lecture/6456832
Part 1 Lecture 187
https://www.udemy.com/machinelearning/learn/lecture/6027262
Part 2 Lecture 188
https://www.udemy.com/machinelearning/learn/lecture/6027270
Beta Distribution article
https://www.statisticshowto.datasciencecentral.com/beta-distribution/
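To build intuition before the code, here is a minimal R sketch (not from the lecture; the three shape-parameter pairs are arbitrary illustrative values) showing how a Beta posterior tightens around an ad's click-through rate as click/no-click counts grow:
x = seq(0, 1, length.out = 200)
# Beta(clicks + 1, non-clicks + 1) narrows as evidence accumulates
plot(x, dbeta(x, 51, 151), type = 'l', col = 'red',
     xlab = 'Click-through rate', ylab = 'Density',
     main = 'Beta posteriors tighten with more data')  # 50 clicks, 150 misses
lines(x, dbeta(x, 6, 16), col = 'blue')                # 5 clicks, 15 misses
lines(x, dbeta(x, 2, 2), col = 'grey')                 # 1 click, 1 miss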
Check the working directory with getwd() before loading the data.
dataset = read.csv('Ads_CTR_Optimisation.csv')  # one row per user, one 0/1 column per ad
UCB compared to Thompson Sampling
Lecture 183 https://www.udemy.com/machinelearning/learn/lecture/6468288
knitr::include_graphics("UCB_compare_ThompsonSampling.png")
UCB compared to Thompson Sampling
What this shows: the dataset as viewed in Python.
knitr::include_graphics("Dataset_interpretation.png")
The screenshot is from Python, but the R data frame has the same structure.
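To reproduce that view in R (assuming the CSV loaded above has 10,000 rows and one 0/1 column per ad), a quick sanity check:
dim(dataset)       # expect 10000 rows x 10 columns
head(dataset, 3)   # each cell is 1 if that user would click that ad, 0 otherwise
colMeans(dataset)  # empirical click-through rate of each ad
With the data confirmed, the lecture's random-selection baseline follows.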
N = 10000  # number of users (rounds)
d = 10     # number of ads
ads_selected = integer(0)
total_reward = 0
for (n in 1:N) {
  ad = sample(1:d, 1)                      # pick an ad uniformly at random
  ads_selected = append(ads_selected, ad)
  reward = dataset[n, ad]                  # 1 if user n clicks the ad, else 0
  total_reward = total_reward + reward
}
Let’s have a look at the total reward earned by random selection.
total_reward
## [1] 1274
As expected, random selection does poorly: 1274 clicks out of 10,000 rounds is roughly the average click-through rate across the ten ads.
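As a sanity check (not part of the lecture code), random selection should earn roughly N times the average click-through rate across all ten ads:
mean(colMeans(dataset)) * N  # expected total reward under random selection;
                             # should land near the total_reward observed above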
hist(ads_selected,
     col = 'blue',
     main = 'Histogram of ads selections - RANDOM',
     xlab = 'Ads',
     ylab = 'Number of times each ad was selected')
knitr::include_graphics("Thompson_Sampling_Slide.png")
Thompson Sampling steps from the lecture
N = 10000  # number of users (rounds)
d = 10     # number of ads
ads_selected = integer(0)
# UCB and Thompson Sampling are structurally similar but track different variables.
# Thompson Sampling keeps two counters per ad:
numbers_of_rewards_1 = integer(d)  # integer(d) makes a vector of d zeros: clicks per ad
numbers_of_rewards_0 = integer(d)  # non-clicks per ad
# Both counters are updated inside the for loop below.
total_reward = 0
for (n in 1:N) {
  ad = 0
  max_random = 0
  for (i in 1:d) {
    # Sample a plausible CTR for ad i from its Beta posterior:
    # Beta(clicks + 1, non-clicks + 1), i.e. starting from a uniform Beta(1, 1) prior
    random_beta = rbeta(n = 1,
                        shape1 = numbers_of_rewards_1[i] + 1,
                        shape2 = numbers_of_rewards_0[i] + 1)
    if (random_beta > max_random) {
      max_random = random_beta
      ad = i  # keep the ad with the highest sampled CTR so far
    }
  }
  ads_selected = append(ads_selected, ad)
  reward = dataset[n, ad]
  # Bayesian update: increment the chosen ad's click or non-click counter
  if (reward == 1) {
    numbers_of_rewards_1[ad] = numbers_of_rewards_1[ad] + 1
  } else {
    numbers_of_rewards_0[ad] = numbers_of_rewards_0[ad] + 1
  }
  total_reward = total_reward + reward
}
Let’s have a look at the total reward from Thompson Sampling, compared with the random baseline.
total_reward
## [1] 2578
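That is roughly double the random baseline. As a follow-up check (not in the lecture), the counters also give each ad's posterior mean CTR, since the mean of Beta(a, b) is a / (a + b):
posterior_means = (numbers_of_rewards_1 + 1) /
  (numbers_of_rewards_1 + numbers_of_rewards_0 + 2)
round(posterior_means, 3)
which.max(posterior_means)  # should point at the winning ad (ad 5, as the tail output below confirms)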
The links below describe the Bayesian inference that sits at the root of Thompson Sampling.
https://en.wikipedia.org/wiki/Bayesian_inference
https://brohrer.github.io/how_bayesian_inference_works.html
knitr::include_graphics("bayesian_inference.png")
How Bayesian inference works, from the links above
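In Thompson Sampling the Bayesian update is just two counter increments. A minimal worked example (illustrative values, not lecture code):
a = 1; b = 1          # uniform Beta(1, 1) prior on an ad's CTR
reward = 1            # suppose the user clicks
a = a + reward        # clicks counter
b = b + (1 - reward)  # non-clicks counter
a / (a + b)           # posterior mean CTR is now 2/3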
Let’s look at the last 1000 selections to see which ad has become the most common.
tail(ads_selected, n = 1000)
## [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [24] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [47] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [70] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [93] 5 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [116] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [139] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [162] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [185] 5 5 5 5 5 5 5 5 5 5 2 5 5 5 5 5 5 5 5 5 5 5 5
## [208] 5 5 5 8 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [231] 5 5 5 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [254] 5 5 5 5 5 5 5 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [277] 5 8 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [300] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [323] 5 5 5 5 5 5 9 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [346] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [369] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [392] 5 5 5 5 5 5 5 5 5 5 8 5 5 5 5 1 5 5 5 5 5 5 5
## [415] 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 8 5 5 5 5
## [438] 5 5 5 5 5 5 5 5 5 5 9 5 5 5 5 5 5 5 5 5 5 5 5
## [461] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [484] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [507] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [530] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [553] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [576] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 8 5 5 5 5
## [599] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [622] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [645] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [668] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [691] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [714] 5 5 5 5 5 5 8 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [737] 5 5 5 5 5 5 5 5 5 5 1 5 5 5 5 5 5 5 5 5 5 5 5
## [760] 5 1 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 10
## [783] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [806] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [829] 5 5 5 5 5 5 5 5 5 5 2 5 5 5 5 5 5 5 5 5 5 5 5
## [852] 5 1 5 5 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [875] 5 5 5 5 5 5 5 5 5 5 5 5 5 1 5 5 5 5 5 5 5 5 5
## [898] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [921] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [944] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
## [967] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 1
## [990] 5 5 5 5 5 5 5 5 5 5 5
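Eyeballing the tail works, but table() gives exact counts (a convenience check, not part of the lecture code):
table(tail(ads_selected, n = 1000))  # ad 5 should dominate the last 1000 picks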
hist(ads_selected,
     col = 'blue',
     main = 'Histogram of ads selections (Thompson Sampling R)',
     xlab = 'Ads',
     ylab = 'Number of times each ad was selected')
Great question: how is the Thompson Sampling heuristic able to find that the 5th advertisement is the best one more quickly than the Upper Confidence Bound heuristic?
It is hard to explain the reason theoretically; that would require research and a long mathematical proof. Intuitively, though, it could be because UCB is based on optimistic assumptions, whereas Thompson Sampling works with the relevant probabilities directly through the Bayesian approach, sampling each ad's CTR from its posterior distribution.
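For contrast, here is a sketch of the UCB selection step as taught earlier in the course (variable names are assumptions mirroring the UCB lecture, and this is only the per-round pick, not the full algorithm). Given the same counts it always picks the same ad, while Thompson Sampling's rbeta draw is random:
# UCB pick for round n: highest average reward plus confidence bonus
# (assumes every ad has already been selected at least once)
ucb_pick = function(sums_of_rewards, numbers_of_selections, n) {
  average_reward = sums_of_rewards / numbers_of_selections
  delta_i = sqrt(3 / 2 * log(n) / numbers_of_selections)
  which.max(average_reward + delta_i)  # deterministic: no sampling involved
}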
GitHub files: https://github.com/ghettocounselor
Useful PDF for common questions in the lectures:
https://github.com/ghettocounselor/Machine_Learning/blob/master/Machine-Learning-A-Z-Q-A.pdf