Due date

The due date for this exam is Friday, February 24, by 2:00PM. Late submissions will not be accepted except in exceptional circumstances. Consequently, you should plan on submitting before the due date.

Instructions

This exam consists of 10 problems. The first 9 problems build on each other. Problem 10 consists of 5 parts, but can be completed without first solving problems 1–9. Each problem will be graded out of a maximum of 5 points.

For this exam you must provide all of your answers in a single file, either straight source code (.R), or an R markdown file (.Rmd), or a ‘knit’ markdown document. The R markdown document used to write this exam will be provided as a template. Regardless of what option you choose, you should delineate your answer for each question clearly. For example:

# ******************************************************************
# Problem 1)
# [ Solution code goes here ]

# ******************************************************************
# Problem 2)
# [ Solution code goes here ]

etc.

If I run your source code (or knit your markdown file), it should run from beginning to end without producing any errors.

Your code should conform to the tidyverse R programming style guide, available here for reference:
https://style.tidyverse.org/index.html.

You can refer to your notes, class lecture slides, Google, StackExchange, or other online material. However, you cannot post content from the exam or questions related to it on the internet (Discord, etc.), or consult with other students. Anyone caught violating this policy will be given an immediate zero for the exam.

Partial credit will be given, so if you are unsure of a solution or can’t get your code to work, you should include concise comments in your code that explain your thought process/approach.

Introduction & Background

The following background comes from Gureckis, T. M., & Love, B. C. (2015). “Computational reinforcement learning”. The Oxford handbook of computational and mathematical psychology, 99-117:

There are few general laws of behavior, but one may be that humans and other animals tend to repeat behaviors that have led to positive outcomes in the past and avoid those associated with punishment or pain. Such tendencies are on display in the behavior of young children who learn to avoid touching hot stoves following a painful burn, but behave in school when rewarded with toys. This basic principle exerts such a powerful influence on behavior, it manifests throughout our culture and laws. Behaviors that society wants to discourage are tied to punishment (e.g., prison time, fines, consumption taxes), whereas behaviors society condones are tied to positive outcomes (e.g., tax credits for fuel-efficient cars).

The scientific study of how animals use experience to adapt their behavior in order to maximize rewards is known as reinforcement learning (RL). Reinforcement learning differs from other types of learning behavior of interest to psychologists (e.g., unsupervised learning, supervised learning) since it deals with learning from feedback that is largely evaluative rather than corrective. A restaurant diner doesn’t necessarily learn that eating at a particular business is “wrong,” simply that the experience was less than exquisite. This particular aspect of RL – learning from evaluative rather than corrective feedback – makes it a particularly rich domain for studying how people adapt their behavior based on experience.

The history of RL can be traced to early work in behavioral psychology (Thorndike, 1911; Skinner, 1938). However, the modern field of RL is a highly interdisciplinary area at the crossroads of computer science, machine learning, psychology, and neuroscience. In particular, contemporary research on RL is characterized by detailed behavioral models that make predictions across a wide range of circumstances, as well as neuroscience findings that have linked aspects of these models to particular neural substrates. In many ways, RL today stands as one of the major triumphs of cognitive science in that it offers an integrated theory of behavior at the computational, algorithmic, and implementational (i.e., neural) levels (Marr, 1982).

Multi-armed bandits

For this exam, we will be exploring very simple models of human reinforcement learning. In particular, we will focus on learning in “multi-armed bandit” tasks. What is a multi-armed bandit?

You have probably heard of a slot machine. It’s a gambling device where you put in some money, pull a lever, and if you are lucky you win money. In Las Vegas (so the story goes), “one-armed bandit” is a slang term for a slot machine. One-armed, because the machine has a single lever that you pull. Bandit, because generally speaking it steals your money.

You can think of a multi-armed bandit as a row of slot machines. However, in the general case, each slot machine has a different payout rate: some machines are ‘luckier’ than others. Given a finite number of choices, the goal in this setting is to maximize your expected payout.

While abstract, multi-armed bandits are a useful analogy to a very large number of real-world scenarios. For example, medical doctors might have a choice of \(n\) different treatments available for a particular disease, but the effectiveness of each treatment varies and is not entirely known. Do you select a treatment that you are confident works moderately well, or do you try a different treatment that you don’t know as much about, but has the potential to be far more effective?

In machine learning, this tradeoff is known as the ‘exploration-exploitation’ dilemma. You need to explore new (and potentially suboptimal) options in order to learn about them, but you also need to exploit what you already know in order to maximize reward. You also navigate this tradeoff constantly in your daily life without realizing it. For example, do you go out to your favorite restaurant, or do you risk trying a new place that just opened up? Do you stay at your current job, where you might be unhappy but stable, or do you risk the unknown for the possibility of higher pay or more job satisfaction?

For more information about multi-armed bandits, see: https://en.wikipedia.org/wiki/Multi-armed_bandit

Multi-armed bandit setup

The code below provides a skeleton for a reinforcement learning experiment with a multi-armed bandit task. In this experiment, the learning agent faces a choice between 10 bandits on each trial. Each bandit provides a binary reward (either 0 or 1), but the probability of reward differs between the bandits. The goal for the learning agent is to maximize the total reward received.

On each trial, the agent selects one of the alternatives, and receives a randomly generated reward, with probability determined by the particular bandit they selected. Mathematically, let \(k \in \lbrace 1 \ldots 10\rbrace\) indicate the choice made on a given trial. \(r\) indicates the reward received on that trial, where

\[ r \sim \mathrm{Bernoulli}(\theta_k) \] and \(\theta\) is a vector of length 10 that defines the reward probability for each bandit. The following code provides a basic implementation of a 10-armed bandit task.

Listing 1: A basic 10-armed bandit task

# Include this just once at the top of your code
set.seed(42)

# New estimate = old estimate + step size * (target - old estimate)

simulate_baseline_agent <- function(n_arms = 10,
                                    n_trials = 1000,
                                    alpha = 0.05) {
  # n_arms = Number of bandits to choose from on each trial
  # n_trials = Number of trials to simulate
  # alpha = Step size (learning rate)

  # Index of each bandit (returned)
  bandit <- seq_len(n_arms)

  # Generate the true reward probability for each arm (returned)
  theta_true <- runif(n_arms)

  # Initial estimate of each arm's reward probability (returned)
  theta_est <- rep(0.5, n_arms)

  for (i in seq_len(n_trials)) {
    # Choose a bandit uniformly at random
    k <- sample(1:n_arms, 1)

    # Generate a binary reward (0 or 1) according to the choice
    r <- as.numeric(runif(1) < theta_true[k])

    # Move the estimate for the chosen arm toward the observed reward
    theta_est[k] <- theta_est[k] + alpha * (r - theta_est[k])
  }

  data.frame(bandit, theta_true, theta_est)
}

table <- simulate_baseline_agent()
print(table)
##    bandit theta_true theta_est
## 1       1  0.9148060 0.8893462
## 2       2  0.9370754 0.8982559
## 3       3  0.2861395 0.3518885
## 4       4  0.8304476 0.7751360
## 5       5  0.6417455 0.5059686
## 6       6  0.5190959 0.4935130
## 7       7  0.7365883 0.6031983
## 8       8  0.1346666 0.1344954
## 9       9  0.6569923 0.5841384
## 10     10  0.7050648 0.6392469

Note that in the code above, there are two obvious limitations as a theory of human or animal learning:

  • The agent chooses actions completely at random between the 10 alternatives.
  • The agent doesn’t actually learn anything from the feedback that it receives.

As part of this exam, you will address each of these limitations.
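
As a side illustration of the reward step used in the listing, the sketch below draws Bernoulli rewards in two equivalent ways: by comparing a uniform random number to the reward probability, as the listing does, and with base R's `rbinom()`. The value `theta = 0.7` and the number of draws are arbitrary choices made for this example and are not taken from the exam. No seed is set here, so the exact means will vary from run to run.

```r
# Illustrative values only (assumptions, not part of the exam)
theta <- 0.7      # hypothetical reward probability for a single bandit
n_draws <- 10000  # number of simulated pulls

# Approach used in the listing: reward is 1 when a uniform draw falls below theta
rewards_runif <- as.numeric(runif(n_draws) < theta)

# Equivalent approach: a binomial draw with a single trial is a Bernoulli draw
rewards_rbinom <- rbinom(n_draws, size = 1, prob = theta)

# Both sample means should be close to theta
mean(rewards_runif)
mean(rewards_rbinom)
```

With many draws, both means settle near `theta`, which is why the average reward from a bandit is a reasonable estimate of its reward probability.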


Problem 1)

In this problem you will modify the code so that the agent learns from its feedback. In particular, we will implement a classic learning model called temporal difference learning or TD-learning.

Suppose the agent has an estimate for the value of each of the 10 bandits. Let’s call this estimate \(\hat{\theta}\). Note that this is actually a vector, so that \(\hat{\theta}_k\) represents the estimated value for the \(k\)-th alternative. On a particular trial, the agent selects alternative \(k\), and receives a reward \(r\) that is either 1 or 0. How should the agent update its beliefs about the value of alternative \(k\)?

According to the TD-learning rule, learning is driven by the difference between what the agent predicted, and what it observed. Mathematically, we have:

\[ \hat{\theta}_k \leftarrow \hat{\theta}_k + \alpha \left(r - \hat{\theta}_k\right) \]

In plain English, this says that the new estimate for the value is equal to the old estimate, plus a term proportional to the difference between what was observed and what was predicted, (\(r - \hat{\theta}_k\)). The parameter \(\alpha\) is called the learning rate. When \(\alpha = 0\), the second term vanishes and no learning occurs. When \(\alpha = 1\), the updated value is exactly equal to the most recent reward signal \(r\).
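
To make the rule concrete, here is a minimal sketch of a single update. The helper name `td_update()` and the starting estimate of 0.5 are assumptions made for this illustration; they are not part of the exam template.

```r
# Hypothetical helper (illustration only): apply one TD-learning update
td_update <- function(theta_hat, r, alpha) {
  theta_hat + alpha * (r - theta_hat)
}

# Reward observed, small learning rate: the estimate moves slightly toward 1
td_update(0.5, r = 1, alpha = 0.05)  # 0.525

# alpha = 0: the estimate does not change, so no learning occurs
td_update(0.5, r = 1, alpha = 0)     # 0.5

# alpha = 1: the estimate is replaced by the most recent reward
td_update(0.5, r = 0, alpha = 1)     # 0
```

Intermediate values of `alpha` trade off how quickly the estimate responds to new rewards against how much it fluctuates from trial to trial.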

Using the code in Listing 1 as a starting point, create a new function called simulate_td_random(). The agent should update its beliefs about the value of each bandit using the TD-learning rule.

Your function should take an additional argument, alpha, which determines the learning rate for the agent. The default value for this argument should be alpha = 0.05.

Your function should return a data frame that contains three columns:

Note: Your agent will still select actions randomly, but will learn on the basis of the reward signal.

Some specific requirements:

Solution:

# Your solution here

# Include this just once at the top of your code
set.seed(42)

# New estimate = old estimate + step size * (target - old estimate)

simulate_td_random <- function(n_arms = 10,
                               n_trials = 1000,
                               alpha = 0.05) {
  # n_arms = Number of bandits to choose from on each trial
  # n_trials = Number of trials to simulate
  # alpha = Step size (learning rate)

  # Index of each bandit (returned)
  bandit <- seq_len(n_arms)

  # Generate the true reward probability for each arm (returned)
  theta_true <- runif(n_arms)

  # Initial estimate of each arm's reward probability (returned)
  theta_est <- rep(0.5, n_arms)

  for (i in seq_len(n_trials)) {
    # Choose a bandit uniformly at random
    k <- sample(1:n_arms, 1)

    # Generate a binary reward (0 or 1) according to the choice
    r <- as.numeric(runif(1) < theta_true[k])

    # TD-learning rule: move the chosen arm's estimate toward the reward
    theta_est[k] <- theta_est[k] + alpha * (r - theta_est[k])
  }

  data.frame(bandit, theta_true, theta_est)
}

table <- simulate_td_random()
print(table)
##    bandit theta_true theta_est
## 1       1  0.9148060 0.8893462
## 2       2  0.9370754 0.8982559
## 3       3  0.2861395 0.3518885
## 4       4  0.8304476 0.7751360
## 5       5  0.6417455 0.5059686
## 6       6  0.5190959 0.4935130
## 7       7  0.7365883 0.6031983
## 8       8  0.1346666 0.1344954
## 9       9  0.6569923 0.5841384
## 10     10  0.7050648 0.6392469

Problem 2)

Run your function from problem 1.

Generate a bar graph that shows the estimated value (estimated reward probability) for each bandit at the end of learning. Overlaid on the bars, also show plot markers that indicate the true reward probability for each bandit.

Specific requirements:

Solution:

library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# Bars show the estimated reward probability for each bandit;
# points overlaid on the bars mark the true reward probability
p <- ggplot(table, aes(x = factor(bandit))) +
  geom_col(aes(y = theta_est), fill = "#BB2649") +
  geom_point(aes(y = theta_true), size = 3) +
  labs(x = "Bandit", y = "Estimated Reward Probability")

print(p)


Problem 3)

Modify your function simulate_td_random() so that it keeps track of the total accumulated reward received by the agent at each trial. For example, if the agent receives a reward on trials 1, 3, and 5, then its total accumulated reward over the first five trials should be 1, 1, 2, 2, 3.
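
The running total described above can be checked with a short sketch. The vector `reward` below simply encodes the example from the text (rewards on trials 1, 3, and 5); using base R's `cumsum()` is one possible way to compute the total, not a required approach.

```r
# Rewards on trials 1, 3, and 5 out of the first five trials
reward <- c(1, 0, 1, 0, 1)

# Running total of reward up to and including each trial
accumulated_reward <- cumsum(reward)
accumulated_reward
# 1 1 2 2 3
```

The same result can be built up inside a loop by adding each trial's reward to the previous trial's total.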

The updated function should return a data frame with three columns:

Solution:

# New estimate = old estimate + step size * (target - old estimate)
set.seed(42)

simulate_td_random <- function(n_arms = 10,
                               n_trials = 1000,
                               alpha = 0.05) {
  trial <- seq_len(n_trials)  # Trial index (returned)
  reward <- rep(0, n_trials)  # Reward received on each trial (returned)

  # Generate the true reward probability for each arm
  theta_true <- runif(n_arms)

  # Initial estimate of each arm's reward probability
  theta_est <- rep(0.5, n_arms)

  for (i in seq_len(n_trials)) {
    # Choose a bandit uniformly at random
    k <- sample(1:n_arms, 1)

    # Generate a binary reward (0 or 1) according to the choice
    r <- as.numeric(runif(1) < theta_true[k])

    # TD-learning rule: move the chosen arm's estimate toward the reward
    theta_est[k] <- theta_est[k] + alpha * (r - theta_est[k])

    # Record the reward received on this trial
    reward[i] <- r
  }

  # Running total of reward up to and including each trial (returned)
  accumulated_reward <- cumsum(reward)

  data.frame(trial, reward, accumulated_reward)
}

reward_table <- simulate_td_random()
print(reward_table)
##      trial reward accumulated_reward
## 1        1      0                  0
## 2        2      1                  1
## 3        3      0                  1
## 4        4      1                  2
## 5        5      1                  3
## 6        6      0                  3
## 7        7      1                  4
## 8        8      1                  5
## 9        9      0                  5
## 10      10      1                  6
## 11      11      1                  7
## 12      12      1                  8
## 13      13      0                  8
## 14      14      0                  8
## 15      15      0                  8
## 16      16      1                  9
## 17      17      1                 10
## 18      18      1                 11
## 19      19      1                 12
## 20      20      1                 13
## 21      21      0                 13
## 22      22      1                 14
## 23      23      0                 14
## 24      24      0                 14
## 25      25      1                 15
## 26      26      0                 15
## 27      27      1                 16
## 28      28      1                 17
## 29      29      1                 18
## 30      30      1                 19
## 31      31      1                 20
## 32      32      1                 21
## 33      33      1                 22
## 34      34      1                 23
## 35      35      0                 23
## 36      36      0                 23
## 37      37      0                 23
## 38      38      1                 24
## 39      39      0                 24
## 40      40      1                 25
## 41      41      1                 26
## 42      42      0                 26
## 43      43      0                 26
## 44      44      1                 27
## 45      45      0                 27
## 46      46      1                 28
## 47      47      1                 29
## 48      48      0                 29
## 49      49      0                 29
## 50      50      0                 29
## 51      51      0                 29
## 52      52      0                 29
## 53      53      1                 30
## 54      54      0                 30
## 55      55      1                 31
## 56      56      1                 32
## 57      57      1                 33
## 58      58      0                 33
## 59      59      1                 34
## 60      60      1                 35
## 61      61      0                 35
## 62      62      0                 35
## 63      63      0                 35
## 64      64      1                 36
## 65      65      1                 37
## 66      66      1                 38
## 67      67      0                 38
## 68      68      1                 39
## 69      69      1                 40
## 70      70      1                 41
## 71      71      1                 42
## 72      72      0                 42
## 73      73      1                 43
## 74      74      0                 43
## 75      75      1                 44
## 76      76      0                 44
## 77      77      1                 45
## 78      78      1                 46
## 79      79      0                 46
## 80      80      1                 47
## 81      81      1                 48
## 82      82      1                 49
## 83      83      0                 49
## 84      84      0                 49
## 85      85      1                 50
## 86      86      1                 51
## 87      87      0                 51
## 88      88      1                 52
## 89      89      1                 53
## 90      90      1                 54
## 91      91      1                 55
## 92      92      1                 56
## 93      93      1                 57
## 94      94      1                 58
## 95      95      0                 58
## 96      96      1                 59
## 97      97      1                 60
## 98      98      1                 61
## 99      99      1                 62
## 100    100      1                 63
## 101    101      0                 63
## 102    102      1                 64
## 103    103      1                 65
## 104    104      0                 65
## 105    105      1                 66
## 106    106      1                 67
## 107    107      0                 67
## 108    108      1                 68
## 109    109      1                 69
## 110    110      0                 69
## 111    111      1                 70
## 112    112      1                 71
## 113    113      1                 72
## 114    114      1                 73
## 115    115      0                 73
## 116    116      1                 74
## 117    117      1                 75
## 118    118      0                 75
## 119    119      1                 76
## 120    120      0                 76
## 121    121      0                 76
## 122    122      1                 77
## 123    123      1                 78
## 124    124      0                 78
## 125    125      1                 79
## 126    126      0                 79
## 127    127      1                 80
## 128    128      1                 81
## 129    129      1                 82
## 130    130      1                 83
## 131    131      0                 83
## 132    132      0                 83
## 133    133      1                 84
## 134    134      1                 85
## 135    135      0                 85
## 136    136      0                 85
## 137    137      1                 86
## 138    138      0                 86
## 139    139      0                 86
## 140    140      1                 87
## 141    141      0                 87
## 142    142      1                 88
## 143    143      1                 89
## 144    144      0                 89
## 145    145      1                 90
## 146    146      1                 91
## 147    147      0                 91
## 148    148      1                 92
## 149    149      1                 93
## 150    150      0                 93
## 151    151      1                 94
## 152    152      1                 95
## 153    153      0                 95
## 154    154      1                 96
## 155    155      0                 96
## 156    156      0                 96
## 157    157      1                 97
## 158    158      1                 98
## 159    159      1                 99
## 160    160      1                100
## 161    161      1                101
## 162    162      1                102
## 163    163      0                102
## 164    164      1                103
## 165    165      0                103
## 166    166      1                104
## 167    167      0                104
## 168    168      0                104
## 169    169      1                105
## 170    170      0                105
## 171    171      1                106
## 172    172      1                107
## 173    173      1                108
## 174    174      1                109
## 175    175      0                109
## 176    176      1                110
## 177    177      1                111
## 178    178      1                112
## 179    179      1                113
## 180    180      0                113
## 181    181      1                114
## 182    182      0                114
## 183    183      1                115
## 184    184      1                116
## 185    185      1                117
## 186    186      1                118
## 187    187      1                119
## 188    188      1                120
## 189    189      0                120
## 190    190      0                120
## 191    191      0                120
## 192    192      1                121
## 193    193      1                122
## 194    194      1                123
## 195    195      1                124
## 196    196      0                124
## 197    197      0                124
## 198    198      1                125
## 199    199      1                126
## 200    200      1                127
## 201    201      1                128
## 202    202      0                128
## 203    203      1                129
## 204    204      0                129
## 205    205      0                129
## 206    206      1                130
## 207    207      1                131
## 208    208      1                132
## 209    209      1                133
## 210    210      1                134
## 211    211      0                134
## 212    212      1                135
## 213    213      1                136
## 214    214      1                137
## 215    215      1                138
## 216    216      1                139
## 217    217      1                140
## 218    218      1                141
## 219    219      1                142
## 220    220      1                143
## 221    221      0                143
## 222    222      1                144
## 223    223      1                145
## 224    224      1                146
## 225    225      1                147
## 226    226      0                147
## 227    227      1                148
## 228    228      0                148
## 229    229      0                148
## 230    230      0                148
## 231    231      1                149
## 232    232      1                150
## 233    233      0                150
## 234    234      1                151
## 235    235      1                152
## 236    236      1                153
## 237    237      0                153
## 238    238      0                153
## 239    239      0                153
## 240    240      1                154
## 241    241      0                154
## 242    242      0                154
## 243    243      1                155
## 244    244      1                156
## 245    245      0                156
## 246    246      0                156
## 247    247      0                156
## 248    248      0                156
## 249    249      1                157
## 250    250      1                158
## 251    251      1                159
## 252    252      1                160
## 253    253      0                160
## 254    254      1                161
## 255    255      1                162
## 256    256      1                163
## 257    257      1                164
## 258    258      0                164
## 259    259      0                164
## 260    260      1                165
## 261    261      1                166
## 262    262      0                166
## 263    263      0                166
## 264    264      1                167
## 265    265      0                167
## 266    266      1                168
## 267    267      1                169
## 268    268      0                169
## 269    269      1                170
## 270    270      0                170
## 271    271      1                171
## 272    272      1                172
## 273    273      0                172
## 274    274      1                173
## 275    275      0                173
## 276    276      1                174
## 277    277      0                174
## 278    278      0                174
## 279    279      0                174
## 280    280      1                175
## 281    281      0                175
## 282    282      0                175
## 283    283      0                175
## 284    284      1                176
## 285    285      1                177
## 286    286      0                177
## 287    287      0                177
## 288    288      1                178
## 289    289      1                179
## 290    290      1                180
## 291    291      1                181
## 292    292      0                181
## 293    293      1                182
## 294    294      0                182
## 295    295      1                183
## 296    296      1                184
## 297    297      1                185
## 298    298      1                186
## 299    299      0                186
## 300    300      1                187
## 301    301      1                188
## 302    302      0                188
## 303    303      1                189
## 304    304      0                189
## 305    305      1                190
## 306    306      1                191
## 307    307      0                191
## 308    308      1                192
## 309    309      1                193
## 310    310      1                194
## 311    311      1                195
## 312    312      1                196
## 313    313      1                197
## 314    314      1                198
## 315    315      0                198
## 316    316      1                199
## 317    317      1                200
## 318    318      1                201
## 319    319      0                201
## 320    320      1                202
## 321    321      1                203
## 322    322      1                204
## 323    323      1                205
## 324    324      1                206
## 325    325      1                207
## 326    326      0                207
## 327    327      0                207
## 328    328      1                208
## 329    329      0                208
## 330    330      1                209
## 331    331      1                210
## 332    332      1                211
## 333    333      0                211
## 334    334      1                212
## 335    335      1                213
## 336    336      0                213
## 337    337      1                214
## 338    338      0                214
## 339    339      1                215
## 340    340      1                216
## 341    341      0                216
## 342    342      1                217
## 343    343      1                218
## 344    344      1                219
## 345    345      1                220
## 346    346      1                221
## 347    347      0                221
## 348    348      1                222
## 349    349      1                223
## 350    350      0                223
## 351    351      1                224
## 352    352      0                224
## 353    353      0                224
## 354    354      1                225
## 355    355      1                226
## 356    356      1                227
## 357    357      0                227
## 358    358      1                228
## 359    359      1                229
## 360    360      1                230
## 361    361      1                231
## 362    362      1                232
## 363    363      1                233
## 364    364      1                234
## 365    365      1                235
## 366    366      1                236
## 367    367      0                236
## 368    368      0                236
## 369    369      1                237
## 370    370      0                237
## 371    371      1                238
## 372    372      0                238
## 373    373      1                239
## 374    374      1                240
## 375    375      1                241
## 376    376      1                242
## 377    377      1                243
## 378    378      1                244
## 379    379      1                245
## 380    380      1                246
## 381    381      0                246
## 382    382      0                246
## 383    383      1                247
## 384    384      1                248
## 385    385      0                248
## 386    386      0                248
## 387    387      1                249
## 388    388      1                250
## 389    389      1                251
## 390    390      0                251
## 391    391      0                251
## 392    392      1                252
## 393    393      0                252
## 394    394      1                253
## 395    395      0                253
## 396    396      0                253
## 397    397      1                254
## 398    398      1                255
## 399    399      0                255
## 400    400      1                256
## 401    401      1                257
## 402    402      1                258
## 403    403      1                259
## 404    404      1                260
## 405    405      0                260
## 406    406      0                260
## 407    407      1                261
## 408    408      1                262
## 409    409      0                262
## 410    410      1                263
## 411    411      0                263
## 412    412      1                264
## 413    413      1                265
## 414    414      1                266
## 415    415      0                266
## 416    416      1                267
## 417    417      1                268
## 418    418      1                269
## 419    419      1                270
## 420    420      1                271
## 421    421      0                271
## 422    422      0                271
## 423    423      0                271
## 424    424      1                272
## 425    425      1                273
## 426    426      1                274
## 427    427      1                275
## 428    428      1                276
## 429    429      0                276
## 430    430      1                277
## 431    431      0                277
## 432    432      1                278
## 433    433      1                279
## 434    434      1                280
## 435    435      0                280
## 436    436      0                280
## 437    437      0                280
## 438    438      1                281
## 439    439      1                282
## 440    440      0                282
## 441    441      1                283
## 442    442      1                284
## 443    443      0                284
## 444    444      1                285
## 445    445      1                286
## 446    446      1                287
## 447    447      1                288
## 448    448      0                288
## 449    449      0                288
## 450    450      1                289
## 451    451      1                290
## 452    452      0                290
## 453    453      1                291
## 454    454      1                292
## 455    455      1                293
## 456    456      0                293
## 457    457      1                294
## 458    458      1                295
## 459    459      0                295
## 460    460      1                296
## 461    461      1                297
## 462    462      1                298
## 463    463      1                299
## 464    464      0                299
## 465    465      1                300
## 466    466      0                300
## 467    467      1                301
## 468    468      0                301
## 469    469      0                301
## 470    470      0                301
## 471    471      0                301
## 472    472      1                302
## 473    473      0                302
## 474    474      1                303
## 475    475      0                303
## 476    476      0                303
## 477    477      1                304
## 478    478      1                305
## 479    479      0                305
## 480    480      1                306
## 481    481      0                306
## 482    482      1                307
## 483    483      1                308
## 484    484      0                308
## 485    485      0                308
## 486    486      1                309
## 487    487      0                309
## 488    488      1                310
## 489    489      0                310
## 490    490      0                310
## 491    491      0                310
## 492    492      1                311
## 493    493      1                312
## 494    494      1                313
## 495    495      0                313
## 496    496      1                314
## 497    497      1                315
## 498    498      0                315
## 499    499      1                316
## 500    500      1                317
## 501    501      1                318
## 502    502      1                319
## 503    503      0                319
## 504    504      1                320
## 505    505      1                321
## 506    506      1                322
## 507    507      0                322
## 508    508      1                323
## 509    509      1                324
## 510    510      1                325
## 511    511      1                326
## 512    512      0                326
## 513    513      0                326
## 514    514      1                327
## 515    515      1                328
## 516    516      0                328
## 517    517      1                329
## 518    518      1                330
## 519    519      0                330
## 520    520      0                330
## 521    521      0                330
## 522    522      0                330
## 523    523      0                330
## 524    524      0                330
## 525    525      1                331
## 526    526      0                331
## 527    527      1                332
## 528    528      1                333
## 529    529      1                334
## 530    530      0                334
## 531    531      1                335
## 532    532      0                335
## 533    533      1                336
## 534    534      1                337
## 535    535      1                338
## 536    536      0                338
## 537    537      1                339
## 538    538      1                340
## 539    539      1                341
## 540    540      1                342
## 541    541      0                342
## 542    542      0                342
## 543    543      1                343
## 544    544      1                344
## 545    545      0                344
## 546    546      0                344
## 547    547      1                345
## 548    548      1                346
## 549    549      1                347
## 550    550      1                348
## 551    551      1                349
## 552    552      1                350
## 553    553      1                351
## 554    554      1                352
## 555    555      0                352
## 556    556      1                353
## 557    557      0                353
## 558    558      0                353
## 559    559      1                354
## 560    560      0                354
## 561    561      0                354
## 562    562      0                354
## 563    563      1                355
## 564    564      0                355
## 565    565      0                355
## 566    566      1                356
## 567    567      0                356
## 568    568      0                356
## 569    569      0                356
## 570    570      0                356
## 571    571      0                356
## 572    572      0                356
## 573    573      1                357
## 574    574      1                358
## 575    575      1                359
## 576    576      1                360
## 577    577      0                360
## 578    578      1                361
## 579    579      1                362
## 580    580      0                362
## 581    581      0                362
## 582    582      0                362
## 583    583      1                363
## 584    584      0                363
## 585    585      0                363
## 586    586      0                363
## 587    587      0                363
## 588    588      1                364
## 589    589      0                364
## 590    590      1                365
## 591    591      1                366
## 592    592      0                366
## 593    593      1                367
## 594    594      1                368
## 595    595      1                369
## 596    596      1                370
## 597    597      0                370
## 598    598      0                370
## 599    599      1                371
## 600    600      0                371
## 601    601      0                371
## 602    602      0                371
## 603    603      0                371
## 604    604      0                371
## 605    605      1                372
## 606    606      0                372
## 607    607      1                373
## 608    608      1                374
## 609    609      0                374
## 610    610      0                374
## 611    611      1                375
## 612    612      0                375
## 613    613      1                376
## 614    614      0                376
## 615    615      0                376
## 616    616      1                377
## 617    617      0                377
## 618    618      0                377
## 619    619      1                378
## 620    620      1                379
## 621    621      0                379
## 622    622      1                380
## 623    623      1                381
## 624    624      1                382
## 625    625      0                382
## 626    626      1                383
## 627    627      1                384
## 628    628      0                384
## 629    629      1                385
## 630    630      1                386
## 631    631      1                387
## 632    632      0                387
## 633    633      1                388
## 634    634      0                388
## 635    635      1                389
## 636    636      0                389
## 637    637      0                389
## 638    638      0                389
## 639    639      1                390
## 640    640      0                390
## 641    641      0                390
## 642    642      1                391
## 643    643      1                392
## 644    644      1                393
## 645    645      0                393
## 646    646      1                394
## 647    647      0                394
## 648    648      1                395
## 649    649      1                396
## 650    650      0                396
## 651    651      0                396
## 652    652      1                397
## 653    653      0                397
## 654    654      0                397
## 655    655      0                397
## 656    656      0                397
## 657    657      0                397
## 658    658      1                398
## 659    659      1                399
## 660    660      1                400
## 661    661      1                401
## 662    662      0                401
## 663    663      1                402
## 664    664      1                403
## 665    665      0                403
## 666    666      1                404
## 667    667      0                404
## 668    668      1                405
## 669    669      0                405
## 670    670      0                405
## 671    671      1                406
## 672    672      0                406
## 673    673      1                407
## 674    674      1                408
## 675    675      0                408
## 676    676      1                409
## 677    677      1                410
## 678    678      1                411
## 679    679      1                412
## 680    680      1                413
## 681    681      0                413
## 682    682      1                414
## 683    683      1                415
## 684    684      1                416
## 685    685      1                417
## 686    686      1                418
## 687    687      1                419
## 688    688      1                420
## 689    689      1                421
## 690    690      1                422
## 691    691      1                423
## 692    692      1                424
## 693    693      0                424
## 694    694      1                425
## 695    695      1                426
## 696    696      1                427
## 697    697      1                428
## 698    698      1                429
## 699    699      0                429
## 700    700      1                430
## 701    701      1                431
## 702    702      1                432
## 703    703      1                433
## 704    704      1                434
## 705    705      1                435
## 706    706      0                435
## 707    707      1                436
## 708    708      1                437
## 709    709      0                437
## 710    710      1                438
## 711    711      1                439
## 712    712      1                440
## 713    713      1                441
## 714    714      0                441
## 715    715      0                441
## 716    716      1                442
## 717    717      1                443
## 718    718      1                444
## 719    719      0                444
## 720    720      0                444
## 721    721      0                444
## 722    722      0                444
## 723    723      0                444
## 724    724      1                445
## 725    725      0                445
## 726    726      1                446
## 727    727      1                447
## 728    728      0                447
## 729    729      0                447
## 730    730      0                447
## 731    731      1                448
## 732    732      1                449
## 733    733      1                450
## 734    734      0                450
## 735    735      1                451
## 736    736      1                452
## 737    737      1                453
## 738    738      0                453
## 739    739      1                454
## 740    740      1                455
## 741    741      1                456
## 742    742      1                457
## 743    743      0                457
## 744    744      1                458
## 745    745      0                458
## 746    746      0                458
## 747    747      0                458
## 748    748      1                459
## 749    749      1                460
## 750    750      0                460
## 751    751      1                461
## 752    752      0                461
## 753    753      1                462
## 754    754      1                463
## 755    755      1                464
## 756    756      1                465
## 757    757      1                466
## 758    758      1                467
## 759    759      1                468
## 760    760      1                469
## 761    761      0                469
## 762    762      0                469
## 763    763      1                470
## 764    764      1                471
## 765    765      0                471
## 766    766      1                472
## 767    767      1                473
## 768    768      0                473
## 769    769      1                474
## 770    770      0                474
## 771    771      0                474
## 772    772      0                474
## 773    773      0                474
## 774    774      1                475
## 775    775      0                475
## 776    776      0                475
## 777    777      0                475
## 778    778      0                475
## 779    779      1                476
## 780    780      0                476
## 781    781      1                477
## 782    782      1                478
## 783    783      1                479
## 784    784      0                479
## 785    785      1                480
## 786    786      1                481
## 787    787      0                481
## 788    788      0                481
## 789    789      0                481
## 790    790      1                482
## 791    791      1                483
## 792    792      1                484
## 793    793      0                484
## 794    794      1                485
## 795    795      1                486
## 796    796      0                486
## 797    797      0                486
## 798    798      0                486
## 799    799      0                486
## 800    800      1                487
## 801    801      1                488
## 802    802      1                489
## 803    803      0                489
## 804    804      1                490
## 805    805      0                490
## 806    806      1                491
## 807    807      1                492
## 808    808      1                493
## 809    809      0                493
## 810    810      1                494
## 811    811      1                495
## 812    812      1                496
## 813    813      0                496
## 814    814      1                497
## 815    815      0                497
## 816    816      1                498
## 817    817      0                498
## 818    818      0                498
## 819    819      0                498
## 820    820      1                499
## 821    821      1                500
## 822    822      1                501
## 823    823      1                502
## 824    824      0                502
## 825    825      0                502
## 826    826      0                502
## 827    827      1                503
## 828    828      1                504
## 829    829      0                504
## 830    830      1                505
## 831    831      0                505
## 832    832      1                506
## 833    833      1                507
## 834    834      0                507
## 835    835      0                507
## 836    836      0                507
## 837    837      1                508
## 838    838      1                509
## 839    839      1                510
## 840    840      1                511
## 841    841      1                512
## 842    842      1                513
## 843    843      0                513
## 844    844      1                514
## 845    845      1                515
## 846    846      0                515
## 847    847      1                516
## 848    848      1                517
## 849    849      0                517
## 850    850      1                518
## 851    851      1                519
## 852    852      1                520
## 853    853      1                521
## 854    854      1                522
## 855    855      1                523
## 856    856      1                524
## 857    857      1                525
## 858    858      1                526
## 859    859      0                526
## 860    860      1                527
## 861    861      0                527
## 862    862      1                528
## 863    863      1                529
## 864    864      1                530
## 865    865      1                531
## 866    866      0                531
## 867    867      1                532
## 868    868      0                532
## 869    869      1                533
## 870    870      1                534
## 871    871      1                535
## 872    872      0                535
## 873    873      1                536
## 874    874      1                537
## 875    875      0                537
## 876    876      0                537
## 877    877      1                538
## 878    878      0                538
## 879    879      1                539
## 880    880      0                539
## 881    881      1                540
## 882    882      0                540
## 883    883      1                541
## 884    884      1                542
## 885    885      1                543
## 886    886      1                544
## 887    887      0                544
## 888    888      1                545
## 889    889      0                545
## 890    890      0                545
## 891    891      1                546
## 892    892      0                546
## 893    893      0                546
## 894    894      1                547
## 895    895      1                548
## 896    896      1                549
## 897    897      0                549
## 898    898      0                549
## 899    899      0                549
## 900    900      0                549
## 901    901      0                549
## 902    902      0                549
## 903    903      1                550
## 904    904      0                550
## 905    905      1                551
## 906    906      0                551
## 907    907      1                552
## 908    908      1                553
## 909    909      1                554
## 910    910      0                554
## 911    911      1                555
## 912    912      0                555
## 913    913      1                556
## 914    914      0                556
## 915    915      1                557
## 916    916      1                558
## 917    917      0                558
## 918    918      1                559
## 919    919      1                560
## 920    920      0                560
## 921    921      1                561
## 922    922      1                562
## 923    923      0                562
## 924    924      0                562
## 925    925      0                562
## 926    926      1                563
## 927    927      1                564
## 928    928      1                565
## 929    929      0                565
## 930    930      0                565
## 931    931      1                566
## 932    932      0                566
## 933    933      0                566
## 934    934      1                567
## 935    935      0                567
## 936    936      0                567
## 937    937      0                567
## 938    938      1                568
## 939    939      0                568
## 940    940      1                569
## 941    941      1                570
## 942    942      0                570
## 943    943      1                571
## 944    944      1                572
## 945    945      1                573
## 946    946      1                574
## 947    947      0                574
## 948    948      0                574
## 949    949      0                574
## 950    950      1                575
## 951    951      1                576
## 952    952      0                576
## 953    953      1                577
## 954    954      1                578
## 955    955      0                578
## 956    956      0                578
## 957    957      1                579
## 958    958      1                580
## 959    959      0                580
## 960    960      0                580
## 961    961      1                581
## 962    962      0                581
## 963    963      1                582
## 964    964      1                583
## 965    965      1                584
## 966    966      1                585
## 967    967      0                585
## 968    968      1                586
## 969    969      0                586
## 970    970      0                586
## 971    971      1                587
## 972    972      1                588
## 973    973      1                589
## 974    974      1                590
## 975    975      1                591
## 976    976      0                591
## 977    977      1                592
## 978    978      1                593
## 979    979      0                593
## 980    980      0                593
## 981    981      1                594
## 982    982      1                595
## 983    983      1                596
## 984    984      1                597
## 985    985      1                598
## 986    986      1                599
## 987    987      1                600
## 988    988      1                601
## 989    989      0                601
## 990    990      1                602
## 991    991      1                603
## 992    992      0                603
## 993    993      0                603
## 994    994      0                603
## 995    995      1                604
## 996    996      1                605
## 997    997      0                605
## 998    998      1                606
## 999    999      0                606
## 1000  1000      1                607

Problem 4)

Open question: does this mean writing a loop that runs the function 100 times, or using 100 bandits instead of the default 10?

Run your function simulate_td_random() 100 times. Stack the results into one big data frame with three columns and 100,000 rows (1000 trials \(\times\) 100 simulations).
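One possible way to do the stacking is sketched below. Because simulate_td_random() is defined earlier in the exam and is not reproduced here, the sketch uses a hypothetical stand-in called simulate_stub() that only mimics the shape of the output (columns trial, reward, accumulated_reward); the reward probability of 0.5 is an arbitrary assumption, not part of the exam.

```r
# Hypothetical stand-in for simulate_td_random(): returns one simulation
# as a data frame with columns trial, reward, and accumulated_reward.
# The reward probability (0.5) is an arbitrary assumption.
simulate_stub <- function(n_trials = 1000) {
  reward <- rbinom(n_trials, size = 1, prob = 0.5)
  data.frame(
    trial = seq_len(n_trials),
    reward = reward,
    accumulated_reward = cumsum(reward)
  )
}

n_sims <- 100

# Run the simulation n_sims times, collecting each data frame in a list.
sims <- lapply(seq_len(n_sims), function(i) simulate_stub())

# Stack the list of data frames row-wise into one data frame.
results <- do.call(rbind, sims)

# 100 simulations of 1000 trials each: 100000 rows, 3 columns.
dim(results)
```

Collecting the runs in a list and binding once avoids growing a data frame inside a loop, although an explicit loop with rbind() gives the same result as long as the output of rbind() is assigned back.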

Once you’ve done that, assuming your results are stored in a variable called results, you can use the following tidyverse magic to get the average accumulated reward:

avg_results <- results %>% 
  group_by(trial) %>%
  summarise(mean_accumulated_reward = mean(accumulated_reward))

Generate a line graph that shows how average accumulated reward increases over time.
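A minimal sketch of such a line graph follows. It assumes the dplyr and ggplot2 packages are installed, and it builds a placeholder results data frame from random 0/1 rewards (with a helper column named simulation, which is an assumption for illustration) rather than calling simulate_td_random(), so the curve it draws is not the exam's expected answer.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
n_trials <- 1000
n_sims <- 100

# Placeholder data: random 0/1 rewards for each trial of each simulation.
# This stands in for the stacked output of simulate_td_random().
results <- data.frame(
  simulation = rep(seq_len(n_sims), each = n_trials),
  trial = rep(seq_len(n_trials), times = n_sims),
  reward = rbinom(n_trials * n_sims, size = 1, prob = 0.5)
) %>%
  group_by(simulation) %>%
  mutate(accumulated_reward = cumsum(reward)) %>%
  ungroup()

# Average the accumulated reward across simulations, per trial.
avg_results <- results %>%
  group_by(trial) %>%
  summarise(mean_accumulated_reward = mean(accumulated_reward))

# Line graph of average accumulated reward over trials.
p <- ggplot(avg_results, aes(x = trial, y = mean_accumulated_reward)) +
  geom_line() +
  labs(x = "Trial", y = "Mean accumulated reward")
print(p)
```

With real simulation output, only the construction of results changes; the summarise() and ggplot() steps stay the same.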

Specific requirements:

Solution:

library(ggplot2)
# set.seed(42)

# Start with one simulation, then append 99 more for 100 in total.
more_results <- simulate_td_random(n_arms = 10, n_trials = 1000, alpha = 0.05)

for (i in 1:99) {
  g <- simulate_td_random(n_arms = 10, n_trials = 1000, alpha = 0.05)
  # rbind() returns a new data frame, so the result must be assigned back.
  more_results <- rbind(more_results, g)
}

print(more_results)
##      trial reward accumulated_reward
## 1        1      1                  1
## 2        2      1                  2
## 3        3      0                  2
## 4        4      1                  3
## 5        5      0                  3
## 6        6      0                  3
## 7        7      1                  4
## 8        8      0                  4
## 9        9      0                  4
## 10      10      1                  5
## 11      11      0                  5
## 12      12      0                  5
## 13      13      1                  6
## 14      14      0                  6
## 15      15      1                  7
## 16      16      0                  7
## 17      17      0                  7
## 18      18      1                  8
## 19      19      1                  9
## 20      20      1                 10
## 21      21      1                 11
## 22      22      0                 11
## 23      23      1                 12
## 24      24      0                 12
## 25      25      0                 12
## 26      26      1                 13
## 27      27      1                 14
## 28      28      0                 14
## 29      29      0                 14
## 30      30      1                 15
## 31      31      1                 16
## 32      32      0                 16
## 33      33      1                 17
## 34      34      1                 18
## 35      35      1                 19
## 36      36      0                 19
## 37      37      1                 20
## 38      38      1                 21
## 39      39      1                 22
## 40      40      1                 23
## 41      41      0                 23
## 42      42      0                 23
## 43      43      0                 23
## 44      44      1                 24
## 45      45      0                 24
## 46      46      0                 24
## 47      47      0                 24
## 48      48      1                 25
## 49      49      0                 25
## 50      50      0                 25
## 51      51      0                 25
## 52      52      1                 26
## 53      53      1                 27
## 54      54      0                 27
## 55      55      0                 27
## 56      56      0                 27
## 57      57      1                 28
## 58      58      0                 28
## 59      59      1                 29
## 60      60      0                 29
## 61      61      1                 30
## 62      62      0                 30
## 63      63      1                 31
## 64      64      0                 31
## 65      65      0                 31
## 66      66      1                 32
## 67      67      0                 32
## 68      68      1                 33
## 69      69      1                 34
## 70      70      1                 35
## 71      71      1                 36
## 72      72      0                 36
## 73      73      0                 36
## 74      74      0                 36
## 75      75      0                 36
## 76      76      1                 37
## 77      77      0                 37
## 78      78      1                 38
## 79      79      1                 39
## 80      80      1                 40
## 81      81      1                 41
## 82      82      1                 42
## 83      83      0                 42
## 84      84      0                 42
## 85      85      1                 43
## 86      86      1                 44
## 87      87      1                 45
## 88      88      0                 45
## 89      89      1                 46
## 90      90      1                 47
## 91      91      1                 48
## 92      92      1                 49
## 93      93      0                 49
## 94      94      0                 49
## 95      95      0                 49
## 96      96      1                 50
## 97      97      1                 51
## 98      98      1                 52
## 99      99      1                 53
## 100    100      0                 53
## 101    101      1                 54
## 102    102      0                 54
## 103    103      1                 55
## 104    104      1                 56
## 105    105      0                 56
## 106    106      1                 57
## 107    107      1                 58
## 108    108      1                 59
## 109    109      0                 59
## 110    110      1                 60
## 111    111      1                 61
## 112    112      0                 61
## 113    113      0                 61
## 114    114      0                 61
## 115    115      1                 62
## 116    116      1                 63
## 117    117      1                 64
## 118    118      0                 64
## 119    119      0                 64
## 120    120      0                 64
## 121    121      0                 64
## 122    122      1                 65
## 123    123      0                 65
## 124    124      1                 66
## 125    125      0                 66
## 126    126      0                 66
## 127    127      1                 67
## 128    128      1                 68
## 129    129      1                 69
## 130    130      0                 69
## 131    131      0                 69
## 132    132      0                 69
## 133    133      1                 70
## 134    134      0                 70
## 135    135      0                 70
## 136    136      1                 71
## 137    137      0                 71
## 138    138      1                 72
## 139    139      1                 73
## 140    140      0                 73
## 141    141      0                 73
## 142    142      0                 73
## 143    143      1                 74
## 144    144      1                 75
## 145    145      1                 76
## 146    146      1                 77
## 147    147      0                 77
## 148    148      1                 78
## 149    149      1                 79
## 150    150      1                 80
## 151    151      1                 81
## 152    152      0                 81
## 153    153      1                 82
## 154    154      0                 82
## 155    155      1                 83
## 156    156      1                 84
## 157    157      0                 84
## 158    158      1                 85
## 159    159      1                 86
## 160    160      1                 87
## 161    161      1                 88
## 162    162      0                 88
## 163    163      0                 88
## 164    164      1                 89
## 165    165      1                 90
## 166    166      0                 90
## 167    167      0                 90
## 168    168      1                 91
## 169    169      1                 92
## 170    170      0                 92
## 171    171      1                 93
## 172    172      0                 93
## 173    173      0                 93
## 174    174      1                 94
## 175    175      1                 95
## 176    176      1                 96
## 177    177      1                 97
## 178    178      1                 98
## 179    179      1                 99
## 180    180      0                 99
## 181    181      0                 99
## 182    182      1                100
## 183    183      1                101
## 184    184      1                102
## 185    185      0                102
## 186    186      0                102
## 187    187      1                103
## 188    188      1                104
## 189    189      1                105
## 190    190      0                105
## 191    191      0                105
## 192    192      1                106
## 193    193      1                107
## 194    194      0                107
## 195    195      0                107
## 196    196      0                107
## 197    197      0                107
## 198    198      0                107
## 199    199      1                108
## 200    200      0                108
## 201    201      0                108
## 202    202      0                108
## 203    203      1                109
## 204    204      1                110
## 205    205      0                110
## 206    206      1                111
## 207    207      1                112
## 208    208      0                112
## 209    209      0                112
## 210    210      0                112
## 211    211      1                113
## 212    212      1                114
## 213    213      1                115
## 214    214      1                116
## 215    215      1                117
## 216    216      0                117
## 217    217      1                118
## 218    218      1                119
## 219    219      0                119
## 220    220      1                120
## 221    221      0                120
## 222    222      0                120
## 223    223      0                120
## 224    224      0                120
## 225    225      0                120
## 226    226      1                121
## 227    227      1                122
## 228    228      1                123
## 229    229      1                124
## 230    230      1                125
## 231    231      0                125
## 232    232      1                126
## 233    233      0                126
## 234    234      0                126
## 235    235      1                127
## 236    236      1                128
## 237    237      1                129
## 238    238      1                130
## 239    239      1                131
## 240    240      0                131
## 241    241      1                132
## 242    242      0                132
## 243    243      0                132
## 244    244      1                133
## 245    245      0                133
## 246    246      1                134
## 247    247      0                134
## 248    248      1                135
## 249    249      1                136
## 250    250      1                137
## 251    251      0                137
## 252    252      0                137
## 253    253      1                138
## 254    254      0                138
## 255    255      0                138
## 256    256      0                138
## 257    257      1                139
## 258    258      0                139
## 259    259      1                140
## 260    260      0                140
## 261    261      1                141
## 262    262      0                141
## 263    263      0                141
## 264    264      1                142
## 265    265      1                143
## 266    266      0                143
## 267    267      0                143
## 268    268      1                144
## 269    269      0                144
## 270    270      1                145
## 271    271      1                146
## 272    272      1                147
## 273    273      1                148
## 274    274      0                148
## 275    275      0                148
## 276    276      0                148
## 277    277      0                148
## 278    278      1                149
## 279    279      0                149
## 280    280      0                149
## 281    281      1                150
## 282    282      0                150
## 283    283      1                151
## 284    284      0                151
## 285    285      0                151
## 286    286      0                151
## 287    287      0                151
## 288    288      0                151
## 289    289      1                152
## 290    290      0                152
## 291    291      0                152
## 292    292      1                153
## 293    293      0                153
## 294    294      1                154
## 295    295      1                155
## 296    296      0                155
## 297    297      1                156
## 298    298      1                157
## 299    299      0                157
## 300    300      0                157
## 301    301      1                158
## 302    302      1                159
## 303    303      0                159
## 304    304      1                160
## 305    305      1                161
## 306    306      0                161
## 307    307      1                162
## 308    308      0                162
## 309    309      0                162
## 310    310      1                163
## 311    311      1                164
## 312    312      1                165
## 313    313      0                165
## 314    314      0                165
## 315    315      0                165
## 316    316      1                166
## 317    317      0                166
## 318    318      1                167
## 319    319      0                167
## 320    320      0                167
## 321    321      0                167
## 322    322      0                167
## 323    323      1                168
## 324    324      0                168
## 325    325      1                169
## 326    326      1                170
## 327    327      1                171
## 328    328      0                171
## 329    329      1                172
## 330    330      1                173
## 331    331      0                173
## 332    332      0                173
## 333    333      1                174
## 334    334      0                174
## 335    335      1                175
## 336    336      0                175
## 337    337      0                175
## 338    338      1                176
## 339    339      0                176
## 340    340      0                176
## 341    341      1                177
## 342    342      1                178
## 343    343      0                178
## 344    344      0                178
## 345    345      1                179
## 346    346      1                180
## 347    347      1                181
## 348    348      1                182
## 349    349      1                183
## 350    350      0                183
## 351    351      1                184
## 352    352      1                185
## 353    353      0                185
## 354    354      0                185
## 355    355      0                185
## 356    356      1                186
## 357    357      1                187
## 358    358      1                188
## 359    359      0                188
## 360    360      1                189
## 361    361      0                189
## 362    362      1                190
## 363    363      1                191
## 364    364      1                192
## 365    365      0                192
## 366    366      1                193
## 367    367      0                193
## 368    368      1                194
## 369    369      0                194
## 370    370      1                195
## 371    371      1                196
## 372    372      1                197
## 373    373      0                197
## 374    374      0                197
## 375    375      1                198
## 376    376      1                199
## 377    377      0                199
## 378    378      0                199
## 379    379      0                199
## 380    380      1                200
## 381    381      1                201
## 382    382      1                202
## 383    383      1                203
## 384    384      0                203
## 385    385      1                204
## 386    386      1                205
## 387    387      1                206
## 388    388      0                206
## 389    389      0                206
## 390    390      1                207
## 391    391      1                208
## 392    392      0                208
## 393    393      1                209
## 394    394      0                209
## 395    395      1                210
## 396    396      1                211
## 397    397      0                211
## 398    398      1                212
## 399    399      1                213
## 400    400      1                214
## 401    401      0                214
## 402    402      0                214
## 403    403      1                215
## 404    404      1                216
## 405    405      1                217
## 406    406      1                218
## 407    407      1                219
## 408    408      1                220
## 409    409      1                221
## 410    410      0                221
## 411    411      1                222
## 412    412      0                222
## 413    413      0                222
## 414    414      0                222
## 415    415      0                222
## 416    416      0                222
## 417    417      1                223
## 418    418      1                224
## 419    419      0                224
## 420    420      1                225
## 421    421      1                226
## 422    422      0                226
## 423    423      1                227
## 424    424      0                227
## 425    425      0                227
## 426    426      0                227
## 427    427      0                227
## 428    428      0                227
## 429    429      1                228
## 430    430      1                229
## 431    431      0                229
## 432    432      0                229
## 433    433      1                230
## 434    434      0                230
## 435    435      1                231
## 436    436      1                232
## 437    437      0                232
## 438    438      0                232
## 439    439      0                232
## 440    440      1                233
## 441    441      1                234
## 442    442      0                234
## 443    443      0                234
## 444    444      1                235
## 445    445      0                235
## 446    446      0                235
## 447    447      1                236
## 448    448      1                237
## 449    449      0                237
## 450    450      0                237
## 451    451      1                238
## 452    452      0                238
## 453    453      0                238
## 454    454      0                238
## 455    455      0                238
## 456    456      0                238
## 457    457      1                239
## 458    458      0                239
## 459    459      0                239
## 460    460      1                240
## 461    461      0                240
## 462    462      0                240
## 463    463      0                240
## 464    464      0                240
## 465    465      1                241
## 466    466      0                241
## 467    467      1                242
## 468    468      0                242
## 469    469      0                242
## 470    470      1                243
## 471    471      1                244
## 472    472      1                245
## 473    473      1                246
## 474    474      0                246
## 475    475      1                247
## 476    476      0                247
## 477    477      1                248
## 478    478      1                249
## 479    479      0                249
## 480    480      0                249
## 481    481      1                250
## 482    482      1                251
## 483    483      1                252
## 484    484      0                252
## 485    485      1                253
## 486    486      1                254
## 487    487      0                254
## 488    488      1                255
## 489    489      1                256
## 490    490      1                257
## 491    491      0                257
## 492    492      1                258
## 493    493      0                258
## 494    494      1                259
## 495    495      0                259
## 496    496      1                260
## 497    497      1                261
## 498    498      1                262
## 499    499      0                262
## 500    500      0                262
## 501    501      1                263
## 502    502      1                264
## 503    503      1                265
## 504    504      0                265
## 505    505      1                266
## 506    506      0                266
## 507    507      1                267
## 508    508      1                268
## 509    509      1                269
## 510    510      0                269
## 511    511      1                270
## 512    512      1                271
## 513    513      1                272
## 514    514      0                272
## 515    515      1                273
## 516    516      1                274
## 517    517      1                275
## 518    518      1                276
## 519    519      1                277
## 520    520      0                277
## 521    521      1                278
## 522    522      1                279
## 523    523      0                279
## 524    524      1                280
## 525    525      1                281
## 526    526      1                282
## 527    527      0                282
## 528    528      1                283
## 529    529      1                284
## 530    530      1                285
## 531    531      1                286
## 532    532      0                286
## 533    533      0                286
## 534    534      0                286
## 535    535      0                286
## 536    536      1                287
## 537    537      1                288
## 538    538      0                288
## 539    539      1                289
## 540    540      0                289
## 541    541      0                289
## 542    542      0                289
## 543    543      0                289
## 544    544      1                290
## 545    545      1                291
## 546    546      0                291
## 547    547      0                291
## 548    548      1                292
## 549    549      1                293
## 550    550      0                293
## 551    551      0                293
## 552    552      1                294
## 553    553      1                295
## 554    554      1                296
## 555    555      0                296
## 556    556      0                296
## 557    557      0                296
## 558    558      0                296
## 559    559      1                297
## 560    560      1                298
## 561    561      1                299
## 562    562      1                300
## 563    563      1                301
## 564    564      0                301
## 565    565      1                302
## 566    566      0                302
## 567    567      1                303
## 568    568      0                303
## 569    569      1                304
## 570    570      1                305
## 571    571      1                306
## 572    572      0                306
## 573    573      1                307
## 574    574      1                308
## 575    575      1                309
## 576    576      0                309
## 577    577      1                310
## 578    578      1                311
## 579    579      1                312
## 580    580      1                313
## 581    581      1                314
## 582    582      1                315
## 583    583      0                315
## 584    584      1                316
## 585    585      0                316
## 586    586      1                317
## 587    587      0                317
## 588    588      1                318
## 589    589      0                318
## 590    590      0                318
## 591    591      0                318
## 592    592      0                318
## 593    593      1                319
## 594    594      1                320
## 595    595      1                321
## 596    596      0                321
## 597    597      1                322
## 598    598      1                323
## 599    599      0                323
## 600    600      0                323
## 601    601      1                324
## 602    602      0                324
## 603    603      0                324
## 604    604      1                325
## 605    605      0                325
## 606    606      1                326
## 607    607      1                327
## 608    608      1                328
## 609    609      0                328
## 610    610      1                329
## 611    611      0                329
## 612    612      0                329
## 613    613      0                329
## 614    614      1                330
## 615    615      1                331
## 616    616      1                332
## 617    617      0                332
## 618    618      0                332
## 619    619      1                333
## 620    620      0                333
## 621    621      1                334
## 622    622      1                335
## 623    623      0                335
## 624    624      0                335
## 625    625      1                336
## 626    626      0                336
## 627    627      1                337
## 628    628      0                337
## 629    629      0                337
## 630    630      0                337
## 631    631      1                338
## 632    632      0                338
## 633    633      1                339
## 634    634      0                339
## 635    635      0                339
## 636    636      1                340
## 637    637      1                341
## 638    638      1                342
## 639    639      1                343
## 640    640      0                343
## 641    641      1                344
## 642    642      0                344
## 643    643      0                344
## 644    644      0                344
## 645    645      0                344
## 646    646      1                345
## 647    647      1                346
## 648    648      1                347
## 649    649      0                347
## 650    650      1                348
## 651    651      1                349
## 652    652      0                349
## 653    653      0                349
## 654    654      0                349
## 655    655      0                349
## 656    656      1                350
## 657    657      0                350
## 658    658      0                350
## 659    659      1                351
## 660    660      1                352
## 661    661      0                352
## 662    662      1                353
## 663    663      1                354
## 664    664      1                355
## 665    665      0                355
## 666    666      0                355
## 667    667      0                355
## 668    668      0                355
## 669    669      1                356
## 670    670      0                356
## 671    671      1                357
## 672    672      0                357
## 673    673      0                357
## 674    674      1                358
## 675    675      1                359
## 676    676      1                360
## 677    677      0                360
## 678    678      0                360
## 679    679      0                360
## 680    680      1                361
## 681    681      0                361
## 682    682      0                361
## 683    683      1                362
## 684    684      0                362
## 685    685      0                362
## 686    686      1                363
## 687    687      1                364
## 688    688      1                365
## 689    689      0                365
## 690    690      0                365
## 691    691      0                365
## 692    692      1                366
## 693    693      0                366
## 694    694      1                367
## 695    695      1                368
## 696    696      0                368
## 697    697      0                368
## 698    698      0                368
## 699    699      0                368
## 700    700      0                368
## 701    701      0                368
## 702    702      0                368
## 703    703      0                368
## 704    704      0                368
## 705    705      1                369
## 706    706      1                370
## 707    707      0                370
## 708    708      1                371
## 709    709      0                371
## 710    710      0                371
## 711    711      0                371
## 712    712      1                372
## 713    713      0                372
## 714    714      0                372
## 715    715      1                373
## 716    716      0                373
## 717    717      0                373
## 718    718      0                373
## 719    719      0                373
## 720    720      1                374
## 721    721      0                374
## 722    722      1                375
## 723    723      1                376
## 724    724      1                377
## 725    725      1                378
## 726    726      0                378
## 727    727      0                378
## 728    728      1                379
## 729    729      1                380
## 730    730      1                381
## 731    731      1                382
## 732    732      0                382
## 733    733      0                382
## 734    734      0                382
## 735    735      0                382
## 736    736      1                383
## 737    737      1                384
## 738    738      1                385
## 739    739      1                386
## 740    740      0                386
## 741    741      0                386
## 742    742      0                386
## 743    743      1                387
## 744    744      1                388
## 745    745      1                389
## 746    746      0                389
## 747    747      1                390
## 748    748      0                390
## 749    749      1                391
## 750    750      1                392
## 751    751      1                393
## 752    752      1                394
## 753    753      1                395
## 754    754      0                395
## 755    755      1                396
## 756    756      1                397
## 757    757      1                398
## 758    758      0                398
## 759    759      0                398
## 760    760      1                399
## 761    761      0                399
## 762    762      1                400
## 763    763      1                401
## 764    764      0                401
## 765    765      0                401
## 766    766      0                401
## 767    767      1                402
## 768    768      0                402
## 769    769      1                403
## 770    770      0                403
## 771    771      0                403
## 772    772      1                404
## 773    773      0                404
## 774    774      1                405
## 775    775      0                405
## 776    776      0                405
## 777    777      1                406
## 778    778      0                406
## 779    779      1                407
## 780    780      1                408
## 781    781      0                408
## 782    782      0                408
## 783    783      0                408
## 784    784      1                409
## 785    785      1                410
## 786    786      0                410
## 787    787      1                411
## 788    788      0                411
## 789    789      1                412
## 790    790      1                413
## 791    791      1                414
## 792    792      1                415
## 793    793      1                416
## 794    794      1                417
## 795    795      1                418
## 796    796      0                418
## 797    797      0                418
## 798    798      0                418
## 799    799      1                419
## 800    800      1                420
## 801    801      1                421
## 802    802      0                421
## 803    803      0                421
## 804    804      0                421
## 805    805      1                422
## 806    806      1                423
## 807    807      1                424
## 808    808      0                424
## 809    809      1                425
## 810    810      1                426
## 811    811      1                427
## 812    812      0                427
## 813    813      0                427
## 814    814      0                427
## 815    815      0                427
## 816    816      1                428
## 817    817      1                429
## 818    818      1                430
## 819    819      0                430
## 820    820      0                430
## 821    821      0                430
## 822    822      1                431
## 823    823      0                431
## 824    824      0                431
## 825    825      1                432
## 826    826      0                432
## 827    827      0                432
## 828    828      1                433
## 829    829      1                434
## 830    830      0                434
## 831    831      1                435
## 832    832      0                435
## 833    833      0                435
## 834    834      0                435
## 835    835      1                436
## 836    836      0                436
## 837    837      0                436
## 838    838      1                437
## 839    839      0                437
## 840    840      0                437
## 841    841      1                438
## 842    842      1                439
## 843    843      1                440
## 844    844      1                441
## 845    845      1                442
## 846    846      1                443
## 847    847      0                443
## 848    848      1                444
## 849    849      1                445
## 850    850      0                445
## 851    851      1                446
## 852    852      0                446
## 853    853      1                447
## 854    854      1                448
## 855    855      1                449
## 856    856      1                450
## 857    857      0                450
## 858    858      1                451
## 859    859      1                452
## 860    860      0                452
## 861    861      1                453
## 862    862      1                454
## 863    863      0                454
## 864    864      1                455
## 865    865      0                455
## 866    866      0                455
## 867    867      0                455
## 868    868      1                456
## 869    869      1                457
## 870    870      1                458
## 871    871      1                459
## 872    872      0                459
## 873    873      0                459
## 874    874      1                460
## 875    875      1                461
## 876    876      0                461
## 877    877      1                462
## 878    878      1                463
## 879    879      1                464
## 880    880      1                465
## 881    881      1                466
## 882    882      1                467
## 883    883      1                468
## 884    884      0                468
## 885    885      0                468
## 886    886      0                468
## 887    887      1                469
## 888    888      0                469
## 889    889      0                469
## 890    890      1                470
## 891    891      0                470
## 892    892      1                471
## 893    893      1                472
## 894    894      0                472
## 895    895      0                472
## 896    896      1                473
## 897    897      1                474
## 898    898      0                474
## 899    899      0                474
## 900    900      1                475
## 901    901      0                475
## 902    902      1                476
## 903    903      1                477
## 904    904      0                477
## 905    905      0                477
## 906    906      1                478
## 907    907      1                479
## 908    908      0                479
## 909    909      1                480
## 910    910      1                481
## 911    911      0                481
## 912    912      0                481
## 913    913      0                481
## 914    914      0                481
## 915    915      1                482
## 916    916      1                483
## 917    917      1                484
## 918    918      1                485
## 919    919      0                485
## 920    920      1                486
## 921    921      1                487
## 922    922      0                487
## 923    923      1                488
## 924    924      0                488
## 925    925      1                489
## 926    926      0                489
## 927    927      1                490
## 928    928      1                491
## 929    929      1                492
## 930    930      1                493
## 931    931      0                493
## 932    932      1                494
## 933    933      0                494
## 934    934      0                494
## 935    935      0                494
## 936    936      0                494
## 937    937      1                495
## 938    938      0                495
## 939    939      0                495
## 940    940      1                496
## 941    941      0                496
## 942    942      0                496
## 943    943      0                496
## 944    944      0                496
## 945    945      1                497
## 946    946      0                497
## 947    947      0                497
## 948    948      0                497
## 949    949      0                497
## 950    950      0                497
## 951    951      0                497
## 952    952      1                498
## 953    953      1                499
## 954    954      1                500
## 955    955      1                501
## 956    956      0                501
## 957    957      1                502
## 958    958      0                502
## 959    959      1                503
## 960    960      0                503
## 961    961      0                503
## 962    962      0                503
## 963    963      1                504
## 964    964      1                505
## 965    965      1                506
## 966    966      1                507
## 967    967      1                508
## 968    968      1                509
## 969    969      0                509
## 970    970      0                509
## 971    971      1                510
## 972    972      0                510
## 973    973      1                511
## 974    974      0                511
## 975    975      1                512
## 976    976      1                513
## 977    977      0                513
## 978    978      0                513
## 979    979      1                514
## 980    980      0                514
## 981    981      0                514
## 982    982      1                515
## 983    983      0                515
## 984    984      1                516
## 985    985      0                516
## 986    986      0                516
## 987    987      0                516
## 988    988      0                516
## 989    989      1                517
## 990    990      1                518
## 991    991      1                519
## 992    992      1                520
## 993    993      0                520
## 994    994      1                521
## 995    995      1                522
## 996    996      0                522
## 997    997      1                523
## 998    998      1                524
## 999    999      1                525
## 1000  1000      1                526
#results <- simulate_td_random(n_arms = 10, n_trials = 1000, alpha = 0.05)

trial <- more_results$trial
reward <- more_results$reward

ggplot(more_results, aes(x = trial, y = reward)) +
  geom_line() +
  xlab("Trial") +
  ylab("Mean accumulated reward")


Problem 5)

Notice that so far, your agent is choosing its actions at random: it is exploring, but not exploiting what it has learned. In the reinforcement learning literature, extensive research has gone into how to optimally balance exploration and exploitation, as well as how best to model this tradeoff in human learning. We will consider a simple heuristic approach, called \(\epsilon\)-greedy action selection (\(\epsilon\) is the Greek letter epsilon). The idea is simple:

With probability \(\epsilon\), choose an action at random, and with probability \((1 - \epsilon)\) choose the action that currently has the highest estimated value.

Create a function called `simulate_td_eps()` that uses TD-learning and \(\epsilon\)-greedy action selection.

Note that in the case of a tie (several alternatives have the highest value), you should choose randomly between the tied options.
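
As a rough illustration of the selection rule and the tie-breaking note above, here is a minimal sketch of a single \(\epsilon\)-greedy choice. The helper name `choose_eps_greedy` and its arguments are hypothetical (they are not part of the exam template), and this covers only the action-selection step, not the full `simulate_td_eps()` function.

```r
# Hypothetical helper (not part of the exam template): pick an arm index
# from a vector of value estimates using epsilon-greedy selection.
choose_eps_greedy <- function(theta_est, epsilon) {
  n_arms <- length(theta_est)

  if (runif(1) < epsilon) {
    # Explore: every arm is equally likely
    return(sample.int(n_arms, 1))
  }

  # Exploit: find all arms tied for the highest estimate
  best <- which(theta_est == max(theta_est))

  # Index into `best` rather than calling sample(best, 1), because
  # sample(x, 1) draws from 1:x when x is a single number
  best[sample.int(length(best), 1)]
}

# With epsilon = 0 the choice is purely greedy
choose_eps_greedy(c(0.1, 0.9, 0.3), epsilon = 0)
```

Indexing into `best` is a deliberate choice here: when only one arm holds the maximum, `sample(best, 1)` would silently sample from `1:best` instead of returning `best`.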

Try to find a value for \(\epsilon\) that maximizes the agent’s performance (you can just do this through trial and error; a complex search for the exact optimal value is not needed).

Update your graph from problem 3 to show data for both the random action selection and the \(\epsilon\)-greedy action selection mechanisms (using average performance over 100 simulations for each algorithm).

Additional requirements:

Solution:

simulate_td_eps <- function(steps, epsilon) {
  
  
}

Problem 6)

\(\epsilon\)-greedy is just one possible approach to balancing exploration and exploitation. Another common approach uses the so-called “softmax” operator. If \(\hat{\theta}\) represents a vector storing the estimated values for each bandit, then the probability of choosing alternative \(k\) is given by:

\[P(\textrm{choice} = k) = \frac{e^{\beta \hat{\theta}_k}}{\sum_{j=1}^{n}e^{\beta \hat{\theta}_j}}\]

where \(\beta\) is a parameter that controls how random or deterministic the choices are. As \(\beta \rightarrow 0\), the probability for each choice approaches \(1/n\) (random action selection). As \(\beta \rightarrow \infty\), the probability of choosing the option with the highest value approaches 1 (deterministic action selection). Intermediate values balance exploration and exploitation.
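
The formula above can be sketched in a few lines of R. The function name `softmax_probs` is an assumption made for illustration only. Subtracting the maximum before exponentiating is a common numerical safeguard; it does not change the resulting probabilities.

```r
# Hypothetical helper (not part of the exam template): convert value
# estimates into softmax choice probabilities.
softmax_probs <- function(theta_est, beta) {
  # Subtracting the maximum leaves the probabilities unchanged but
  # avoids overflow in exp() when beta is large
  z <- beta * (theta_est - max(theta_est))
  exp(z) / sum(exp(z))
}

theta_est <- c(0.2, 0.5, 0.8)
softmax_probs(theta_est, beta = 0)    # every arm gets probability 1/3
softmax_probs(theta_est, beta = 100)  # nearly all mass on the third arm

# Draw one choice according to the softmax probabilities
k <- sample.int(length(theta_est), 1, prob = softmax_probs(theta_est, beta = 5))
```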

Create a function called simulate_td_softmax that uses TD learning and the softmax action selection mechanism. It should have an additional argument beta.

Try to find a value for \(\beta\) that maximizes the agent’s performance (as before, you can just do this through trial and error; a complex search for the exact optimal value is not needed).

Update your graph from problem 4 to include data for all three approaches (TD-random, TD-epsilon, and TD-softmax).

Solution:

simulate_td_softmax <- function () {
  
  
}

Problem 7)

So far we have been using the TD-learning rule to model how the agent updates its beliefs. Given that we have been discussing Bayesian parameter estimation in class, it is natural to apply the same ideas to model learning in the bandit setting.

In particular, let’s assume the agent seeks to learn the distribution \(p(\theta_k)\) for each bandit. We will use a Beta distribution as the prior, with parameters \(\alpha = \beta = 1\). Recall that this is equivalent to a uniform distribution over the interval \((0, 1)\).

After each choice, the agent receives a reward of 1 or 0. We can think of this as a coin flip experiment where the coin has an unknown bias, except now there are 10 coins (corresponding to 10 bandits) and so we need to keep track of the posterior distribution for each one. You will do this by keeping track of the count of heads and tails (reward and no-reward) for each bandit.
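
To make the bookkeeping concrete, here is a minimal sketch of the count update for a Beta(1, 1) prior, assuming the standard Beta-Bernoulli conjugate update (each reward adds 1 to the first shape parameter, each non-reward adds 1 to the second). The helper `update_posterior` and the three-arm setup are hypothetical and used only for illustration.

```r
n_arms <- 3
a <- rep(1, n_arms)  # prior alpha = 1 for every arm
b <- rep(1, n_arms)  # prior beta = 1 for every arm

# Hypothetical helper: update the counts for arm k after observing
# reward r (1 = reward, 0 = no reward)
update_posterior <- function(a, b, k, r) {
  a[k] <- a[k] + r        # one more reward for arm k when r == 1
  b[k] <- b[k] + (1 - r)  # one more non-reward for arm k when r == 0
  list(a = a, b = b)
}

post <- update_posterior(a, b, k = 2, r = 1)
post <- update_posterior(post$a, post$b, k = 2, r = 0)
post$a  # 1 2 1
post$b  # 1 2 1
```

Under this scheme the posterior for arm \(k\) is a Beta distribution with shape parameters `a[k]` and `b[k]`, so its density can be evaluated with `dbeta(theta, a[k], b[k])`.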

Create a function called simulate_bayesian_agent that implements this idea. Note: We are no longer using TD-learning. In addition, for this problem, go back to choosing actions completely at random. You might use the function simulate_baseline_agent as your starting point.

Your function should return a data frame with 4 columns:

Run your function. Generate a plot that shows the posterior probability distributions \(p(\theta_k)\) for each bandit. Also include vertical dashed lines that show the true values for \(\theta\).

Requirements:

Solution:

simulate_bayesian_agent <- function(n_arms = 10, n_trials = 100, alpha = 0.05) {
  bandit <- 1:n_trials #return this amount of trials
  reward <- rep(0,n_trials) #reward per round
  accumulated_reward <- rep(0, n_trials) #return this- stores cumulative reward values 
  
  # Generate the true reward probability for each arm
  theta_true <- 1:n_trials
  a <- rep(alpha,n_trials)
  b <- 1:n_trials
  
  theta_true <- runif(n_arms) #return this 
  theta_est <- rep(0.5, 10) #return this

  
  for(i in 1:n_trials) {
    # Choose an action randomly- chooses the bandit randomly 
    k <- sample(1:n_arms, 1)
    
    # Generate a binary reward (0 or 1) according to the choice
    r <- as.numeric(runif(1) < theta_true[k]) # target 
    
    theta_est[k] <- theta_est[k] + alpha * (r - theta_est[k])
    theta_true[i] <- theta_est[k]
    
    reward[i] <- r
    
    if (i == 1) { #chooses value of the first bandit
      accumulated_reward[i] <- r
      
      b[i] <- r
      #beta <- r
      #theta <- pbeta(0.0, alpha, beta)

    } 
    
    else if (i != 1) { #chooses values for the rest
      accumulated_reward[i] <- accumulated_reward[i-1] + r
    }
  }
  
  sol <- data.frame(bandit, theta_true, a, b)
  
  return(sol)
  
}

reward_table <- simulate_bayesian_agent()
print(reward_table)
##     bandit theta_true    a   b
## 1        1  0.4750000 0.05   0
## 2        2  0.4750000 0.05   2
## 3        3  0.4750000 0.05   3
## 4        4  0.4750000 0.05   4
## 5        5  0.4750000 0.05   5
## 6        6  0.5250000 0.05   6
## 7        7  0.4512500 0.05   7
## 8        8  0.4512500 0.05   8
## 9        9  0.4286875 0.05   9
## 10      10  0.5012500 0.05  10
## 11      11  0.4750000 0.05  11
## 12      12  0.5487500 0.05  12
## 13      13  0.4512500 0.05  13
## 14      14  0.4750000 0.05  14
## 15      15  0.5012500 0.05  15
## 16      16  0.5250000 0.05  16
## 17      17  0.5487500 0.05  17
## 18      18  0.4786875 0.05  18
## 19      19  0.5012500 0.05  19
## 20      20  0.4761875 0.05  20
## 21      21  0.4072531 0.05  21
## 22      22  0.4761875 0.05  22
## 23      23  0.4547531 0.05  23
## 24      24  0.4523781 0.05  24
## 25      25  0.5261875 0.05  25
## 26      26  0.4797592 0.05  26
## 27      27  0.5012500 0.05  27
## 28      28  0.5498781 0.05  28
## 29      29  0.4523781 0.05  29
## 30      30  0.4750000 0.05  30
## 31      31  0.4368905 0.05  31
## 32      32  0.4557713 0.05  32
## 33      33  0.5261875 0.05  33
## 34      34  0.4829827 0.05  34
## 35      35  0.5723842 0.05  35
## 36      36  0.5088336 0.05  36
## 37      37  0.4797592 0.05  37
## 38      38  0.5213125 0.05  38
## 39      39  0.4650459 0.05  39
## 40      40  0.4998781 0.05  40
## 41      41  0.4748842 0.05  41
## 42      42  0.4786875 0.05  42
## 43      43  0.4833919 0.05  43
## 44      44  0.4547531 0.05  44
## 45      45  0.4320155 0.05  45
## 46      46  0.4104147 0.05  46
## 47      47  0.4511400 0.05  47
## 48      48  0.4592223 0.05  48
## 49      49  0.4320155 0.05  49
## 50      50  0.5713125 0.05  50
## 51      51  0.5937650 0.05  51
## 52      52  0.4512500 0.05  52
## 53      53  0.4362612 0.05  53
## 54      54  0.4286875 0.05  54
## 55      55  0.5427469 0.05  55
## 56      56  0.5156095 0.05  56
## 57      57  0.4398940 0.05  57
## 58      58  0.5057713 0.05  58
## 59      59  0.4072531 0.05  59
## 60      60  0.5640768 0.05  60
## 61      61  0.4604147 0.05  61
## 62      62  0.4898291 0.05  62
## 63      63  0.4804827 0.05  63
## 64      64  0.5358729 0.05  64
## 65      65  0.4644481 0.05  65
## 66      66  0.4653376 0.05  66
## 67      67  0.4417936 0.05  67
## 68      68  0.4564586 0.05  68
## 69      69  0.5590793 0.05  69
## 70      70  0.4912257 0.05  70
## 71      71  0.4666644 0.05  71
## 72      72  0.4368905 0.05  72
## 73      73  0.4836356 0.05  73
## 74      74  0.4952469 0.05  74
## 75      75  0.4933312 0.05  75
## 76      76  0.4686646 0.05  76
## 77      77  0.4285830 0.05  77
## 78      78  0.4650459 0.05  78
## 79      79  0.4571539 0.05  79
## 80      80  0.4594539 0.05  80
## 81      81  0.4373940 0.05  81
## 82      82  0.4864812 0.05  82
## 83      83  0.4452314 0.05  83
## 84      84  0.4342962 0.05  84
## 85      85  0.4155243 0.05  85
## 86      86  0.4704845 0.05  86
## 87      87  0.4697040 0.05  87
## 88      88  0.4469603 0.05  88
## 89      89  0.4746123 0.05  89
## 90      90  0.4420707 0.05  90
## 91      91  0.5311253 0.05  91
## 92      92  0.4125814 0.05  92
## 93      93  0.5008817 0.05  93
## 94      94  0.4621571 0.05  94
## 95      95  0.4890492 0.05  95
## 96      96  0.4758376 0.05  96
## 97      97  0.5145968 0.05  97
## 98      98  0.4229698 0.05  98
## 99      99  0.4888669 0.05  99
## 100    100  0.4447480 0.05 100

Problem 8)

Using a Bayesian inference algorithm instead of TD-learning does not avoid the problem of balancing exploration and exploitation. So far your algorithm has been selecting actions randomly.

One nice feature of Bayesian inference is that it explicitly represents uncertainty about the world. We can use this to guide exploration. A simple approach is for the agent, on each trial, to generate a random sample from the posterior distribution for each bandit. It then selects the alternative that has the highest value according to these random samples.

Notice how this idea naturally balances exploration and exploitation: at the beginning of the simulation, each posterior is a uniform distribution, so the agent’s choices will be completely random. As the agent learns more about each bandit, its posterior distributions will get narrower, and so the random samples will be closer to the true values and its behavior will become more deterministic. In the machine learning literature, this approach is known as posterior sampling, or Thompson sampling. It is not necessarily the optimal solution to the exploration-exploitation tradeoff, but it often performs very well.
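
The selection step described above can be sketched as follows. This assumes the posterior for each bandit is stored as two vectors of Beta shape parameters, here called `a` and `b`; both the vector names and the helper `choose_thompson` are hypothetical, and the sketch covers only the choice on a single trial.

```r
# Hypothetical helper (not part of the exam template): choose an arm by
# posterior (Thompson) sampling from Beta(a[k], b[k]) posteriors.
choose_thompson <- function(a, b) {
  # One random draw from each arm's posterior
  draws <- rbeta(length(a), a, b)

  # Pick the arm with the largest draw; exact ties are not a practical
  # concern for continuous draws, so which.max() is used directly
  which.max(draws)
}

# Uniform priors: each of the three arms is equally likely to be chosen
choose_thompson(a = rep(1, 3), b = rep(1, 3))

# A sharply peaked posterior on arm 2 makes the choice nearly deterministic
choose_thompson(a = c(1, 500), b = c(500, 1))
```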

Modify your function simulate_bayesian_agent() to implement this idea.

In addition, modify your function so that it returns the reward and accumulated reward, in the same way that you did for problem 3.

Solution:

simulate_bayesian_agent <- function(n_arms = 10, n_trials = 100, alpha = 0.05) {
  bandit <- 1:n_trials #return this amount of trials
  reward <- rep(0,n_trials) #reward per round
  accumulated_reward <- rep(0, n_trials) #return this- stores cumulative reward values 
  
  # Generate the true reward probability for each arm
  theta_true <- 1:n_trials
  a <- rep(alpha,n_trials)
  b <- 1:n_trials
  
  theta_true <- runif(n_arms) #return this 
  theta_est <- rep(0.5, 10) #return this

  
  for(i in 1:n_trials) {
    # Choose an action randomly- chooses the bandit randomly 
    k <- sample(1:n_arms, 1)
    
    # Generate a binary reward (0 or 1) according to the choice
    r <- as.numeric(runif(1) < theta_true[k]) # target 
    
    theta_est[k] <- theta_est[k] + alpha * (r - theta_est[k])
    theta_true[i] <- theta_est[k]
    
    reward[i] <- r
    
    if (i == 1) { #chooses value of the first bandit
      accumulated_reward[i] <- r
      
      b[i] <- r
      #beta <- r
      #theta <- pbeta(0.0, alpha, beta)

    } 
    
    else if (i != 1) { #chooses values for the rest
      accumulated_reward[i] <- accumulated_reward[i-1] + r
    }
  }
  
  sol <- data.frame(bandit, theta_true, a, b)
  
  return(sol)
  
}

reward_table <- simulate_bayesian_agent()
print(reward_table)
##     bandit theta_true    a   b
## 1        1  0.4750000 0.05   0
## 2        2  0.4750000 0.05   2
## 3        3  0.4750000 0.05   3
## 4        4  0.4750000 0.05   4
## 5        5  0.5250000 0.05   5
## 6        6  0.5012500 0.05   6
## 7        7  0.5012500 0.05   7
## 8        8  0.5250000 0.05   8
## 9        9  0.5250000 0.05   9
## 10      10  0.4512500 0.05  10
## 11      11  0.4987500 0.05  11
## 12      12  0.5261875 0.05  12
## 13      13  0.4750000 0.05  13
## 14      14  0.5261875 0.05  14
## 15      15  0.4998781 0.05  15
## 16      16  0.5498781 0.05  16
## 17      17  0.5012500 0.05  17
## 18      18  0.5261875 0.05  18
## 19      19  0.4987500 0.05  19
## 20      20  0.5723842 0.05  20
## 21      21  0.4786875 0.05  21
## 22      22  0.5047531 0.05  22
## 23      23  0.4738125 0.05  23
## 24      24  0.4748842 0.05  24
## 25      25  0.5011400 0.05  25
## 26      26  0.4998781 0.05  26
## 27      27  0.5001219 0.05  27
## 28      28  0.5012500 0.05  28
## 29      29  0.4748842 0.05  29
## 30      30  0.4750000 0.05  30
## 31      31  0.5295155 0.05  31
## 32      32  0.5250000 0.05  32
## 33      33  0.4760830 0.05  33
## 34      34  0.4751158 0.05  34
## 35      35  0.4512500 0.05  35
## 36      36  0.4522789 0.05  36
## 37      37  0.4296649 0.05  37
## 38      38  0.4581817 0.05  38
## 39      39  0.5238125 0.05  39
## 40      40  0.5530397 0.05  40
## 41      41  0.5487500 0.05  41
## 42      42  0.5011400 0.05  42
## 43      43  0.4852726 0.05  43
## 44      44  0.5213125 0.05  44
## 45      45  0.4786875 0.05  45
## 46      46  0.4987500 0.05  46
## 47      47  0.5476219 0.05  47
## 48      48  0.4610090 0.05  48
## 49      49  0.5238125 0.05  49
## 50      50  0.4952469 0.05  50
## 51      51  0.4379585 0.05  51
## 52      52  0.5204845 0.05  52
## 53      53  0.5476219 0.05  53
## 54      54  0.5047531 0.05  54
## 55      55  0.4944603 0.05  55
## 56      56  0.4760830 0.05  56
## 57      57  0.5437650 0.05  57
## 58      58  0.4522789 0.05  58
## 59      59  0.4660606 0.05  59
## 60      60  0.5702408 0.05  60
## 61      61  0.4795155 0.05  61
## 62      62  0.5702408 0.05  62
## 63      63  0.4427576 0.05  63
## 64      64  0.5917287 0.05  64
## 65      65  0.5621423 0.05  65
## 66      66  0.5340352 0.05  66
## 67      67  0.4555397 0.05  67
## 68      68  0.4761875 0.05  68
## 69      69  0.4697373 0.05  69
## 70      70  0.4296649 0.05  70
## 71      71  0.4081817 0.05  71
## 72      72  0.5753877 0.05  72
## 73      73  0.5165768 0.05  73
## 74      74  0.4327627 0.05  74
## 75      75  0.5966183 0.05  75
## 76      76  0.5013600 0.05  76
## 77      77  0.5573334 0.05  77
## 78      78  0.4962504 0.05  78
## 79      79  0.4206197 0.05  79
## 80      80  0.4495887 0.05  80
## 81      81  0.4907479 0.05  81
## 82      82  0.5417287 0.05  82
## 83      83  0.4714379 0.05  83
## 84      84  0.4978660 0.05  84
## 85      85  0.4762920 0.05  85
## 86      86  0.4771093 0.05  86
## 87      87  0.4523781 0.05  87
## 88      88  0.3877726 0.05  88
## 89      89  0.5294668 0.05  89
## 90      90  0.5162105 0.05  90
## 91      91  0.5667874 0.05  91
## 92      92  0.5229727 0.05  92
## 93      93  0.5146423 0.05  93
## 94      94  0.4889102 0.05  94
## 95      95  0.5468241 0.05  95
## 96      96  0.5144647 0.05  96
## 97      97  0.5029934 0.05  97
## 98      98  0.4611246 0.05  98
## 99      99  0.4778437 0.05  99
## 100    100  0.5404000 0.05 100

Problem 9)

Generate one more plot (updating your results from problem 6) that shows the average accumulated reward for all 4 models considered: TD-random, TD-epsilon, TD-softmax, and Bayesian.

Solution:

# Your solution here

Problem 10)

Define \(\theta_1\) to be the probability that a given bandit produces a reward. Assume that \(\theta_1\) is unknown, but has a posterior probability distribution defined by a Beta distribution: \(p(\theta_1) = \mathrm{Beta}(\alpha = 7, \beta = 4)\).

Part a)

Using numerical integration, what is the probability that \(\theta_1 > 0.5\)?

integrand <- function(theta) {
  dbeta(theta, 7, 4)
}
result <- integrate(integrand, lower = 0.5, upper = 1)

print(result)
## 0.828125 with absolute error < 9.2e-15

Part b)

Using the built-in cumulative distribution function (c.d.f.), what is the probability that \(\theta_1 > 0.5\)?

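alpha <- 7
beta <- 4

# pbeta() returns P(theta_1 <= 0.5) by default, so request the upper tail
prob_above <- pbeta(0.5, alpha, beta, lower.tail = FALSE)
print(prob_above)
## [1] 0.828125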

Part c)

Using Monte Carlo simulation (using 1 million samples), what is the probability that \(\theta_1 > 0.5\)?

alpha <- 7
beta <- 4

monte <- rbeta(n = 1000000, alpha, beta)

theta <- mean(monte > 0.5)

print(theta)
## [1] 0.828405

Part d)

Define \(\theta_2\) to be the probability that a different bandit produces a reward. Assume that the posterior for \(\theta_2\) is given by \(p(\theta_2) = \mathrm{Beta}(\alpha = 2, \beta = 2)\).

Using Monte Carlo simulation, what is the probability that \(\theta_1 > \theta_2\)?

alpha1 <- 7
beta1 <- 4
alpha2 <- 2
beta2 <- 2

monte1 <- rbeta(1000000, alpha1, beta1)
monte2 <- rbeta(1000000, alpha2, beta2)

compare <- mean(monte1 > monte2)

print(compare)
## [1] 0.685188

Part e)

What is the equal-tailed 95% credible interval for \(\theta_1\)?

# Your solution here
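
One possible approach, offered as a sketch rather than a graded answer: an equal-tailed 95% interval leaves 2.5% of the posterior mass in each tail, so its endpoints are the 0.025 and 0.975 quantiles of the Beta(7, 4) posterior, which the built-in quantile function `qbeta()` returns directly. The variable name `ci` is arbitrary.

```r
alpha <- 7
beta <- 4

# Lower and upper endpoints: the 2.5% and 97.5% posterior quantiles
ci <- qbeta(c(0.025, 0.975), shape1 = alpha, shape2 = beta)
print(ci)

# Sanity check: the c.d.f. at the endpoints should recover the tail levels
pbeta(ci, alpha, beta)
```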