This file contains the code for a simulation of a six-armed bandit with a minimum aspiration level that has to be reached by the end of a game. The goal of the simulation is to find out whether the strategy that maximizes the proportion of games in which the minimum aspiration level is reached is the same as the strategy that maximizes the outcome. The code is not shown because, except for the distributions, it is the same as at http://rpubs.com/msteiner/mtsimoutcdeptheta.
Here, \(\Delta goal\) denotes the current state minus the minimum aspiration level.
I ran 1000 agents per \(\theta\) value. The minimum aspiration level, or threshold, is set to 250, \(\alpha\) is set to \(0.3\), and the \(\theta\) values range from \(-5\) to \(5\) in steps of \(0.1\). Each game consists of \(30\) trials.
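The design described here can be written down as a small parameter sketch. This is a hypothetical Python illustration, not the original simulation code; all names (`N_AGENTS`, `THETAS`, `delta_goal`, and so on) are assumptions introduced only for this example.

```python
# Hypothetical parameter setup mirroring the description in the text
# (not the original simulation code).
N_AGENTS = 1000   # agents simulated per theta value
N_TRIALS = 30     # trials per game
THRESHOLD = 250   # minimum aspiration level
ALPHA = 0.3       # alpha as stated in the text

# Theta grid from -5 to 5 in steps of 0.1. Building it from integers
# avoids accumulating floating-point error.
THETAS = [i / 10 for i in range(-50, 51)]


def delta_goal(current_state, threshold=THRESHOLD):
    """Current state minus the minimum aspiration level."""
    return current_state - threshold


# Total number of simulated games per environment.
n_games = len(THETAS) * N_AGENTS
```

With these settings the grid contains 101 \(\theta\) values, so each environment involves 101,000 simulated games of 30 trials each.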
For each environment (i.e., set of distributions), the \(\theta\) values are plotted on the x axis, with two y axes: one for the proportion of games in which the aspiration level was reached and one for the mean outcome.
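The two plotted quantities can be computed from the final outcomes of the games run at a given \(\theta\) value. The following is a minimal sketch, assuming that "reached" means a final outcome greater than or equal to the threshold; the function name `summarize` and the example outcomes are made up for illustration.

```python
from statistics import mean


def summarize(final_outcomes, threshold=250):
    """Return (proportion of games reaching the threshold, mean outcome)."""
    # Assumption: the aspiration level counts as reached when the final
    # outcome is at least the threshold.
    reached = [outcome >= threshold for outcome in final_outcomes]
    return sum(reached) / len(final_outcomes), mean(final_outcomes)


# Made-up final outcomes of four games, for illustration only.
example_outcomes = [240, 250, 260, 270]
prop_reached, mean_outcome = summarize(example_outcomes)
```

Applying `summarize` once per \(\theta\) value yields the two series that go on the two y axes.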
To test whether low \(\delta\) values actually lead to more exploration near the goal, I ran simulations for single agents and plotted their behavior. Compared with lower \(\delta\) values, higher \(\delta\) values lead to more exploration while the goal has not yet been reached and to more exploitation afterwards. With negative \(\delta\) values, this pattern can be reversed. Note that the code for the simulations with negative \(\delta\) values is not shown because it is the same except for the signs.
Plots for the positive \(\delta\)s.
Figure 1: Single-agent simulations for positive delta values.
Single-agent simulations for negative delta values.
Figure 1: Proportion of games in which the threshold was reached and mean outcomes for different delta values. Environment 1: means are 10, 10, 9, 9, 8 with SDs of 2, plus an exponential distribution with lambda = 0.15.
Figure 2: Proportion of games in which the threshold was reached and mean outcomes for different delta values. Environment 2: means are 10, 9, 9, 6, 6 with SDs of 2, plus an exponential distribution with lambda = 0.15.
Figure 3: Proportion of games in which the threshold was reached and mean outcomes for different delta values. Environment 3: means are 9, 8, 7, 6, 5 with SDs of 2, plus an exponential distribution with lambda = 0.15.
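The three environments in the captions can be sketched as data plus a sampling function. This is a hypothetical Python illustration rather than the original code. It assumes that the five arms given by a mean and an SD are normally distributed and that the exponential distribution is the sixth arm; the names `ENVIRONMENTS` and `draw` are made up for this example.

```python
import random

# Means of the five arms described by mean and SD in each environment.
# Assumption: these arms are normal; the sixth arm is exponential.
ENVIRONMENTS = {
    1: [10, 10, 9, 9, 8],
    2: [10, 9, 9, 6, 6],
    3: [9, 8, 7, 6, 5],
}
SD = 2          # standard deviation shared by the five mean/SD arms
LAMBDA = 0.15   # rate of the exponential arm
N_TRIALS = 30   # trials per game


def draw(env, arm, rng):
    """Sample one outcome from arm 0-5 of the given environment."""
    means = ENVIRONMENTS[env]
    if arm < len(means):
        return rng.gauss(means[arm], SD)
    return rng.expovariate(LAMBDA)


# Mean of the exponential arm, and the expected total over one game when
# always choosing the arm with the highest mean.
exp_arm_mean = 1 / LAMBDA
best_expected_total = {
    env: N_TRIALS * max(means) for env, means in ENVIRONMENTS.items()
}
```

Under these assumptions, always playing the arm with the highest mean gives an expected total of 300 in Environments 1 and 2 and 270 in Environment 3, compared with the threshold of 250. The exponential arm has a lower mean (about 6.67) but a standard deviation equal to its mean, so it is by far the most variable option.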