Simulation with an Outcome Dependent Theta, Four-Armed Bandit

This file contains the code for the simulation of a four-armed bandit with a minimum aspiration level that has to be reached by the end of a game. The goal of this simulation is to find out whether the strategy that maximizes the proportion of games in which the minimum aspiration level is reached is the same as the strategy that maximizes the outcome. The code is not shown because, except for the distributions, it is the same as at http://rpubs.com/msteiner/mtsimoutcdeptheta.

Define functions

First, functions are defined to run the simulations. We use a Rescorla-Wagner prediction-error updating function together with an extended softmax choice rule. The softmax function in this case is \[\begin{equation} p_{i}=\frac{e^{x_{i} \theta}}{\sum_{j=1}^{n}e^{x_{j} \theta}} \end{equation}\] where \(x_{i}\) is the current expectation for option \(i\) and \(n\) is the number of options. Unlike in the standard softmax, \(\theta\) is not fixed but is a function of how far the current state is from the minimum aspiration level: \[\begin{equation} \theta = \frac{1}{1 + e^{- \frac{\Delta goal}{goal} * \delta}} \end{equation}\]

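where \(\Delta goal\) is the current state minus the minimum aspiration level and \(goal\) is the minimum aspiration level itself. When the goal is exactly reached, \(\Delta goal = 0\) and therefore \(\theta = 0.5\) for every \(\delta\).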
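The three building blocks described above can be sketched in Python as follows. This is a minimal illustration, not the author's R code: the function names are hypothetical, "current state" is assumed to be the cumulative outcome so far, and the Rescorla-Wagner rule is assumed to update only the chosen option.

```python
import math


def theta(total, goal, delta):
    """Outcome-dependent theta: 1 / (1 + exp(-(delta_goal / goal) * delta)).

    Assumption: the current state is the cumulative outcome `total`,
    so delta_goal = total - goal.
    """
    delta_goal = total - goal
    return 1.0 / (1.0 + math.exp(-(delta_goal / goal) * delta))


def softmax(values, th):
    """Choice probabilities p_i = exp(x_i * th) / sum_j exp(x_j * th).

    The largest scaled value is subtracted before exponentiating; this
    avoids overflow and leaves the probabilities unchanged.
    """
    scaled = [x * th for x in values]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def rw_update(values, arm, reward, alpha):
    """Rescorla-Wagner update of the chosen arm: Q <- Q + alpha * (reward - Q)."""
    updated = list(values)
    updated[arm] += alpha * (reward - updated[arm])
    return updated


# Usage: one decision and one update for a single agent.
q = [0.0] * 4
th = theta(total=100.0, goal=250.0, delta=2.0)  # below the goal, so th < 0.5
probs = softmax(q, th)
q = rw_update(q, arm=0, reward=9.5, alpha=0.3)
```

Subtracting the maximum inside `softmax` is a standard numerical safeguard; with \(\theta \in (0, 1)\) and outcomes around 10 it rarely matters, but it keeps the function safe for larger expectations.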

Run the simulation

I ran 1000 agents per \(\delta\) value. The minimum aspiration level, or threshold, is set to 250, \(\alpha\) is set to \(0.3\), and the \(\delta\) values range from \(-5\) to \(5\) in steps of \(0.1\). Each game consists of \(30\) trials.
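The simulation grid described above can be sketched as below. This is a hedged sketch, not the original implementation: the arm distributions are taken from the Environment 1 caption, initial expectations of 0, the cumulative outcome as the "current state", and "reached" meaning a final total of at least 250 are all assumptions, and the function names are hypothetical.

```python
import math
import random

GOAL = 250      # minimum aspiration level (threshold)
ALPHA = 0.3     # Rescorla-Wagner learning rate
N_TRIALS = 30   # trials per game
# delta grid: -5.0, -4.9, ..., 5.0 (101 values); rounding removes float noise
DELTAS = [round(-5 + 0.1 * k, 1) for k in range(101)]


def draw(arm, rng):
    # Assumed Environment 1: arms 0-2 have means 10, 9, 8 with SD 2,
    # arm 3 is exponential with rate lambda = 0.15.
    if arm < 3:
        return rng.gauss((10, 9, 8)[arm], 2)
    return rng.expovariate(0.15)


def play_game(delta, rng):
    """Play one 30-trial game and return the total outcome."""
    q = [0.0] * 4   # assumption: all expectations start at 0
    total = 0.0     # current state = cumulative outcome so far
    for _ in range(N_TRIALS):
        th = 1.0 / (1.0 + math.exp(-((total - GOAL) / GOAL) * delta))
        scaled = [v * th for v in q]
        top = max(scaled)
        weights = [math.exp(s - top) for s in scaled]  # unnormalized softmax
        arm = rng.choices(range(4), weights=weights)[0]
        reward = draw(arm, rng)
        q[arm] += ALPHA * (reward - q[arm])  # prediction-error update
        total += reward
    return total


def simulate(n_agents=1000, seed=1):
    """Map each delta to (proportion reaching GOAL, mean total outcome)."""
    rng = random.Random(seed)
    results = {}
    for d in DELTAS:
        totals = [play_game(d, rng) for _ in range(n_agents)]
        reached = sum(t >= GOAL for t in totals) / n_agents
        results[d] = (reached, sum(totals) / n_agents)
    return results
```

The full run (`simulate()` with 1000 agents) takes on the order of three million trials in pure Python; a smaller `n_agents` is useful for a quick check.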

Plots

For each environment (i.e. set of distributions), plot the \(\delta\) values on the x axis with two y axes: one for the proportion of games in which the aspiration level was reached and one for the mean outcome.
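The two plotted series can be computed from per-game results with a small helper like the one below. It is a sketch under assumptions: the input format (a list of `(delta, total_outcome)` pairs) and the name `summarize` are hypothetical, and the returned lists are what would go on the left and right y axes (for example via matplotlib's `Axes.twinx`).

```python
from collections import defaultdict


def summarize(records, goal=250):
    """Collapse per-game records into the two series plotted against delta.

    records: iterable of (delta, total_outcome) pairs, one per game.
    Returns (deltas, prop_reached, mean_outcome), sorted by delta.
    """
    by_delta = defaultdict(list)
    for delta, total in records:
        by_delta[delta].append(total)
    deltas = sorted(by_delta)
    prop_reached = [sum(t >= goal for t in by_delta[d]) / len(by_delta[d])
                    for d in deltas]
    mean_outcome = [sum(by_delta[d]) / len(by_delta[d]) for d in deltas]
    return deltas, prop_reached, mean_outcome


# Toy input for illustration only (not simulation results).
demo = [(-1.0, 240.0), (-1.0, 260.0), (1.0, 255.0), (1.0, 251.0)]
print(summarize(demo))  # ([-1.0, 1.0], [0.5, 1.0], [250.0, 253.0])
```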

To test whether low \(\delta\) values actually lead to more exploration near the goal, single agents are simulated and their behavior is plotted. Compared with lower \(\delta\) values, higher \(\delta\) values lead to more exploration while the goal has not yet been reached and to more exploitation afterwards. Negative \(\delta\) values reverse this pattern. The code for the simulations with negative \(\delta\) values is not shown because it is identical except for the signs.
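The claim above can be checked directly on \(\theta\) without simulating choices: a lower \(\theta\) flattens the softmax (more exploration) and a higher \(\theta\) sharpens it (more exploitation). The sketch below evaluates \(\theta\) at a few cumulative totals for positive and negative \(\delta\); treating the cumulative outcome as the current state and the listed totals are illustrative assumptions.

```python
import math


def theta(total, goal, delta):
    # theta = 1 / (1 + exp(-((total - goal) / goal) * delta))
    return 1.0 / (1.0 + math.exp(-((total - goal) / goal) * delta))


GOAL = 250
TOTALS = (0, 125, 250, 375)  # hypothetical cumulative outcomes

for delta in (5.0, 1.0, -1.0, -5.0):
    row = [round(theta(total, GOAL, delta), 3) for total in TOTALS]
    print(f"delta={delta:+.1f}: {row}")
```

For positive \(\delta\), \(\theta\) stays below 0.5 while the total is under the goal and rises above 0.5 once it is exceeded, and the larger \(\delta\) is, the further from 0.5 it moves. A negative \(\delta\) of the same size mirrors the values around 0.5, because \(\sigma(-x) = 1 - \sigma(x)\) for the logistic function.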

Plots for the positive \(\delta\)s.

Figure 1: Single-agent simulations for positive \(\delta\) values

Plots for the negative \(\delta\)s.

Single-agent simulations for negative \(\delta\) values

Figure 1: Proportion of games in which the threshold was reached and mean outcomes for different \(\delta\) values. Environment 1: three arms with means 10, 9 and 8 and SD 2, and one exponentially distributed arm with \(\lambda = 0.15\).

Figure 2: Proportion of games in which the threshold was reached and mean outcomes for different \(\delta\) values. Environment 2: three arms with means 9, 8 and 7 and SD 2, and one exponentially distributed arm with \(\lambda = 0.15\).

Figure 3: Proportion of games in which the threshold was reached and mean outcomes for different \(\delta\) values. Environment 3: three arms with means 9, 7 and 6 and SD 2, and one exponentially distributed arm with \(\lambda = 0.15\).
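The three environments in the captions can be written down as data, which also makes their expected values easy to compare. This is a sketch under assumptions: the three non-exponential arms are taken to be normal, the exponential arm is placed fourth, and the names `ENVIRONMENTS`, `sample_arm` and `expected_values` are hypothetical. With a threshold of 250 over 30 trials, an agent needs an average of about 8.33 per trial; always choosing the best arm gives an expected 300 in Environment 1 but only 270 in Environments 2 and 3.

```python
import random

# Means of the three (assumed normal) arms per environment, from the captions.
ENVIRONMENTS = {
    1: (10, 9, 8),
    2: (9, 8, 7),
    3: (9, 7, 6),
}
SD = 2.0         # standard deviation of the normal arms
EXP_RATE = 0.15  # rate lambda of the exponential arm (mean 1 / 0.15)


def sample_arm(env, arm, rng):
    """Draw one outcome from arm 0-3 of environment `env`."""
    if arm < 3:
        return rng.gauss(ENVIRONMENTS[env][arm], SD)
    return rng.expovariate(EXP_RATE)  # always positive, long right tail


def expected_values(env):
    """Expected outcome of each arm; the exponential arm's mean is 1 / lambda."""
    return list(ENVIRONMENTS[env]) + [1 / EXP_RATE]


rng = random.Random(0)
print(expected_values(1))
print(sample_arm(1, 3, rng))
```

The exponential arm has a mean of about 6.67 and an equally large standard deviation, so it is never the best arm in expectation, but its long right tail is the kind of option whose appeal may differ between outcome maximization and reaching a fixed threshold.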


Markus Steiner

3 December 2016
