Due date

The due date for this homework is Tuesday, February 28, by 2:00PM.

Instructions

This homework is completely optional. If you submit it, it will be used to replace your lowest homework grade; if your grade on this homework ends up being your lowest, it will not be counted against you.

This homework builds on the content of Exam #1. In particular, you will likely need to have completed problems 1–6 of Exam #1 in order to complete this homework.

For this homework you must provide all of your answers in a single file: either straight source code (.R), an R Markdown file (.Rmd), or a ‘knit’ markdown document. If I run your source code (or knit your markdown file), it should run from beginning to end without producing any errors. Your code should conform to the tidyverse R programming style guide, available here for reference:
https://style.tidyverse.org/index.html.

Partial credit will be given, so if you are unsure of a solution or can’t get your code to work, you should include concise comments in your code that explain your thought process/approach.

Introduction & Background

This homework examines actual human behavior in a multi-armed bandit task. The data comes from the following journal article:

In this paper, the authors examine human learning performance in a 6-armed bandit task, studying learning and decision-making under uncertainty. They also collected eye-gaze data (where people were looking on each trial) to examine how visual fixations predict or interact with decision-making; for this homework, however, we will focus on the basic behavioral data and ignore the eye data.

The basic experiment was a 6-armed bandit. Participants made 60 choices and were paid in proportion to the rewards they received. In this experiment, the rewards were random values sampled from a normal distribution (unlike the binary rewards considered in the exam). In other words, each arm had a mean payout as well as a standard deviation. Participants were not told the mean value of each arm, but had to learn it for themselves (balancing exploration and exploitation). The values for each arm were randomized for each participant.
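
As a toy illustration of this reward process (the means and standard deviations below are invented for illustration, not values from the paper):

set.seed(1)
arm_means <- c(50, 30, 70, 40, 60, 20)  # hypothetical mean payout for each arm
arm_sds <- rep(10, 6)                   # hypothetical standard deviation for each arm
reward_for <- function(arm) rnorm(1, mean = arm_means[arm], sd = arm_sds[arm])
reward_for(3)  # one sampled reward from arm 3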

The file human_data.csv contains 25 columns. Below is a concise description of each column:

Note: one complication is that some trials are missing (for example, a participant may simply not have responded within the fixed time limit on a given trial). So don’t assume in your code that there are exactly 60 trials for each participant.
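
Once the file is loaded (as in Problem 1), a quick way to verify this, assuming the subjectID column described above:

hdata %>%
  count(subjectID)  # number of recorded trials per participant; not always 60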

Problem 1)

Load in the file human_data.csv.

Generate a plot of the average rank of the choice made by participants on each trial of the experiment. In other words, your plot should have trial number on the x-axis, and the mean choice rank on the y-axis. (In this graph, lower values indicate better choices.)

COMMENT: I interpret this as the choice rank averaged across participants for each trial, not a per-participant average.

Hint: Take a look at the tidyverse code given in Problem 4 from Exam #1.

Solution:

library(tidyverse)

hdata <- read.csv("human_data.csv")

trial_choice <- hdata %>%
  group_by(trial) %>%
  summarise(avg_rank_trial = mean(chosenRankExp))

ggplot(trial_choice, aes(x = trial, y = avg_rank_trial)) +
  geom_line() +
  geom_point() +
  labs(x = "Trial Number", y = "Mean Choice Rank")

Problem 2)

Create a vector that stores all the unique subject identifiers in the dataset.

Hints:

Solution:

subjects <- unique(hdata$subjectID)  # one entry per participant
print(subjects)

Problem 3)

Generate a plot that shows the total accumulated reward for each subject on each trial of the experiment. That is, it should be a line graph with 32 lines. The data from each subject should be plotted using a different color.

COMMENT: I read “total accumulated reward” as the running (cumulative) reward on each trial, rather than a single overall sum per subject.

Hint: You will probably need to convert the subjectID column to a factor to produce the correct output.

Solution:

hdata <- read.csv("human_data.csv")

cum_results <- hdata %>%
  group_by(subjectID) %>%
  arrange(trial, .by_group = TRUE) %>%       # keep trials in order within each subject
  mutate(total_reward = cumsum(reward)) %>%  # running total of reward so far
  ungroup()

ggplot(cum_results, aes(x = trial, y = total_reward, colour = factor(subjectID))) +
  geom_line() +
  labs(x = "Trial Number", y = "Total Accumulated Reward", colour = "Subject ID")

Problem 4)

Write a function called select_subject that takes two arguments: a dataframe containing the entire dataset and a subject identifier, and returns just the data from the specified subject.

COMMENT: I assume the subject identifier here is the value of the subjectID column for each participant.

Solution:

select_subject <- function(data, subject_id) {
  subset(data, subjectID == subject_id)
}

print(select_subject(hdata, 1274711))  # should return only the rows belonging to this participant

Problem 5)

This problem is perhaps the hardest one. For this problem, you will write a function called subject_log_likelihood that implements a reinforcement learning model, using TD-learning and the softmax decision rule (see problems 1–6 from Exam #1). This function should return the log-likelihood of a given dataset according to the model.

Here is how to think about this: On each trial, your RL model computes the probability of choosing each of the 6 arms. Suppose, for example, that the computed probabilities on trial \(k\) are \(P_k = \lbrace 0.1, 0.2, 0.1, 0.3, 0.2, 0.1 \rbrace\), and that on this trial the human participant chose the second arm. The likelihood of this choice, according to your model, is 0.2. The log-likelihood of the choice is simply \(\log(0.2)\).

The human then receives a reward. Your RL model should then update its value estimate for arm 2, using the actual reward received by the participant on that trial. The log-likelihood of the entire dataset (up to 60 trials) is simply the sum of the log-likelihoods of the individual choices.
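
For reference, the two equations the model uses (the standard softmax choice rule and TD update, consistent with the solution code below) are

\(P_k(j) = \frac{e^{\beta Q_k(j)}}{\sum_{j'=1}^{6} e^{\beta Q_k(j')}}\)

for the probability of choosing arm \(j\) on trial \(k\), and

\(Q_{k+1}(j) = Q_k(j) + \alpha \left( r_k - Q_k(j) \right)\)

for the value update of the chosen arm \(j\) after receiving reward \(r_k\).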

The function you write should take the following arguments:

Solution:

subject_log_likelihood <- function(subject_data, alpha, beta) {
  q_values <- rep(0, 6)  # initialize all six arm value estimates to zero

  log_likelihood <- 0

  for (i in 1:nrow(subject_data)) {  # loop over each trial for one participant
    # softmax decision rule: probability of choosing each arm
    arm_probs <- exp(beta * q_values) / sum(exp(beta * q_values))

    arm_choice <- subject_data[i, "choice"]

    # log-likelihood of the participant's actual choice under the model
    prob_choice <- arm_probs[arm_choice]
    log_likelihood <- log_likelihood + log(prob_choice)

    # TD update for the chosen arm, using the reward actually received
    r <- subject_data[i, "reward"]
    prediction_error <- r - q_values[arm_choice]
    q_values[arm_choice] <- q_values[arm_choice] + alpha * prediction_error
  }

  return(log_likelihood)
}
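
A quick sanity check (the parameter values here are arbitrary, chosen only to confirm the function runs):

subject_log_likelihood(select_subject(hdata, 1274711), alpha = 0.1, beta = 1)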

Problem 6)

Write code that will find the maximum likelihood estimate (MLE) for the parameters \(\alpha\) (learning rate) and \(\beta\) (softmax parameter) for a single subject. You will need to do this using numerical optimization, and your objective function will have to call the function you wrote in problem 5 to compute the log-likelihood.

Solution:
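
A minimal sketch of one approach, using optim() to minimize the negative log-likelihood. The helper name fit_subject, the starting values, and the parameter bounds are my own assumptions, not given in the assignment:

fit_subject <- function(subject_data) {
  # objective: negative log-likelihood, since optim() minimizes by default
  neg_log_likelihood <- function(params) {
    -subject_log_likelihood(subject_data, alpha = params[1], beta = params[2])
  }
  optim(
    par = c(0.1, 1),          # assumed starting values for alpha and beta
    fn = neg_log_likelihood,
    method = "L-BFGS-B",      # supports box constraints on the parameters
    lower = c(0.001, 0.001),  # keep alpha in (0, 1] and beta positive
    upper = c(1, 20)
  )
}

fit <- fit_subject(select_subject(hdata, 1274711))
fit$par  # MLE estimates: c(alpha, beta)

Because the likelihood surface can be multimodal, it is worth re-running from several starting values and keeping the fit with the lowest objective value.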

Problem 7)

Find the best-fitting parameters for each subject in the dataset. Because numerical optimization is often brittle, you may need to experiment with different settings, for example trying several initial values for your parameter search, trying different optimization algorithms, and/or adding bounds on the parameter ranges. (Note: I have not done this problem myself yet, but speaking from experience, I expect it will be fidgety.)

Generate a plot that shows each subject as a point in a 2D plane, with the x-coordinate determined by their estimated \(\alpha\) parameter, and the y-coordinate given by their \(\beta\) parameter.

(If you can’t get your code for problem 6 to work, make a note of this, and you can substitute in random values for the parameter estimates.)

Solution:
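
A sketch, assuming fit_subject() from Problem 6 works; as the instructions allow, any subject whose optimization fails could be given random parameter values instead:

estimates <- map_dfr(subjects, function(s) {
  fit <- fit_subject(select_subject(hdata, s))
  tibble(subjectID = s, alpha = fit$par[1], beta = fit$par[2])
})

ggplot(estimates, aes(x = alpha, y = beta)) +
  geom_point() +
  labs(x = "Estimated alpha (learning rate)", y = "Estimated beta (softmax parameter)")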