Due date

The due date for this homework is Tuesday, February 28, by 2:00PM.

Instructions

This homework is completely optional. If you submit it, it will be used to replace your lowest homework grade; if your grade on this homework ends up being your lowest, it will not be counted against you.

This homework builds on the content of Exam #1. In particular, you will likely need to have completed problems 1–6 of Exam #1 in order to complete this homework.

For this homework you must provide all of your answers in a single file: either straight source code (.R), an R Markdown file (.Rmd), or a ‘knit’ markdown document. If I run your source code (or knit your markdown file), it should run from beginning to end without producing any errors. Your code should conform to the tidyverse R programming style guide, available here for reference:
https://style.tidyverse.org/index.html.

Partial credit will be given, so if you are unsure of a solution or can’t get your code to work, you should include concise comments in your code that explain your thought process/approach.

Introduction & Background

This homework examines actual human behavior in a multi-armed bandit task. The data comes from the following journal article:

In this paper, the authors examine human learning performance in a 6-armed bandit task, studying learning and decision-making under uncertainty. They also collected eye-gaze data (where people were looking on each trial) to examine how visual fixations predict or interact with decision-making; for this homework, however, we will focus on the basic behavioral data and ignore the eye data.

The basic experiment was a 6-armed bandit. Participants made 60 choices and were paid in proportion to the rewards they received. In this experiment, the rewards were random values sampled from a normal distribution (unlike the binary rewards considered in the exam). In other words, each arm had a mean payout as well as a standard deviation. Participants were not told the mean value of each arm, but had to learn it for themselves (balancing exploration and exploitation). The values for each arm were randomized for each participant.
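
As a toy illustration of this reward process (the means and standard deviations below are invented for illustration, not values from the paper):

set.seed(1)
arm_means <- c(50, 30, 70, 40, 60, 20)  # hypothetical mean payout for each arm
arm_sds <- rep(10, 6)                   # hypothetical standard deviation for each arm
reward_for <- function(arm) rnorm(1, mean = arm_means[arm], sd = arm_sds[arm])
reward_for(3)  # one sampled reward from arm 3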

The file human_data.csv contains 25 columns. Below is a concise description of each column:

Note: one complication is that some trials are missing (for example, a participant may simply not have responded within the fixed time limit on a given trial). So don’t assume in your code that there are exactly 60 trials for each participant.
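
Once the file is loaded (as in Problem 1), a quick way to verify this, assuming the subjectID column described above:

hdata %>%
  count(subjectID)  # number of recorded trials per participant; not always 60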

Problem 1)

Load in the file human_data.csv.

Generate a plot of the average rank of the choice made by participants on each trial of the experiment. In other words, your plot should have trial number on the x-axis, and the mean choice rank on the y-axis. (In this graph, lower values indicate better choices.)

COMMENT: I interpret this as the choice rank averaged across participants for each trial, not a per-participant average.

Hint: Take a look at the tidyverse code given in Problem 4 from Exam #1.

Solution:

library(tidyverse)

hdata <- read.csv("human_data.csv")

trial_choice <- hdata %>%
  group_by(trial) %>%
  summarise(avg_rank_trial = mean(chosenRankExp))

ggplot(trial_choice, aes(x = trial, y = avg_rank_trial)) +
  geom_line() +
  geom_point() +
  labs(x = "Trial Number", y = "Mean Choice Rank")

Problem 2)

Create a vector that stores all the unique subject identifiers in the dataset.

Hints:

Solution:

subjects <- unique(hdata$subjectID)  # one entry per participant
print(subjects)

Problem 3)

Generate a plot that shows the total accumulated reward for each subject on each trial of the experiment. That is, it should be a line graph with 32 lines. The data from each subject should be plotted using a different color.

COMMENT: I read “total accumulated reward” as the running (cumulative) reward on each trial, rather than a single overall sum per subject.

Hint: You will probably need to convert the subjectID column to a factor to produce the correct output.

Solution:

hdata <- read.csv("human_data.csv")

cum_results <- hdata %>%
  group_by(subjectID) %>%
  arrange(trial, .by_group = TRUE) %>%       # keep trials in order within each subject
  mutate(total_reward = cumsum(reward)) %>%  # running total of reward so far
  ungroup()

ggplot(cum_results, aes(x = trial, y = total_reward, colour = factor(subjectID))) +
  geom_line() +
  labs(x = "Trial Number", y = "Total Accumulated Reward", colour = "Subject ID")

Problem 4)

Write a function called select_subject that takes two arguments: a dataframe containing the entire dataset and a subject identifier, and returns just the data from the specified subject.

COMMENT: I assume the subject identifier here is the value of the subjectID column for each participant.

Solution:

select_subject <- function(data, subject_id) {
  subset(data, subjectID == subject_id)
}

print(select_subject(hdata, 1274711))  # should return only the rows belonging to this participant

Problem 5)

This problem is perhaps the hardest one. For this problem, you will write a function called subject_log_likelihood that implements a reinforcement learning model, using TD-learning and the softmax decision rule (see problems 1–6 from Exam #1). This function should return the log-likelihood of a given dataset according to the model.

Here is how to think about this: On each trial, your RL model computes the probability of choosing each of the 6 arms. Suppose, for example, that the computed probabilities on trial \(k\) are \(P_k = \lbrace 0.1, 0.2, 0.1, 0.3, 0.2, 0.1 \rbrace\), and that on this trial the human participant chose the second arm. The likelihood of this choice, according to your model, is 0.2. The log-likelihood of the choice is simply \(\log(0.2)\).

The human then receives a reward. Your RL model should then update its value estimate for arm 2, using the actual reward received by the participant on that trial. The log-likelihood of the entire dataset (up to 60 trials) is simply the sum of the log-likelihoods of the individual choices.
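
For reference, the two equations the model uses (the standard softmax choice rule and TD update, consistent with the solution code below) are

\(P_k(j) = \frac{e^{\beta Q_k(j)}}{\sum_{j'=1}^{6} e^{\beta Q_k(j')}}\)

for the probability of choosing arm \(j\) on trial \(k\), and

\(Q_{k+1}(j) = Q_k(j) + \alpha \left( r_k - Q_k(j) \right)\)

for the value update of the chosen arm \(j\) after receiving reward \(r_k\).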

The function you write should take the following arguments:

Solution:

subject_log_likelihood <- function(subject_data, alpha, beta) {
  q_values <- rep(0, 6)  # initialize all six arm value estimates to zero

  log_likelihood <- 0

  for (i in 1:nrow(subject_data)) {  # loop over each trial for one participant
    # softmax decision rule: probability of choosing each arm
    arm_probs <- exp(beta * q_values) / sum(exp(beta * q_values))

    arm_choice <- subject_data[i, "choice"]

    # log-likelihood of the participant's actual choice under the model
    prob_choice <- arm_probs[arm_choice]
    log_likelihood <- log_likelihood + log(prob_choice)

    # TD update for the chosen arm, using the reward actually received
    r <- subject_data[i, "reward"]
    prediction_error <- r - q_values[arm_choice]
    q_values[arm_choice] <- q_values[arm_choice] + alpha * prediction_error
  }

  return(log_likelihood)
}
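
A quick sanity check (the parameter values here are arbitrary, chosen only to confirm the function runs):

subject_log_likelihood(select_subject(hdata, 1274711), alpha = 0.1, beta = 1)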

Problem 6)

Write code that will find the maximum likelihood estimate (MLE) for the parameters \(\alpha\) (learning rate) and \(\beta\) (softmax parameter) for a single subject. You will need to do this using numerical optimization, and your objective function will have to call the function you wrote in problem 5 to compute the log-likelihood.

Solution:
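
A minimal sketch of one approach, using optim() to minimize the negative log-likelihood. The helper name fit_subject, the starting values, and the parameter bounds are my own assumptions, not given in the assignment:

fit_subject <- function(subject_data) {
  # objective: negative log-likelihood, since optim() minimizes by default
  neg_log_likelihood <- function(params) {
    -subject_log_likelihood(subject_data, alpha = params[1], beta = params[2])
  }
  optim(
    par = c(0.1, 1),          # assumed starting values for alpha and beta
    fn = neg_log_likelihood,
    method = "L-BFGS-B",      # supports box constraints on the parameters
    lower = c(0.001, 0.001),  # keep alpha in (0, 1] and beta positive
    upper = c(1, 20)
  )
}

fit <- fit_subject(select_subject(hdata, 1274711))
fit$par  # MLE estimates: c(alpha, beta)

Because the likelihood surface can be multimodal, it is worth re-running from several starting values and keeping the fit with the lowest objective value.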

Problem 7)

Find the best-fitting parameters for each subject in the dataset. Because numerical optimization is often brittle, you may need to experiment with different settings, for example trying several initial values for your parameter search, trying different optimization algorithms, and/or adding bounds on the parameter ranges. (Note: I have not done this problem myself yet, but speaking from experience, I expect it will be fidgety.)

Generate a plot that shows each subject as a point in a 2D plane, with the x-coordinate determined by their estimated \(\alpha\) parameter, and the y-coordinate given by their \(\beta\) parameter.

(If you can’t get your code for problem 6 to work, make a note of this, and you can substitute in random values for the parameter estimates.)

Solution:
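
A sketch, assuming fit_subject() from Problem 6 works; as the instructions allow, any subject whose optimization fails could be given random parameter values instead:

estimates <- map_dfr(subjects, function(s) {
  fit <- fit_subject(select_subject(hdata, s))
  tibble(subjectID = s, alpha = fit$par[1], beta = fit$par[2])
})

ggplot(estimates, aes(x = alpha, y = beta)) +
  geom_point() +
  labs(x = "Estimated alpha (learning rate)", y = "Estimated beta (softmax parameter)")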