First, load the dataset into R. The dataset is stored as a tab-delimited text file, so you can read it directly from the web with read.table():

empra <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/07/empra1.txt", header = TRUE, sep = "\t")

The data is now stored in a dataframe object called empra. To see the dataset, run the following code:

View(empra)

The dataset should have 250 entries corresponding to the 250 participants we had in the study. You can get more information about the data by running these commands:

dim(empra)
names(empra)
summary(empra)

Here are the columns in the data file:

| Variable Name | Description |
| --- | --- |
| workerid | The Amazon mTurk workerid for the participant |
| condition.mode | The mode condition of the participant. A value of peeks means the participant could either peek or keep, while keeps means the participant could only keep. |
| condition.difficulty | The difficulty condition of the participant. A value of easy means the standard deviation of options was 10, while a value of hard means the standard deviation was 30. |
| condition.stability | The stability condition of the participant. A value of stable means that the environment did not change over the experiment. A value of dynamic means that the best and worst options changed places in the second half of the study. |
| n.peeks | How many times did the participant peek? |
| p.peeks | On what proportion of the 200 trials did the participant peek? |
| total.points | How many points did the participant earn in total? |
| total.points.fh, total.points.sh | The number of points the participant earned in the first (trials 1 to 100) and second (trials 101 to 200) half of the study |
| duration | How long did the study take in seconds? |
| sex, age | Sex and age of the participant |
| barratt.all | Impulsivity measure using the Barratt impulsiveness scale (Patton, Stanford & Barratt, 1995). Higher values indicate higher impulsivity. |
| max.nen.all | Maximization measure using a short form of the maximization scale (Nenkov et al., 2008) |
| reg.sch.all | Regret scale from Schwartz's original maximization scale (Schwartz et al., 2002) |
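Before plotting, it can help to confirm that the columns listed above are present and to count how many participants fall into each cell of the design. The following is only a sketch: `toy` is a hypothetical stand-in for empra with invented values, and the level names used here (peek/keep, easy/hard, stable/dynamic) are assumptions. Substitute empra for toy to run it on the real data.

```r
# Hypothetical stand-in for empra: invented values, same column names
toy <- data.frame(
  condition.mode       = rep(c("peek", "keep"), each = 4),
  condition.difficulty = rep(c("easy", "hard"), times = 4),
  condition.stability  = rep(c("stable", "stable", "dynamic", "dynamic"), times = 2),
  total.points         = c(510, 420, 480, 390, 450, 400, 430, 370)
)

# Columns that the later analyses rely on
needed <- c("condition.mode", "condition.difficulty",
            "condition.stability", "total.points")
missing.cols <- setdiff(needed, names(toy))
stopifnot(length(missing.cols) == 0)

# Number of participants in each cell of the 2 x 2 x 2 design
cell.counts <- with(toy, table(condition.mode, condition.difficulty, condition.stability))
ftable(cell.counts)
```

Very unequal cell counts would be worth knowing about before interpreting the group comparisons that follow.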

First, let’s create plots showing the distribution of total points for each level of the independent variables.

# install.packages("beanplot")  # run once if the package is not yet installed
library(beanplot)

beanplot(total.points ~ condition.mode + condition.stability + condition.difficulty, 
         data = empra, col = "white", xlab = "Experimental Condition", ylab = "Total Points", 
         main = "Distribution of points earned in each experimental condition")

If you’d like, you can also create plots for the levels of each independent variable separately. For example, let’s create a plot with two beans comparing all Peek conditions with all Keep conditions:

library(beanplot)

beanplot(total.points ~ condition.mode, 
         data = empra, col = "white", xlab = "Action Condition", ylab = "Total Points", 
         main = "Distribution of points earned in each experimental condition")

Next, let’s create tables of summary statistics. We’ll calculate the mean number of points earned by participants as a function of each independent variable. To use a different summary function, change FUN = mean to FUN = sd (for standard deviation), FUN = median (for median), FUN = max (for maximum), and so on.

# DV: Mean total points, IV: Mode condition
with(empra, aggregate(total.points ~ condition.mode, FUN = mean))

# DV: Mean total points, IV: Difficulty condition
with(empra, aggregate(total.points ~ condition.difficulty, FUN = mean))

# DV: Mean total points, IV: Stability Condition
with(empra, aggregate(total.points ~ condition.stability, FUN = mean))

# DV: Mean total points, IV: Mode + difficulty + stability
with(empra, 
     aggregate(total.points ~ condition.mode + condition.difficulty + condition.stability, 
               FUN = mean))
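Instead of rerunning aggregate() once per summary function, FUN can return several statistics at once. The sketch below is an illustration on simulated data (`toy` is hypothetical and its values are random, not the study's results). It uses the data argument of aggregate() rather than with(), which is equivalent here.

```r
set.seed(1)  # simulated data only; not the study's results
toy <- data.frame(
  condition.mode = rep(c("keep", "peek"), each = 20),
  total.points   = round(rnorm(40, mean = 400, sd = 50))
)

# Return several statistics per group in a single call
stats <- aggregate(total.points ~ condition.mode, data = toy,
                   FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x)))

# aggregate() stores the three statistics as one matrix column;
# flatten it into ordinary columns named total.points.mean, etc.
stats <- do.call(data.frame, stats)
stats
```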

Now let’s conduct some two-sample t-tests for each of the three independent variables. For each of these tests, the DV will be the total number of points earned by a participant, and the IV is the experimental condition:

# DV: Mean total points, IV: Mode condition
with(empra, t.test(total.points ~ condition.mode))

# DV: Mean total points, IV: Difficulty condition
with(empra, t.test(total.points ~ condition.difficulty))

# DV: Mean total points, IV: Stability Condition
with(empra, t.test(total.points ~ condition.stability))
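Note that t.test() runs Welch's test by default, which does not assume equal variances in the two groups; var.equal = TRUE gives the classic pooled-variance test. The sketch below, on simulated data (`toy` is hypothetical), shows both variants and how to pull individual numbers out of the result.

```r
set.seed(2)  # simulated scores; not the study's data
toy <- data.frame(
  condition.mode = rep(c("keep", "peek"), each = 30),
  total.points   = c(rnorm(30, mean = 400, sd = 40),
                     rnorm(30, mean = 430, sd = 60))
)

welch   <- t.test(total.points ~ condition.mode, data = toy)                    # default
student <- t.test(total.points ~ condition.mode, data = toy, var.equal = TRUE)  # pooled variance

# Individual results are stored as elements of the returned "htest" object
welch$estimate      # the two group means
welch$parameter     # Welch degrees of freedom (usually fractional)
welch$p.value
student$parameter   # pooled test: n1 + n2 - 2 degrees of freedom
```

With equal group sizes the two tests give the same t statistic and differ only in their degrees of freedom. With unequal group sizes and unequal variances they can diverge more noticeably.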

Now let’s conduct a full regression analysis that includes all three independent variables as predictors:

summary(lm(total.points ~ condition.mode + condition.difficulty + condition.stability, 
           data = empra))
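To read the regression output, it helps to know that lm() dummy-codes each two-level condition, with the alphabetically first level as the reference. The sketch below shows the simplest case on simulated data (`toy` is hypothetical, and the level names are assumptions): with a single two-level predictor, the coefficient is just the difference between the group means. In the three-predictor model above, each coefficient is instead the difference adjusted for the other two conditions.

```r
set.seed(3)  # simulated data for illustration only
toy <- data.frame(
  condition.mode = rep(c("keep", "peek"), each = 25),
  total.points   = c(rnorm(25, mean = 400, sd = 50),
                     rnorm(25, mean = 420, sd = 50))
)

fit <- lm(total.points ~ condition.mode, data = toy)
coef(fit)  # "(Intercept)" is the keep mean; "condition.modepeek" is peek minus keep

# Check: the coefficient equals the difference between the two group means
group.means <- tapply(toy$total.points, toy$condition.mode, mean)
mean.diff   <- unname(group.means["peek"] - group.means["keep"])
all.equal(unname(coef(fit)["condition.modepeek"]), mean.diff)
```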

How often do people tend to peek in general? Let’s create a histogram of the number of peeks made by participants in the peek condition:

empra.peeks <- subset(empra, subset = condition.mode == "peek")

hist(empra.peeks$n.peeks, main = "Number of peeks for each participant", 
     xlab = "Peeks (out of 200 trials)")

abline(v = mean(empra.peeks$n.peeks), lty = 2)
text(mean(empra.peeks$n.peeks), 30, labels = paste("Mean =", round(mean(empra.peeks$n.peeks))), pos = 4)

We can also create beanplots showing the distribution of peeks for each experimental condition:

beanplot(n.peeks ~ condition.difficulty + condition.stability, data = empra.peeks, 
         col = "white", ylab = "Number of peeks (out of 200 trials)", 
         main = "Distribution of peeks for each experimental condition")

Was there a relationship between the number of peeks people took and their final score? To answer this, we’ll look at the data of participants in the peek condition and correlate the number of peeks they took with their final score:

empra.peek <- subset(empra, subset = condition.mode == "peek")

with(empra.peek, 
     plot(n.peeks, total.points, xlab = "Number of Peeks", ylab = "Total Points"))

abline(lm(total.points ~ n.peeks, data = empra.peek))

with(empra.peek, cor.test(n.peeks, total.points))
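The regression line and the correlation test above describe the same linear relationship. As a sketch on simulated data (the variables below are hypothetical and the relationship is invented), the squared Pearson correlation equals the R-squared of the simple regression, and both give the same p-value:

```r
set.seed(4)  # simulated data; the relationship here is invented
n.peeks      <- rpois(50, lambda = 20)
total.points <- 400 + 2 * n.peeks + rnorm(50, sd = 30)

ct  <- cor.test(n.peeks, total.points)  # Pearson correlation by default
fit <- lm(total.points ~ n.peeks)

r  <- unname(ct$estimate)        # correlation coefficient
r2 <- summary(fit)$r.squared     # proportion of variance explained
all.equal(r^2, r2)

# The test of the correlation matches the test of the regression slope
slope.p <- coef(summary(fit))["n.peeks", "Pr(>|t|)"]
all.equal(ct$p.value, slope.p)
```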

What affected people’s peeks? Let’s see if the difficulty and stability condition affected how often people peeked:

with(empra[empra$condition.mode == "peek",], t.test(n.peeks ~ condition.difficulty))
with(empra[empra$condition.mode == "peek",], t.test(n.peeks ~ condition.stability))