For this exercise, please try to reproduce the results from Experiment 1 of the associated paper (Farooqui & Manly, 2015). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

Participants (N=21) completed a series of trials on which they either repeated the previous task or switched to the other task. In one task, indicated by a green box around the stimuli, participants chose the numerically larger of two values; in the other, indicated by a blue box, they chose the value printed in the larger font. Each trial was preceded by a subliminal cue followed by a mask. The cues were “O” (non-predictive cue), “M” (switch-predictive cue), and “T” (repeat-predictive cue). Reaction times (RTs) and accuracy were measured.

Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 1):

Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in longer RTs (836 vs. 689 ms) and lower accuracy rates (79% vs. 92%). If participants were able to learn the predictive value of the cue that preceded only switch trials and could instantiate relevant anticipatory control in response to it, the performance on switch trials preceded by this cue would be better than on switch trials preceded by the nonpredictive cue. This was indeed the case (mean RT-predictive cue: 819 ms; nonpredictive cue: 871 ms; mean difference = 52 ms, 95% confidence interval, or CI = [19.5, 84.4]), two-tailed paired t(20) = 3.34, p < .01. However, error rates did not differ across these two groups of switch trials (predictive cue: 78.9%; nonpredictive cue: 78.8%), p = .8.

Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files

# #optional packages:
# library(broom)

Step 2: Load data

# Read every participant's data (each in a separate .xls file) and combine them into one data frame.
# The first 250 rows of each file hold the trial data; the rows below contain the authors' Excel calculations, which we drop.
files <- dir('data/Experiment 1')

data <- data.frame()
id <- 1
for (file in files){
  if(file != 'Codebook.xls'){
    temp_data <- read_xls(file.path('data/Experiment 1', file))
    temp_data$id <- id
    id <- id + 1
    temp_data <- temp_data[1:250, ]
    data <- rbind(data, temp_data)
  }
}
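The loop above grows `data` with `rbind()` on every pass. Below is a minimal sketch of an equivalent, loop-free version in base R; `list_participant_files()` and `combine_participants()` are hypothetical helper names, not part of the original analysis. The reader is passed in as `read_fun` so that `readxl::read_xls` could be supplied for the real workbooks; the toy usage writes small CSV files to a temporary folder and reads them with `read.csv` instead.

```r
# Hypothetical helper: list participant workbooks in a folder,
# skipping the codebook (mirrors the `file != 'Codebook.xls'` check).
list_participant_files <- function(dir_path, codebook = "Codebook.xls") {
  files <- list.files(dir_path, pattern = "\\.xls$")  # alphabetical order
  file.path(dir_path, setdiff(files, codebook))
}

# Hypothetical helper: read each file, keep the first `n_rows` rows
# (the trial data), tag them with a participant id, and stack them once.
combine_participants <- function(paths, read_fun, n_rows = 250) {
  pieces <- lapply(seq_along(paths), function(i) {
    d <- as.data.frame(read_fun(paths[i]))
    d <- d[seq_len(n_rows), , drop = FALSE]
    d$id <- i
    d
  })
  do.call(rbind, pieces)
}

# Toy usage: empty .xls files to exercise the file listing ...
toy_dir <- file.path(tempdir(), "toy_exp1")
dir.create(toy_dir, showWarnings = FALSE)
invisible(file.create(file.path(toy_dir, c("p01.xls", "p02.xls", "Codebook.xls"))))
xls_paths <- list_participant_files(toy_dir)  # p01.xls and p02.xls only

# ... and small CSV files standing in for the workbooks' contents.
csv_paths <- file.path(toy_dir, c("p01.csv", "p02.csv"))
for (p in csv_paths) {
  write.csv(data.frame(RT = c(500, 600, 700)), p, row.names = FALSE)
}
toy_data <- combine_participants(csv_paths, read.csv, n_rows = 2)
```

On the real data, a call along the lines of `combine_participants(list_participant_files('data/Experiment 1'), read_xls)` should give the same rows and ids as the loop, assuming every workbook has at least 250 rows; as in the loop, ids follow the alphabetical order of the file names.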

Step 3: Tidy data

Each row is one trial (a single observation), so the data are already in tidy format.

Step 4: Run analysis

Pre-processing

# Keep only repeat (TrialType 1) and switch (TrialType 2) trials and label them
d_cleaned <- data |> 
  filter(TrialType %in% c(1, 2)) |> 
  mutate(trial_name = case_when(
    TrialType == 1 ~ "rept",
    TrialType == 2 ~ "switch"
  ))

Descriptive statistics

Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in longer RTs (836 vs. 689 ms)

# Median RT per participant and trial type (correct trials only), then the mean of those medians
mean_RTs <- d_cleaned |> 
  filter(RespCorr) |> 
  group_by(id, trial_name) |> 
  summarise(med_RT = median(RT)) |> 
  ungroup() |> 
  pivot_wider(names_from = trial_name, values_from = med_RT) |> 
  summarize(mean_repeated = mean(rept), mean_switch = mean(switch))

cat("Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in longer RTs (", round(mean_RTs$mean_switch)," vs. ", round(mean_RTs$mean_repeated)," ms).", sep = '')

## Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in longer RTs (836 vs. 662 ms).

Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in […] lower accuracy rates (79% vs. 92%)

# reproduce the above results here
# Percentage correct per participant and trial type, then averaged across participants
accuracy_d <- d_cleaned |> 
  group_by(id, trial_name) |> 
  summarise(accuracy = mean(RespCorr)*100) |> 
  ungroup() |> 
  pivot_wider(names_from = trial_name, values_from = accuracy) |> 
  summarize(rept_acc = mean(rept), switch_acc = mean(switch)) |> 
  round()

cat("Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in [...] lower accuracy rates (", accuracy_d$switch_acc,"% vs. ", accuracy_d$rept_acc, "%).", sep = '')

## Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in [...] lower accuracy rates (79% vs. 91%).
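Accuracy here is computed within each participant first and then averaged across participants, whereas the commented-out accuracy attempt further below pools all trials. The two agree only when every participant contributes the same number of trials to a condition. A toy sketch in base R (made-up values, not the study's data):

```r
# Toy trial-level data: participant 1 has 4 switch trials, participant 2 has 2.
toy <- data.frame(
  id       = c(1, 1, 1, 1, 2, 2),
  RespCorr = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Pooled: every trial weighted equally, so participant 1 counts twice as much.
pooled_acc <- mean(toy$RespCorr) * 100

# Two-stage: accuracy within each participant, then the mean of those values.
per_id_acc    <- tapply(toy$RespCorr, toy$id, mean) * 100
two_stage_acc <- mean(per_id_acc)

round(c(pooled = pooled_acc, two_stage = two_stage_acc), 1)
# pooled 66.7, two_stage 62.5
```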

Now you will compare switch trials preceded by predictive versus non-predictive cues, starting with reaction time.

This was indeed the case (mean RT-predictive cue: 819 ms; nonpredictive cue: 871 ms; … )

# reproduce the above results here

# Could not reproduce this result exactly; earlier attempts are kept below (commented out) for reference

# d_cleaned |> 
#   filter(trial_name == 'switch') |> 
#   mutate(predictive_switch = if_else(Prime %in% c('M'), TRUE, FALSE)) |> 
#   group_by(predictive_switch) |>
#   summarize(avg_pred_RT = mean(RT)) |> 
#   round()
# 
# d_cleaned |> 
#   filter(trial_name == 'switch') |> 
#   filter(RespCorr) |> 
#   mutate(predictive_switch = if_else(Prime %in% c('M'), TRUE, FALSE)) |> 
#   group_by(predictive_switch) |>
#   summarize(avg_pred_RT = mean(RT)) |> 
#   round()
# 
# d_cleaned |> 
#   filter(trial_name == 'switch') |> 
#   mutate(predictive_switch = if_else(Prime %in% c('M', '2'), TRUE, FALSE)) |> 
#   group_by(predictive_switch) |>
#   summarize(avg_pred_RT = mean(RT)) |> 
#   round()
# 
# d_cleaned |> 
#   filter(trial_name == 'switch') |> 
#   mutate(predictive_switch = if_else(Prime %in% c('M', '8'), TRUE, FALSE)) |> 
#   group_by(predictive_switch) |>
#   summarize(avg_pred_RT = mean(RT)) |> 
#   round()

# d_cleaned |> 
#   filter(trial_name == 'switch') |> 
#   # filter(RespCorr) |>   
#   mutate(predictive_switch = if_else(Prime %in% c('M', '8'), TRUE, FALSE)) |> 
#   group_by(id, predictive_switch) |>
#   summarise(mean_RT = mean(RT, na.rm = T)) |>
#   ungroup() |>
#   pivot_wider(names_from = predictive_switch, values_from = mean_RT, names_prefix = 'pred_') |>
#   summarize(avg_pred_RT = mean(pred_TRUE, na.rm = T),
#             avg_non_pred_RT = mean(pred_FALSE, na.rm = T)) |>
#   round()

# d_cleaned |> 
#   filter(trial_name == 'switch') |> 
#   mutate(predictive_switch = if_else(Prime %in% c('M', '2'), TRUE, FALSE)) |> 
#   group_by(id, predictive_switch) |>
#   summarise(med_RT = median(RT, na.rm = T)) |>
#   ungroup() |>
#   pivot_wider(names_from = predictive_switch, values_from = med_RT, names_prefix = 'pred_') |>
#   summarize(avg_pred_RT = mean(pred_TRUE),
#             avg_non_pred_RT = mean(pred_FALSE)) |>
#   round()

# Switch trials only; assumes Prime code '8' is an alternative label for the
# switch-predictive cue 'M' (this mapping is not stated in the materials)
avg_RT_df <- d_cleaned |> 
  filter(trial_name == 'switch') |> 
  mutate(predictive_switch = Prime %in% c('M', '8')) |> 
  group_by(id, predictive_switch) |>
  summarise(med_RT = median(RT, na.rm = TRUE)) |>
  ungroup() |>
  pivot_wider(names_from = predictive_switch, values_from = med_RT, names_prefix = 'pred_')

avg_RT_df |> 
  summarize(avg_pred_RT = mean(pred_TRUE),
            avg_non_pred_RT = mean(pred_FALSE)) |>
  round()

## # A tibble: 1 × 2
##   avg_pred_RT avg_non_pred_RT
##         <dbl>           <dbl>
## 1         803             836

Next, try to reproduce the error rates for switch trials preceded by predictive versus non-predictive cues.

However, error rates did not differ across these two groups of switch trials (predictive cue: 78.9%; nonpredictive cue: 78.8%)

# reproduce the above results here

# Could not reproduce this result exactly; an earlier attempt that pools all trials is kept below (commented out)

# d_cleaned |> 
#   filter(trial_name == 'switch') |> 
#   mutate(predictive_switch = if_else(Prime %in% c('M', '8'), TRUE, FALSE)) |>  
#   group_by(predictive_switch) |>
#   summarise(acc = mean(RespCorr)*100) |> 
#   round(1)


# Percentage correct per participant on predictive vs. non-predictive switch trials
# (same Prime coding as the RT analysis)
avg_acc_df <- d_cleaned |> 
  filter(trial_name == 'switch') |> 
  mutate(predictive_switch = Prime %in% c('M', '8')) |> 
  group_by(id, predictive_switch) |>
  summarise(acc = mean(RespCorr)*100) |>
  ungroup() |>
  pivot_wider(names_from = predictive_switch, values_from = acc, names_prefix = 'pred_') 

avg_acc_df |>
  summarize(avg_pred_acc = mean(pred_TRUE),
            avg_non_pred_acc = mean(pred_FALSE)) |>
  round(1)

## # A tibble: 1 × 2
##   avg_pred_acc avg_non_pred_acc
##          <dbl>            <dbl>
## 1         79.9               78

Inferential statistics

The first claim is that, on switch trials, predictive cues lead to significantly faster reaction times than non-predictive cues.

… the performance on switch trials preceded by this cue would be better than on switch trials preceded by the nonpredictive cue. This was indeed the case (mean RT-predictive cue: 819 ms; nonpredictive cue: 871 ms; mean difference = 52 ms, 95% confidence interval, or CI = [19.5, 84.4]), two-tailed paired t(20) = 3.34, p < .01.
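A paired t-test is a one-sample t-test on the within-participant differences d_i = nonpredictive_i - predictive_i: t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom, and the 95% CI is mean(d) ± qt(0.975, n - 1) × sd(d) / sqrt(n). With N = 21 participants this gives the reported df of 20. The sketch below, on made-up per-participant values, checks the hand calculation against `t.test()`:

```r
# Made-up per-participant median RTs (ms) for the two cue conditions.
nonpred <- c(880, 850, 910, 870, 860)
pred    <- c(830, 840, 860, 850, 820)

d  <- nonpred - pred              # within-participant differences
n  <- length(d)
se <- sd(d) / sqrt(n)             # standard error of the mean difference

t_manual  <- mean(d) / se
ci_manual <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se

t_builtin <- t.test(nonpred, pred, paired = TRUE)
```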

# reproduce the above results here

# Could not reproduce this result exactly

t_res <- t.test(avg_RT_df$pred_FALSE, avg_RT_df$pred_TRUE, paired = TRUE)
t_res

## 
##  Paired t-test
## 
## data:  avg_RT_df$pred_FALSE and avg_RT_df$pred_TRUE
## t = 1.9448, df = 20, p-value = 0.06599
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -2.393367 68.358397
## sample estimates:
## mean difference 
##        32.98251

cat("mean difference = ", round(t_res$estimate)," ms, 95% confidence interval, or CI = [", round(t_res$conf.int[1], 1),", ", round(t_res$conf.int[2], 1),"]), two-tailed paired t(", round(t_res$parameter),") = ", round(t_res$statistic, 2),", p = ", round(t_res$p.value, 3),".", sep = '')

## mean difference = 33 ms, 95% confidence interval, or CI = [-2.4, 68.4]), two-tailed paired t(20) = 1.94, p = 0.066.
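The `cat()` call above and the one for the accuracy test below repeat the same formatting logic. A possible refactor is a small helper; `format_paired_t()` is a hypothetical name, and the sketch assumes its input is the object returned by `t.test(..., paired = TRUE)`. The `unit` argument lets the RT test report ms and the accuracy test report percentage points. The toy input is made up.

```r
# Hypothetical helper: format a paired t.test() result in the style of the
# paper's results section.
format_paired_t <- function(res, unit = "ms", digits = 0) {
  sprintf(
    "mean difference = %s %s, 95%% CI = [%.1f, %.1f], two-tailed paired t(%d) = %.2f, p = %.3f",
    formatC(unname(res$estimate), format = "f", digits = digits), unit,
    res$conf.int[1], res$conf.int[2],
    as.integer(round(res$parameter)), unname(res$statistic), res$p.value
  )
}

# Toy usage on made-up values.
toy_res <- t.test(c(880, 850, 910, 870, 860), c(830, 840, 860, 850, 820),
                  paired = TRUE)
toy_string <- format_paired_t(toy_res)
```

For the tests in this report, the calls would be `format_paired_t(t_res)` and `format_paired_t(acc_t_res, unit = "percentage points", digits = 1)`.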

Next, test the second claim.

However, error rates did not differ across these two groups of switch trials (predictive cue: 78.9%; nonpredictive cue: 78.8%), p = .8.
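The paper states this claim in terms of error rates but reports accuracy percentages. Because error rate = 100 - accuracy, the paired differences in error rate are exactly the negated differences in accuracy, so a paired t-test gives the same |t| and p-value whichever measure is used. A quick check on made-up percentages:

```r
# Made-up per-participant accuracy (%) on switch trials.
acc_pred    <- c(80, 75, 90, 70, 85)
acc_nonpred <- c(78, 77, 88, 72, 80)

# The corresponding error rates (%).
err_pred    <- 100 - acc_pred
err_nonpred <- 100 - acc_nonpred

acc_test <- t.test(acc_nonpred, acc_pred, paired = TRUE)
err_test <- t.test(err_nonpred, err_pred, paired = TRUE)
# Same p-value; t has the opposite sign.
```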

# reproduce the above results here

# Could not reproduce this result exactly
acc_t_res <- t.test(avg_acc_df$pred_FALSE, avg_acc_df$pred_TRUE, paired = TRUE)
acc_t_res

## 
##  Paired t-test
## 
## data:  avg_acc_df$pred_FALSE and avg_acc_df$pred_TRUE
## t = -1.0766, df = 20, p-value = 0.2945
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -5.623939  1.794888
## sample estimates:
## mean difference 
##       -1.914526

cat("mean difference = ", round(acc_t_res$estimate, 1)," percentage points, 95% confidence interval, or CI = [", round(acc_t_res$conf.int[1], 1),", ", round(acc_t_res$conf.int[2], 1),"]), two-tailed paired t(", round(acc_t_res$parameter),") = ", round(acc_t_res$statistic, 2),", p = ", round(acc_t_res$p.value, 3),".", sep = '')

## mean difference = -1.9 percentage points, 95% confidence interval, or CI = [-5.6, 1.8]), two-tailed paired t(20) = -1.08, p = 0.294.

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

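I was able to reproduce only some of the results in the 3 hours allotted for this assignment. I obtained the reported mean RT for switch trials (836 ms) but not for repeat trials (662 ms vs. the reported 689 ms). Switch-trial accuracy matched (79%), but repeat-trial accuracy differed by one percentage point after rounding (91% vs. 92%). I could not reproduce any of the later results: the cue comparison gave 803 vs. 836 ms instead of 819 vs. 871 ms, the paired t-test on RTs was not significant (t(20) = 1.94, p = .066), and accuracy on predictive vs. non-predictive switch trials was 79.9% vs. 78% rather than 78.9% vs. 78.8%.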

How difficult was it to reproduce your results?

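It was very difficult to reproduce these results. I tried several approaches (different codings of the Prime cue, means vs. medians, including vs. excluding error trials, pooling trials vs. averaging within participants) but could not arrive at the reported numbers.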

What aspects made it difficult? What aspects made it easy?

It was helpful that the data was already tidy.

There were several factors that made reproduction hard.

Some variables were poorly named. For example, RespCorr and CorrResp both sound as if they indicate whether a response was correct, but the two variables are not the same.

Some categorical variables had unclear categories. For example, Prime had both numerical (2, 4, 8) and letter (M, O, T) values. I assume that 2, 4, and 8 are alternative labels for M, O, and T, but this isn’t stated.
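One way to check how the codes in Prime relate, without relying on an undocumented mapping, is to cross-tabulate Prime against TrialType: the switch-predictive cue should appear only before switch trials (TrialType 2 in this dataset's coding), the repeat-predictive cue only before repeat trials (1), and the non-predictive cue before both. The sketch below runs that check on a made-up data frame; its Prime/TrialType combinations are illustrative only, not taken from the data files.

```r
# Made-up trials: cue code and trial type (1 = repeat, 2 = switch).
toy <- data.frame(
  Prime     = c("M", "M", "T", "T", "O", "O"),
  TrialType = c(2, 2, 1, 1, 1, 2)
)

# Rows: cue codes; columns: trial types ("1", "2").
cue_by_type <- table(toy$Prime, toy$TrialType)

# Classify each cue by the trial types it precedes.
switch_only <- rownames(cue_by_type)[cue_by_type[, "1"] == 0]
repeat_only <- rownames(cue_by_type)[cue_by_type[, "2"] == 0]
mixed       <- rownames(cue_by_type)[cue_by_type[, "1"] > 0 & cue_by_type[, "2"] > 0]
```

On the real data the same check would be `table(d_cleaned$Prime, d_cleaned$TrialType)`; if a numeric code turned out to be switch-only alongside M, that would support treating it as the same cue.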

The paper does not clearly describe which calculations were performed. For example, the authors state that “unless stated otherwise, the statistical tests were performed on the more stable median values rather than mean values.” Later, however, they report mean RTs. Did they compute a median RT for each participant and then average those medians, use means at every step, or do something else?
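These readings can give different numbers from the same trials. A toy sketch in base R (made-up RTs, not the study's data) comparing (a) the mean of per-participant medians, which is what this report uses for the switch-cost RTs, (b) the mean of per-participant means, and (c) the grand mean over all trials:

```r
# Made-up correct-trial RTs (ms) for two participants in one condition;
# participant 2 has one slow outlier trial.
toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 2),
  RT = c(600, 650, 700, 800, 820, 840, 1500)
)

# (a) median within participant, then mean across participants
mean_of_medians <- mean(tapply(toy$RT, toy$id, median))
# (b) mean within participant, then mean across participants
mean_of_means <- mean(tapply(toy$RT, toy$id, mean))
# (c) grand mean over all trials (participants weighted by trial count)
grand_mean <- mean(toy$RT)

c(a = mean_of_medians, b = mean_of_means, c = grand_mean)
# a: 740, b: 820, c: about 844.3
```

The median-based summary is the least affected by the outlier, which is presumably why the paper calls medians "more stable".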

Reproducibility Report: Group B Choice 1