For this exercise, please try to reproduce the results from Experiment 1 of the associated paper (Farooqui & Manly, 2015). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

Participants (N=21) completed a series of trials in which they either repeated the previous task or switched to the other task. In one task, signaled by a green box, they chose the larger of two values. In the other task, signaled by a blue box, they chose the value shown in the larger font. A subliminal cue followed by a mask was presented before each trial. Cues were “O” (nonpredictive cue), “M” (switch-predictive cue), and “T” (repeat-predictive cue). Reaction times and accuracy were measured.

Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 1):

Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in longer RTs (836 vs. 689 ms) and lower accuracy rates (79% vs. 92%). If participants were able to learn the predictive value of the cue that preceded only switch trials and could instantiate relevant anticipatory control in response to it, the performance on switch trials preceded by this cue would be better than on switch trials preceded by the nonpredictive cue. This was indeed the case (mean RT-predictive cue: 819 ms; nonpredictive cue: 871 ms; mean difference = 52 ms, 95% confidence interval, or CI = [19.5, 84.4]), two-tailed paired t(20) = 3.34, p < .01. However, error rates did not differ across these two groups of switch trials (predictive cue: 78.9%; nonpredictive cue: 78.8%), p = .8.

Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files

# #optional packages:
# library(broom)

Step 2: Load data

# This reads in every participant's data (each is in a separate xls file) and combines them into one data frame
# Each xls has 250 rows of trial data; the remaining rows are the authors' Excel calculations, which we don't want in the data
files <- dir('data/Experiment 1')

data <- data.frame()
id <- 1
for (file in files){
  if(file != 'Codebook.xls'){
    temp_data <- read_xls(file.path('data/Experiment 1', file))
    temp_data$id <- id
    id <- id + 1
    temp_data <- temp_data[1:250, ]
    data <- rbind(data, temp_data)
  }
}
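The loop above grows `data` with `rbind()` on every iteration. A minimal alternative sketch, assuming every participant file has the same columns, reads the files into a list and binds them once. The helper name `read_participants`, its arguments, and the `.xls` filename pattern are hypothetical and not part of the original analysis. The `reader` argument defaults to `readxl::read_xls`, the function used above.

```r
# Hypothetical helper (not from the original analysis): read each participant
# file, keep at most the first n_rows rows, tag it with a sequential id, and
# stack all participants into one data frame.
read_participants <- function(dir_path, n_rows = 250,
                              exclude = "Codebook.xls",
                              reader = readxl::read_xls) {
  # Assumption: participant files end in .xls; the codebook is skipped
  files <- setdiff(list.files(dir_path, pattern = "\\.xls$"), exclude)
  pieces <- lapply(seq_along(files), function(i) {
    d <- as.data.frame(reader(file.path(dir_path, files[i])))
    # Keep only the trial rows; unlike d[1:n_rows, ], this adds no NA rows
    # when a file is shorter than n_rows
    d <- d[seq_len(min(n_rows, nrow(d))), , drop = FALSE]
    d$id <- i
    d
  })
  do.call(rbind, pieces)
}

# Usage (assumes the folder layout used in this report):
# data <- read_participants("data/Experiment 1")
```

Binding once at the end avoids repeatedly copying the growing data frame, and passing `reader` as an argument makes the helper easy to check against small test files.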

Step 3: Tidy data

Each row is an observation. The data is already in tidy format.

Step 4: Run analysis

Pre-processing

names(data)

##  [1] "Block_Number" "Event_Number" "Prime"        "PrimeVisible" "TaskType"    
##  [6] "TrialType"    "CorrResp"     "RT"           "RespCorr"     "lnum"        
## [11] "rnum"         "lFont"        "swt"          "stay"         "stay_2...15" 
## [16] "stay_4...16"  "swt_2...17"   "swt_8...18"   "swt_2...19"   "swt_8...20"  
## [21] "stay_2...21"  "stay_4...22"  "id"

view(data)

data %>% as_tibble() %>% count(TrialType)

## # A tibble: 3 x 2
##   TrialType     n
##       <dbl> <int>
## 1         0    42
## 2         1  3987
## 3         2  1221

data %>% as_tibble() %>% count(TaskType)

## # A tibble: 2 x 2
##   TaskType     n
##      <dbl> <int>
## 1        1  2568
## 2        2  2682

data_RT = data %>% 
  select(c("id", "TaskType", "TrialType", "RT", "CorrResp", "Prime")) %>%
  filter(!TrialType == 0)

data_RT %>% as_tibble() %>% count(TrialType)

## # A tibble: 2 x 2
##   TrialType     n
##       <dbl> <int>
## 1         1  3987
## 2         2  1221

view(data_RT)

Descriptive statistics

Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in longer RTs (836 vs. 689 ms)

# Median RT for each participant within each trial type
responsetime_averages = data_RT %>% 
  group_by(TrialType, id) %>%
  summarize(MedianRT = median(RT))

trial1_rt = responsetime_averages %>% filter(TrialType == 1)
median_trial1 = median(trial1_rt$MedianRT)
print("Repeat trials = ")

## [1] "Repeat trials = "

median_trial1

## [1] 673

trial2_rt = responsetime_averages %>% filter(TrialType == 2)
median_trial2 = median(trial2_rt$MedianRT)
print("Switch trials = ")

## [1] "Switch trials = "

median_trial2

## [1] 772.75

responsetime_averages

## # A tibble: 42 x 3
## # Groups:   TrialType [2]
##    TrialType    id MedianRT
##        <dbl> <dbl>    <dbl>
##  1         1     1     537.
##  2         1     2     633.
##  3         1     3     813.
##  4         1     4     613.
##  5         1     5     808.
##  6         1     6     741.
##  7         1     7     673 
##  8         1     8     681.
##  9         1     9     957.
## 10         1    10     701.
## # … with 32 more rows
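Because it is unclear whether the reported 836 ms and 689 ms are means or medians, the choice of aggregation matters. The sketch below contrasts two options: the mean of per-participant means and the median of per-participant means. It uses made-up RTs, not the study data, and is written in base R so that it runs without extra packages. The data frame `toy` and all of its values are invented for illustration.

```r
# Made-up data for illustration only: 3 participants, 2 repeat (1) and
# 2 switch (2) trials each
toy <- data.frame(
  id        = rep(1:3, each = 4),
  TrialType = rep(c(1, 1, 2, 2), times = 3),
  RT        = c(500, 700, 800, 1000,
                600, 620, 700, 900,
                650, 950, 900, 1300)
)

# Step 1: one value per participant per trial type
per_participant <- aggregate(RT ~ id + TrialType, data = toy, FUN = mean)

# Step 2: summarize across participants in two different ways
grand_mean   <- aggregate(RT ~ TrialType, data = per_participant, FUN = mean)
grand_median <- aggregate(RT ~ TrialType, data = per_participant, FUN = median)

grand_mean
grand_median
```

Even on this small example the two summaries differ, so trying both against the reported values is one way to guess which the authors used.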

Performance on switch trials, relative to repeat trials, incurred a switch cost that was evident in […] lower accuracy rates (79% vs. 92%)

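# Sum of CorrResp for each participant within each trial type.
# Note: this is a sum, not a percentage, so it scales with the number of trials in each cell.
correct_percentage = data_RT %>% 
  group_by(TrialType, id) %>%
  summarize(TotalCorrect = sum(CorrResp))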

trial1_corr = correct_percentage %>% filter(TrialType == 1)
median_trial1_corr = median(trial1_corr$TotalCorrect)
median_trial1_corr

## [1] 92

trial2_corr = correct_percentage %>% filter(TrialType == 2)
median_trial2_corr = median(trial2_corr$TotalCorrect)
median_trial2_corr

## [1] 28
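The values above are sums, and there are far fewer switch trials than repeat trials (1221 vs. 3987 rows in the counts shown earlier), so they cannot be compared directly with the reported percentages. A minimal sketch of a percentage-based calculation follows, assuming a column that codes each response as 1 (correct) or 0 (incorrect). The column name `correct` and the data in `toy_acc` are hypothetical; which of the real columns, `CorrResp` or `RespCorr`, holds this coding is not established here.

```r
# Made-up data for illustration only: 2 participants, 4 repeat (1) and
# 2 switch (2) trials each; `correct` is a hypothetical 0/1 accuracy column
toy_acc <- data.frame(
  id        = rep(1:2, each = 6),
  TrialType = rep(c(1, 1, 1, 1, 2, 2), times = 2),
  correct   = c(1, 1, 1, 0, 1, 0,
                1, 1, 1, 1, 0, 0)
)

# The mean of a 0/1 column is the proportion correct, which does not depend
# on how many trials each cell contains
acc <- aggregate(correct ~ id + TrialType, data = toy_acc, FUN = mean)
acc$percent <- 100 * acc$correct

# Average the per-participant percentages within each trial type
acc_by_type <- aggregate(percent ~ TrialType, data = acc, FUN = mean)
acc_by_type
```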

Now you will analyze Predictive Switch Cues vs Non-predictive Switch Cues. Let’s start with reaction time.

This was indeed the case (mean RT-predictive cue: 819 ms; nonpredictive cue: 871 ms; … )

# Median RT for each cue, pooled over all participants and over both repeat and switch trials
responsetime_prime_averages = data_RT %>% 
  group_by(Prime) %>%
  summarize(MedianRT_prime = median(RT))

prime_nonpred_rt = responsetime_prime_averages %>% filter(Prime == "O")
mean_nonpred_rt = mean(prime_nonpred_rt$MedianRT_prime)
mean_nonpred_rt

## [1] 723.25

prime_pred_rt = responsetime_prime_averages %>% filter(Prime == "M" | Prime == "T")
mean_pred_rt = mean(prime_pred_rt$MedianRT_prime)
mean_pred_rt

## [1] 758.1719
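The quoted result compares switch trials preceded by the switch-predictive cue (“M”) with switch trials preceded by the nonpredictive cue (“O”). A sketch of that subsetting step follows, using made-up data and assuming that `TrialType == 2` codes switch trials, as in the chunks above. The data frame `toy_cue` and its values are invented, and taking per-participant means is an assumption about how the authors aggregated.

```r
# Made-up data for illustration only: 3 participants, 4 switch trials (2)
# and 2 repeat trials (1) each
toy_cue <- data.frame(
  id        = rep(1:3, each = 6),
  TrialType = rep(c(2, 2, 2, 2, 1, 1), times = 3),
  Prime     = rep(c("M", "M", "O", "O", "T", "O"), times = 3),
  RT        = c(800, 820, 900, 880, 600, 620,
                700, 740, 760, 800, 550, 570,
                900, 940, 930, 950, 700, 690)
)

# Keep only switch trials that followed an "M" or "O" cue
switch_only <- subset(toy_cue, TrialType == 2 & Prime %in% c("M", "O"))

# Mean RT per participant (rows) and cue (columns)
per_cue <- with(switch_only, tapply(RT, list(id = id, Prime = Prime), mean))
per_cue

# Condition means across participants
colMeans(per_cue)
```

Keeping one row per participant and one column per cue gives paired values, which is the layout a paired comparison needs.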

Next you will try to reproduce error rates for Switch Predictive Cues vs Switch Non-predictive Cues.

However, error rates did not differ across these two groups of switch trials (predictive cue: 78.9%; nonpredictive cue: 78.8%)

correct_switchprime = data_RT %>% 
  group_by(TrialType, Prime) %>%
  summarize(TotalCorrect = sum(CorrResp))

Inferential statistics

The first claim is that in switch trials, predictive cues lead to statistically significant faster reaction times than nonpredictive cues.

… the performance on switch trials preceded by this cue would be better than on switch trials preceded by the nonpredictive cue. This was indeed the case (mean RT-predictive cue: 819 ms; nonpredictive cue: 871 ms; mean difference = 52 ms, 95% confidence interval, or CI = [19.5, 84.4]), two-tailed paired t(20) = 3.34, p < .01.

# reproduce the above results here
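This step was not completed in the report. As a hedged illustration of the test named in the quote (a two-tailed paired t-test with a 95% CI), the sketch below runs `t.test()` on made-up per-participant means. The vectors `pred` and `nonpred` are invented, so the output says nothing about the study's actual result.

```r
# Made-up per-participant mean RTs (ms) on switch trials, for illustration
# only; element i of each vector belongs to the same participant
pred    <- c(810, 720, 920, 850, 760)  # after the switch-predictive cue
nonpred <- c(890, 780, 940, 900, 790)  # after the nonpredictive cue

# Two-tailed paired t-test; the estimate is mean(nonpred - pred)
tt <- t.test(nonpred, pred, paired = TRUE)

tt$estimate   # mean difference
tt$conf.int   # 95% confidence interval of the difference
tt$statistic  # t value
tt$parameter  # degrees of freedom (number of participants minus 1)
tt$p.value
```

With the real data, the two vectors would hold one value per participant (21 each), which gives the 20 degrees of freedom reported in the paper.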

Next, test the second claim.

However, error rates did not differ across these two groups of switch trials (predictive cue: 78.9%; nonpredictive cue: 78.8%), p = .8.

# reproduce the above results here

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

No, I was not able to reproduce most of the results that I attempted to reproduce. I was able to reproduce the median accuracy rate (92%) for the repeat trials, but I could not reproduce any of the other results. I also reached the three-hour limit on the assignment without attempting the inferential statistical tests at the end of the reproduction.

How difficult was it to reproduce your results?

It was extremely difficult to reproduce any results using these data and the descriptions provided.

What aspects made it difficult? What aspects made it easy?

The main difficulty was that it was unclear which values the report presented. For example, it was unclear whether the reported values were means or medians, and how they were calculated (averaging each individual’s trials first and then averaging across the whole condition, or some other method). The data also contained unexplained values. There were supposed to be two trial conditions, switch and repeat, but TrialType took three values (0, 1, and 2). It was unclear how the original authors handled the trials coded 0, so I removed them, although I do not know whether this was the correct way to handle these data. Similarly, numerical values were coded under “Prime” for cues that were not described anywhere in the procedure. The variable labels were also ambiguous: one variable was labeled “CorrResp” and another “RespCorr”, and it was unclear which to use to calculate accuracy. I am also realizing how important it is to understand the experimental procedure thoroughly. I still do not feel that I fully understand what was done in the experiment, and I think this made it harder to work out how to handle the data.

Reproducibility Report: Group B Choice 1