1 Experiment details

Data were collected on Amazon mTurk
Validity = 60% Valid, 20% invalid-same, 20% invalid-different
80 target-present trials, 12 catch trials
Horizontal objects only

2 Set up R environment

library(tidyverse)
library(ggplot2)
library(ggpubr)
library(plyr)
library(magick)
library(rstatix)
library(reshape2)

Make sure you’re in the right directory. Set the R working drectory to the main experiment directory, which is where this markdown is saved, along with any supporting material and raw data, which are stored as a subdirectory.

setwd("/Users/adambarnas/Box/MeridianCB")

3 Format & manipulate raw data files

3.1 Read-in datafiles

Read in the individual subject files (saved automatically on the server as csv files).

tbl_all <- list.files(path = "./data_v2/", pattern = "*.csv", full.names = T) %>% 
    map_df(~read_csv(.))
tbl_all <- data.frame(tbl_all)
#head(tbl_all,10)

Confirm the number of subjects and make sure the sample sizes reflects the number of data files in the data subdirectory.

nrow(tbl_all %>% distinct(workerId,.keep_all = FALSE))

## [1] 88

Next, define trial conditions by breaking apart the name of the image, given by objs_image column.

tbl_all <- tbl_all %>% 
separate(objs_image,into=c('rectangle_orientation','cue_loc','change_loc','validity','display_num','change_type'))
#head(tbl_all,10)

For clarity, rename all the variable values that are now given by the validity variable.

tbl_all <- tbl_all %>% mutate(validity=recode_factor(validity, `V`="valid_same", `C`="catch_catch", `IS`="invalid_same", `ID`="invalid_different", `objects`="NA"))

Let’s also assign the trials to bins based on the trial number. The practice trials (the first 4 for each subject) will be labeled “filler” since they are not factored into any analyses.

tbl_all$bin = "filler"
tbl_all[which(tbl_all$trial_number %in% c(5:27)), "bin"] = "block_1"
tbl_all[which(tbl_all$trial_number %in% c(28:50)), "bin"] = "block_2"
tbl_all[which(tbl_all$trial_number %in% c(51:73)), "bin"] = "block_3"
tbl_all[which(tbl_all$trial_number %in% c(74:96)), "bin"] = "block_4"

Before proceding, this table contains the number of trials for each individual. There were 80 target-present trials that were 60% valid (48 trials) and 40% invalid (16 trials for each type). There were also 12 catch trials. The numbers of each trial type were split evenly among the 4 cue locations. The last variable, “sum”, is the total number of trials saved for each participant.

tbl_all_counts <- tbl_all %>%
  group_by(workerId,validity) %>%
  filter(validity=='valid_same' | validity=='invalid_same' | validity=='invalid_different' | validity=='catch_catch') %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
tbl_all_counts$sum = rowSums(tbl_all_counts[,c(-1)], na.rm = TRUE)
#head(tbl_all_counts,10)

These counts can also be binned over time (based on sequential trial number)

tbl_all_counts_bin <- tbl_all %>%
  group_by(workerId,validity,bin) %>%
  filter(validity=='valid_same' | validity=='invalid_same' | validity=='invalid_different' | validity=='catch_catch') %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
tbl_all_counts_bin$sum = rowSums(tbl_all_counts_bin[,c(-1:-2)], na.rm = TRUE)
#head(tbl_all_counts_bin,10)

Calculate the number of main trials, excuding catch trials.

tbl_all_counts_no_catch <- tbl_all %>%
  group_by(workerId,validity) %>%
  filter(validity=='valid_same' | validity=='invalid_same' | validity=='invalid_different') %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
tbl_all_counts_no_catch$sum = rowSums(tbl_all_counts_no_catch[,c(-1)], na.rm = TRUE)
#head(tbl_all_counts_no_catch,10)

Again, calculate the number of main trials, excuding catch trials, over time

tbl_all_counts_no_catch_bin <- tbl_all %>%
  group_by(workerId,validity,bin) %>%
  filter(validity=='valid_same' | validity=='invalid_same' | validity=='invalid_different') %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
tbl_all_counts_no_catch_bin$sum = rowSums(tbl_all_counts_no_catch_bin[,c(-1:-2)], na.rm = TRUE)
#head(tbl_all_counts_no_catch_bin,10)

Get the number of catch trials.

tbl_all_counts_catch <- tbl_all %>%
  group_by(workerId,validity) %>%
  filter(validity=='catch_catch') %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
#head(tbl_all_counts_catch,10)

Similarly, get the number of catch trials over time

tbl_all_counts_catch_bin <- tbl_all %>%
  group_by(workerId,validity,bin) %>%
  filter(validity=='catch_catch') %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
#head(tbl_all_counts_catch_bin,10)

The data are loaded. Next, look at the quality of the data by examining the accuracy of the subjects’ clicks on the changing object and whether any RTs are outliers.

3.2 Analyze accuracy

The changing object appeared in four possible locations. A sample image is provided below (Note: (1) the thin border around each obect was not visible to participants and is provided for representational purposes, and (2) only one object appeared at a time.). The following X and Y coordinates (in pixels) were obtained from the Octave/PTB script used to generate the stimuli:

UL: xmin = 22, ymin = 22, xmax, 99, ymax, 99

UR: xmin = 424, ymin = 22, xmax = 501, ymax = 99

LL: xmin = 22, ymin = 424, xmax = 99, ymax = 501

LR: xmin = 424, ymin = 424, xmax = 501, ymax = 501

img <- image_read("/Users/adambarnas/Box/MeridianCB/Object_locs.jpg")
img

The coordinates correspond to the upper-left (UL) and lower-right (LR) points of an object [UL X coor (xmin), UL Y coor (ymin), LR X coor (xmax), and LR Y coor (ymax)]. For instance, the object that appeared in the upperleft had coordinates [22, 22, 99, 99]. Note: these ranges correspond to the size of the image file (a 77x77 pixel square).

Trials are labeled 1 if the X and Y coordinate data fell inside the ranges listed above. Trials are labeled 0 if the X and Y coordinate data fell outside these ranges, and are discarded because this indicates that the subject was not accurate (i.e., one or both of their click coordinates was not within the object boundaries).

tbl_all <- tbl_all %>%
  filter(change_loc == "UL" | change_loc == "UR" | change_loc == "LL" | change_loc == "LR")

tbl_all$click_ACC = "filler"

for (i in 1:length(tbl_all$workerId)){
  if (tbl_all$change_loc[i] == "UL"){
    if ((22 <= tbl_all$x[i] & 99 >= tbl_all$x[i]) && (22 <= tbl_all$y[i] & 99 >= tbl_all$y[i])){
      tbl_all$click_ACC[i] = 1
    } else {
      tbl_all$click_ACC[i] = 0
    }
  } else if (tbl_all$change_loc[i] == "UR"){
    if ((424 <= tbl_all$x[i] & 501 >= tbl_all$x[i]) && (22 <= tbl_all$y[i] & 99 >= tbl_all$y[i])){
      tbl_all$click_ACC[i] = 1
    } else {
      tbl_all$click_ACC[i] = 0
    }
  } else if (tbl_all$change_loc[i] == "LL"){
    if ((22 <= tbl_all$x[i] & 99 >= tbl_all$x[i]) && (424 <= tbl_all$y[i] & 501 >= tbl_all$y[i])){
      tbl_all$click_ACC[i] = 1
    } else {
      tbl_all$click_ACC[i] = 0
    }
  } else if (tbl_all$change_loc[i] == "LR"){
    if ((424 <= tbl_all$x[i] & 501 >= tbl_all$x[i]) && (424 <= tbl_all$y[i] & 501 >= tbl_all$y[i])){
      tbl_all$click_ACC[i] = 1
    } else {
      tbl_all$click_ACC[i] = 0
    }
  }
}

3.2.1 Catch trials

Sum the number of good catch trials (1) to get the total number of accurate catch trials per subject.

tbl_all_catch_acc_counts <- tbl_all %>%
  filter(validity=='catch_catch')
tbl_all_catch_acc_counts <- tbl_all_catch_acc_counts %>%
  group_by(workerId,click_ACC) %>%
  dplyr::summarize(counts = n()) %>%
  spread(click_ACC,counts)
#head(tbl_all_catch_acc_counts,10)

Divide the number of accurate catch trials (1) by the number of total catch trials for each participant. The resulting value will be the subjects catch trial rate. Again, subjects with a rate less than 75% (or, .75) will be discarded.

tbl_all_catch_acc_counts[is.na(tbl_all_catch_acc_counts)] <- 0
colnames(tbl_all_catch_acc_counts) <- c("workerId", "inacc", "acc")
tbl_all_catch_acc_rate <- (tbl_all_catch_acc_counts$acc / tbl_all_counts_catch$catch_catch)
tbl_all_catch_acc_rate <- cbind.data.frame(tbl_all_counts_catch[,1], tbl_all_catch_acc_rate)
colnames(tbl_all_catch_acc_rate) <- c("workerId", "acc")
#head(tbl_all_catch_acc_rate,10)

Here is a plot the group’s overall accuracy on the catch trials.

tbl_all_catch_acc_rate %>% 
  ggbarplot(y = "acc", ylab = "Accuracy", fill = "#f7a800", color = "#f7a800", add = "mean_se", ylim = c(0, 1), xlab = "Group", width = 0.5, label = TRUE, lab.nb.digits = 2, lab.vjust = -2.2, title = "All Catch Accuracy")

Let’s also take a look at each individual subject’s catch trial performance rate.

tbl_all_catch_acc_rate %>% 
  ggbarplot(x = "workerId", y = "acc", ylab = "Accuracy", fill = "#f7a800", color = "#f7a800", ylim = c(0, 1), title = "Individual Catch Accuracy Rate", sort.val = c("asc"), font.xtickslab = 6) + rotate_x_text() + geom_hline(yintercept = .75, linetype = 2)

This chunk will categorize subjects with “good” or “bad” catch rates and will create two new dataframes: one for good subjects with a catch rate greater than or equal to .75 and one for bad subjects with a catch rate less than .75.

tbl_all_catch_acc_rate$cat = "filler"
for (i in 1:length(tbl_all_catch_acc_rate$workerId)){
  if (tbl_all_catch_acc_rate$acc[i] >= 0.75){
    tbl_all_catch_acc_rate$cat[i] = "Good"
  } else {
    tbl_all_catch_acc_rate$cat[i] = "Bad"
  }
}

tbl_good_catch_acc_rate <- tbl_all_catch_acc_rate %>%
  filter(acc >= 0.75)
tbl_bad_catch_acc_rate <- tbl_all_catch_acc_rate %>%
  filter(acc < 0.75)

tbl_all_catch_acc_rate %>%
  ggbarplot(x = "cat", y = "acc", ylab = "Accuracy", xlab = "Accuracy Group", add = "mean_se", fill = "#f7a800", color = "#f7a800", ylim = c(0, 1), label = TRUE, lab.nb.digits = 2, lab.vjust = c(-2, -1.2), title = "Catch Trial Accuracy for Bad and Good Subjects", sort.val = c("asc"))

3.2.2 Main trials

For the rest of the analyses, focus on the participants with good catch rate performance. Select the subjects with good catch trial rates from the original tbl_all.

tbl_good_catch_acc_all_main_acc <- tbl_all[(tbl_all$workerId %in% tbl_good_catch_acc_rate$workerId),]
#head(tbl_good,10)

Verify subject count.

nrow(tbl_good_catch_acc_all_main_acc %>% distinct(workerId,.keep_all = FALSE))

## [1] 58

Here, now, is table containing the number of trials for each individual after excluding main trials based on accuracy. Again, there were 80 target-present trials that were 60% valid (48 trials) and 40% invalid (16 trials for each type).

tbl_good_catch_acc_all_main_acc_counts <- tbl_good_catch_acc_all_main_acc %>%
  group_by(workerId,validity) %>%
  filter((validity=='valid_same' | validity=='invalid_same' | validity=='invalid_different') & click_ACC == 1) %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
tbl_good_catch_acc_all_main_acc_counts$sum = rowSums(tbl_good_catch_acc_all_main_acc_counts[,c(-1)], na.rm = TRUE)
#head(tbl_good_catch_acc_all_main_acc_counts,10)

Same table, but binned.

tbl_good_catch_acc_all_main_acc_counts_bin <- tbl_good_catch_acc_all_main_acc %>%
  group_by(workerId,validity,bin) %>%
  filter((validity=='valid_same' | validity=='invalid_same' | validity=='invalid_different') & click_ACC == 1) %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
tbl_good_catch_acc_all_main_acc_counts_bin$sum = rowSums(tbl_good_catch_acc_all_main_acc_counts_bin[,c(-1:-2)], na.rm = TRUE)
#head(tbl_good_catch_acc_all_main_acc_counts_bin,10)

Some subjects may have no surviving data for a particular condition. These subjects should be tossed because they have an unequal number of conditions compared to the other subjects.

tbl_good_catch_acc_all_main_acc_NA_conditions_removed <- tbl_good_catch_acc_all_main_acc_counts %>%
  filter(valid_same!="NA" & invalid_same!="NA" & invalid_different!="NA")
#head(tbl_good_catch_acc_all_main_acc_NA_conditions_removed,10)

Same table, but binned.

tbl_good_catch_acc_all_main_acc_NA_conditions_removed_bin <- tbl_good_catch_acc_all_main_acc_counts_bin[(tbl_good_catch_acc_all_main_acc_counts_bin$workerId %in% tbl_good_catch_acc_all_main_acc_NA_conditions_removed$workerId),]
#head(tbl_good_catch_acc_all_main_acc_NA_conditions_removed_bin,10)

Now, let’s get rid of any subjects with NA from tbl_good_catch_acc_all_main_acc.

tbl_good_catch_acc_all_main_acc_NA_subjs_removed <- tbl_good_catch_acc_all_main_acc[(tbl_good_catch_acc_all_main_acc$workerId %in% tbl_good_catch_acc_all_main_acc_NA_conditions_removed$workerId),]
#head(tbl_good_catch_acc_all_main_acc_NA_subjs_removed,10)

And let’s check the number of subjects we are now working with.

nrow(tbl_good_catch_acc_all_main_acc_NA_subjs_removed %>% distinct(workerId,.keep_all = FALSE))

## [1] 58

After dropping subjects based on catch trial performance and for accuracy on the main trials (dropping any additional subjects with unequal conditions), get the original number of trials for the relevant subjects.

tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts <- tbl_all_counts_no_catch[(tbl_all_counts_no_catch$workerId %in% tbl_good_catch_acc_all_main_acc_NA_conditions_removed$workerId),]
#head(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts,10)

Plot the overall accuracy at the group level (collasped across workerId and condition).

tbl_overall_good_acc <- (tbl_good_catch_acc_all_main_acc_NA_conditions_removed$sum / tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts$sum)
tbl_overall_good_acc <- cbind.data.frame(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts[,1], tbl_overall_good_acc)
colnames(tbl_overall_good_acc) <- c("workerId", "ACC")
tbl_overall_good_acc %>% 
  ggbarplot(y = "ACC", ylab = "Accuracy", fill = "#f7a800", color = "#f7a800", add = "mean_se", ylim = c(0, 1), xlab = "Group", width = 0.5, label = TRUE, lab.nb.digits = 2, lab.vjust = -1.4, title = "Main Trial Accuracy")

Look at the overall accuracy at the group level (collasped across workerId and condition) over time.

tbl_good_no_NA_bin <- tbl_good_catch_acc_all_main_acc_NA_subjs_removed %>%
  group_by(workerId,validity,bin) %>%
  filter(validity=='valid_same' | validity=='invalid_same' | validity=='invalid_different') %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
tbl_good_no_NA_bin$sum = rowSums(tbl_good_no_NA_bin[,c(-1:-2)], na.rm = TRUE)
#head(tbl_good_no_NA_bin,10)

tbl_overall_good_acc_bin <- (tbl_good_catch_acc_all_main_acc_NA_conditions_removed_bin$sum / tbl_good_no_NA_bin$sum)
tbl_overall_good_acc_bin <- cbind.data.frame(tbl_good_no_NA_bin[,1:2], tbl_overall_good_acc_bin)
colnames(tbl_overall_good_acc_bin) <- c("workerId", "bin", "ACC")
tbl_overall_good_acc_bin %>% 
  ggbarplot(y = "ACC", x = "bin", ylab = "Accuracy", fill = "#f7a800", color = "#f7a800", add = "mean_se", ylim = c(0, 1), xlab = " Bin", width = 0.5, label = TRUE, lab.nb.digits = 2, lab.vjust = c(-1.4, -1.4, -1.4, -1.6), title = "Main Trial Accuracy Over Time", na.rm = TRUE)

Here are some descriptive and inferential statistics for the effect of accuracy over time.

tbl_overall_good_acc_bin %>%
  group_by(bin) %>%
  get_summary_stats(ACC, type = "mean_se")

## # A tibble: 4 x 5
##   bin     variable     n  mean    se
##   <chr>   <chr>    <dbl> <dbl> <dbl>
## 1 block_1 ACC         58 0.902 0.018
## 2 block_2 ACC         58 0.889 0.023
## 3 block_3 ACC         58 0.886 0.026
## 4 block_4 ACC         58 0.877 0.028

tbl_overall_good_acc_bin %>%
  group_by(bin) %>%
  identify_outliers(ACC)

## # A tibble: 26 x 5
##    bin     workerId         ACC is.outlier is.extreme
##    <chr>   <chr>          <dbl> <lgl>      <lgl>     
##  1 block_1 A1PUWQYUQRGCO  0.429 TRUE       FALSE     
##  2 block_1 A2CHD0TNWHFW1R 0.211 TRUE       TRUE      
##  3 block_2 A1PUWQYUQRGCO  0.4   TRUE       TRUE      
##  4 block_2 A2CHD0TNWHFW1R 0.25  TRUE       TRUE      
##  5 block_2 A3CASN6JG7104  0.182 TRUE       TRUE      
##  6 block_2 A3E8NUUS90EWXW 0.7   TRUE       FALSE     
##  7 block_2 A3GK90X2QOFR53 0.75  TRUE       FALSE     
##  8 block_2 A3K2ZXAFZCHYZI 0.667 TRUE       FALSE     
##  9 block_2 A3S4M1GQAMPFZB 0.682 TRUE       FALSE     
## 10 block_2 A3SYY5R44RAATE 0.588 TRUE       TRUE      
## # … with 16 more rows

res.aov <- anova_test(data = tbl_overall_good_acc_bin, dv = ACC, wid = workerId, within = bin)
get_anova_table(res.aov, correction = "none")

## ANOVA Table (type III tests)
## 
##   Effect DFn DFd     F    p p<.05   ges
## 1    bin   3 171 0.757 0.52       0.002

pwc <- tbl_overall_good_acc_bin %>%
  pairwise_t_test(
    ACC ~ bin, paired = TRUE,
    p.adjust.method = "bonferroni"
    )
pwc

## # A tibble: 6 x 10
##   .y.   group1  group2     n1    n2 statistic    df     p p.adj p.adj.signif
## * <chr> <chr>   <chr>   <int> <int>     <dbl> <dbl> <dbl> <dbl> <chr>       
## 1 ACC   block_1 block_2    58    58     0.771    57 0.444     1 ns          
## 2 ACC   block_1 block_3    58    58     0.742    57 0.461     1 ns          
## 3 ACC   block_1 block_4    58    58     1.05     57 0.296     1 ns          
## 4 ACC   block_2 block_3    58    58     0.260    57 0.796     1 ns          
## 5 ACC   block_2 block_4    58    58     0.917    57 0.363     1 ns          
## 6 ACC   block_3 block_4    58    58     0.870    57 0.388     1 ns

Look at the overall accuracy for the group by validity (valid, invalid-same etc.).

tbl_overall_good_acc_cond <- (tbl_good_catch_acc_all_main_acc_NA_conditions_removed[-1] / tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts[-1])
tbl_overall_good_acc_cond <- cbind.data.frame(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts[,1], tbl_overall_good_acc_cond)
tbl_overall_good_acc_cond <- gather(tbl_overall_good_acc_cond, validity, acc, valid_same:invalid_different, factor_key=TRUE)
tbl_overall_good_acc_cond %>%   
  ggbarplot(x = "validity", y = "acc", ylab = "Accuracy", fill = "validity" , color = "validity", palette = c("#0d2240", "#00a8e1", "#f7a800", "#E31818", "#dfdddc"), add = "mean_se", ylim = c(0, 1), na.rm = TRUE, label = TRUE, lab.nb.digits = 2, lab.vjust = c(-1.6, -1.6, -1.6), title = "Main Trial Accuracy By Validity", xlab = "Validity")

Here are some descriptive and inferential statistics for the effect of accuracy by validity.

tbl_overall_good_acc_cond %>%
  group_by(validity) %>%
  get_summary_stats(acc, type = "mean_sd")

## # A tibble: 3 x 5
##   validity          variable     n  mean    sd
##   <fct>             <chr>    <dbl> <dbl> <dbl>
## 1 valid_same        acc         58 0.904 0.163
## 2 invalid_same      acc         58 0.863 0.199
## 3 invalid_different acc         58 0.864 0.193

tbl_overall_good_acc_cond %>%
  group_by(validity) %>%
  identify_outliers(acc)

## # A tibble: 20 x 6
##    validity          workerId         sum    acc is.outlier is.extreme
##    <fct>             <chr>          <dbl>  <dbl> <lgl>      <lgl>     
##  1 valid_same        A1PUWQYUQRGCO  0.425 0.438  TRUE       TRUE      
##  2 valid_same        A2CHD0TNWHFW1R 0.225 0.312  TRUE       TRUE      
##  3 valid_same        A3CASN6JG7104  0.35  0.292  TRUE       TRUE      
##  4 valid_same        A3S4M1GQAMPFZB 0.6   0.667  TRUE       TRUE      
##  5 valid_same        A3SYY5R44RAATE 0.45  0.417  TRUE       TRUE      
##  6 valid_same        A4W9APAHFWVLO  0.738 0.688  TRUE       TRUE      
##  7 invalid_same      A1PUWQYUQRGCO  0.425 0.438  TRUE       FALSE     
##  8 invalid_same      A2CHD0TNWHFW1R 0.225 0.0625 TRUE       TRUE      
##  9 invalid_same      A3CASN6JG7104  0.35  0.438  TRUE       FALSE     
## 10 invalid_same      A3E8NUUS90EWXW 0.638 0.312  TRUE       FALSE     
## 11 invalid_same      A3S4M1GQAMPFZB 0.6   0.438  TRUE       FALSE     
## 12 invalid_same      A3SYY5R44RAATE 0.45  0.5    TRUE       FALSE     
## 13 invalid_different A1PUWQYUQRGCO  0.425 0.375  TRUE       TRUE      
## 14 invalid_different A2CHD0TNWHFW1R 0.225 0.125  TRUE       TRUE      
## 15 invalid_different A3CASN6JG7104  0.35  0.438  TRUE       TRUE      
## 16 invalid_different A3E8NUUS90EWXW 0.638 0.312  TRUE       TRUE      
## 17 invalid_different A3GK90X2QOFR53 0.825 0.625  TRUE       FALSE     
## 18 invalid_different A3K2ZXAFZCHYZI 0.725 0.562  TRUE       FALSE     
## 19 invalid_different A3S4M1GQAMPFZB 0.6   0.562  TRUE       FALSE     
## 20 invalid_different A3SYY5R44RAATE 0.45  0.5    TRUE       FALSE

res.aov <- anova_test(data = tbl_overall_good_acc_cond, dv = acc, wid = workerId, within = validity)
get_anova_table(res.aov, correction = "none")

## ANOVA Table (type III tests)
## 
##     Effect DFn DFd     F     p p<.05   ges
## 1 validity   2 114 6.168 0.003     * 0.011

pwc <- tbl_overall_good_acc_cond %>%
  pairwise_t_test(
    acc ~ validity, paired = TRUE,
    p.adjust.method = "bonferroni"
    )
pwc

## # A tibble: 3 x 10
##   .y.   group1    group2       n1    n2 statistic    df     p p.adj p.adj.signif
## * <chr> <chr>     <chr>     <int> <int>     <dbl> <dbl> <dbl> <dbl> <chr>       
## 1 acc   valid_sa… invalid_…    58    58    2.76      57 0.008 0.023 *           
## 2 acc   valid_sa… invalid_…    58    58    2.87      57 0.006 0.017 *           
## 3 acc   invalid_… invalid_…    58    58   -0.0987    57 0.922 1     ns

Third, we can look at the accuracy for each individual subject.

tbl_overall_good_acc %>% 
  ggbarplot(x = "workerId", y = "ACC", ylab = "Accuracy", fill = "#f7a800", color = "#f7a800", ylim = c(0, 1), title = "Individual Accuracy", sort.val = c("asc"), font.xtickslab = 8) + rotate_x_text() + geom_hline(yintercept = .75, linetype = 2)

Remove subjects with less than 75% main trial accuracy

tbl_overall_good_acc_rate <- tbl_overall_good_acc %>%
  filter(ACC >= 0.75)
tbl_overall_bad_acc_rate <- tbl_overall_good_acc %>%
  filter(ACC < 0.75)

tbl_good_catch_acc_all_main_acc_NA_subjs_removed <- tbl_good_catch_acc_all_main_acc_NA_subjs_removed[(tbl_good_catch_acc_all_main_acc_NA_subjs_removed$workerId %in% tbl_overall_good_acc_rate$workerId),]

nrow(data.frame(tbl_good_catch_acc_all_main_acc_NA_subjs_removed %>% distinct(workerId,.keep_all = FALSE)))

## [1] 50

3.3 Remove outlier trials

Next, data are inspected for RT outliers. RTs that are more than 3 SDs from the subject mean will be removed.

tbl_good_catch_acc_all_main_acc_NA_subjs_removed <- tbl_good_catch_acc_all_main_acc_NA_subjs_removed %>%
  filter(validity!='catch_catch')

Let’s get the number of trials.

tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts <- tbl_good_catch_acc_all_main_acc_NA_subjs_removed %>%
  group_by(workerId,validity) %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts$sum = rowSums(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts[,c(-1)], na.rm = TRUE)
#head(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts,10)

This bit of code will process RTs by coding them as outliers (again, by more than 3 SDs). Two additional columns are added to the data table. First, an “outliers” column is added that labels an RT as an outlier or not (0 = not an outlier, 1 = an outlier less than 3 SDs, 2 = an outlier greater than 3 SDs). Second, a “removed_RT” column is added that contains non-outlier RTs.

Note: code can be changed to allow for replacement of outliers with the cutoff values.

correct.trials <- tbl_good_catch_acc_all_main_acc_NA_subjs_removed[tbl_good_catch_acc_all_main_acc_NA_subjs_removed$click_ACC == 1,]

tbl_good_catch_acc_all_main_acc_NA_subjs_removed <- ddply(correct.trials, .(workerId), function(x){
  m <- mean(x$rt)
  s <- sd(x$rt)
  upper <- m + 3*s #change 3 with another number to increase or decrease cutoff criteria
  lower <- m - 3*s #change 3 with another number to increase or decrease cutoff criteria
  
  x$outliers <- 0
  x$outliers[x$rt > upper] <- 2
  x$outliers[x$rt < lower] <- 1
  x$removed_RT <- x$rt
  x$removed_RT[x$rt > upper]<- NA #change NA with upper to replace an outlier with the upper cutoff
  x$removed_RT[x$rt < lower]<- NA #change NA with lower to replace an outlier with the lower cutoff
  
  x
})
#head(tbl_good_catch_acc_all_main_acc_NA_subjs_removed,10)

Get trial counts.

tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts <- tbl_good_catch_acc_all_main_acc_NA_subjs_removed %>%
  group_by(workerId,validity) %>%
  filter(validity=='valid_same' | validity=='invalid_same' | validity=='invalid_different') %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts$sum = rowSums(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts[,c(-1)], na.rm = TRUE)
#head(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts,10)

Next, let’s completely toss out the outlier trials.

tbl_good_catch_acc_all_main_acc_NA_subjs_removed_3_SDs_removed <- tbl_good_catch_acc_all_main_acc_NA_subjs_removed[!is.na(tbl_good_catch_acc_all_main_acc_NA_subjs_removed$removed_RT),]
#head(tbl_good_catch_acc_all_main_acc_NA_subjs_removed,10)

Here is another table containing the number of trials for each individual after excluding trials based on accuracy AND outlier RTs. Again, there were 80 target-present trials that were 60% valid (48 trials) and 20% invalid (16 trials for each type).

tbl_good_catch_acc_all_main_acc_NA_subjs_removed_3_SDs_removed_counts <- tbl_good_catch_acc_all_main_acc_NA_subjs_removed_3_SDs_removed %>%
  group_by(workerId,validity) %>%
  filter(validity=='valid_same' | validity=='invalid_same' | validity=='invalid_different') %>%
  dplyr::summarize(counts = n()) %>%
  spread(validity,counts)
tbl_good_catch_acc_all_main_acc_NA_subjs_removed_3_SDs_removed_counts$sum = rowSums(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_3_SDs_removed_counts[,c(-1)], na.rm = TRUE)
#head(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_3_SDs_removed_counts,10)

What was the percentage of outlier RTs that were removed?

tbl_rts_removed_count <- tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts[-1] - tbl_good_catch_acc_all_main_acc_NA_subjs_removed_3_SDs_removed_counts[-1]
per_RTs_removed <- (sum(tbl_rts_removed_count$sum) / sum(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_counts$sum)) * 100
per_RTs_removed

## [1] 1.636316

4 Analyze data

4.1 Group level

4.1.1 Some summary statistics

Let’s again confirm how many subjects we’re working with:

nrow(data.frame(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_3_SDs_removed %>% distinct(workerId,.keep_all = FALSE)))

## [1] 50

good_subjs <- data.frame(tbl_good_catch_acc_all_main_acc_NA_subjs_removed_3_SDs_removed %>% distinct(workerId,.keep_all = FALSE))

Next, we want to get RTs to look for the object-based change blindness effects. The same group_by function can be used here, now we’re getting mean RT (removing NA trials) for each subject and condition (and we keep the data in long form for the subsequent statistical analysis).

tbl_final <- tbl_good_catch_acc_all_main_acc_NA_subjs_removed_3_SDs_removed %>%
  group_by(workerId,validity) %>%
  filter(validity=='valid_same' | validity=='invalid_same' | validity=='invalid_different') %>%
  dplyr::summarize(mean_rt = mean(removed_RT, na.rm=TRUE))

tbl_final <- data.frame(tbl_final)
#tbl_final %>% head

4.1.2 Repeated-measures ANOVA

We can take these subject mean RTs and pipe them directly into a repeated-measures ANOVA. The invalid_far trials are excluded since they are typically not included in object-based attention tests.

tbl_final %>%
  group_by(validity) %>%
  get_summary_stats(mean_rt, type = "mean_sd")

## # A tibble: 3 x 5
##   validity          variable     n  mean    sd
##   <fct>             <chr>    <dbl> <dbl> <dbl>
## 1 valid_same        mean_rt     50 5037.  796.
## 2 invalid_same      mean_rt     50 5539.  977.
## 3 invalid_different mean_rt     50 5561.  779.

tbl_final %>%
  group_by(validity) %>%
  identify_outliers(mean_rt)

## # A tibble: 7 x 5
##   validity          workerId       mean_rt is.outlier is.extreme
##   <fct>             <chr>            <dbl> <lgl>      <lgl>     
## 1 valid_same        A16Z47IWMSXYHP   8330. TRUE       TRUE      
## 2 valid_same        A220ZCMBT1YMMU   7289. TRUE       FALSE     
## 3 valid_same        A2FGKKWP33DFWS   6499. TRUE       FALSE     
## 4 valid_same        A3R818WN41K12K   6563. TRUE       FALSE     
## 5 invalid_same      A16Z47IWMSXYHP  10943. TRUE       TRUE      
## 6 invalid_same      A220ZCMBT1YMMU   7153. TRUE       FALSE     
## 7 invalid_different A220ZCMBT1YMMU   7975. TRUE       FALSE

res.aov <- anova_test(data = tbl_final, dv = mean_rt, wid = workerId, within = validity)
get_anova_table(res.aov, correction = "none")

## ANOVA Table (type III tests)
## 
##     Effect DFn DFd      F        p p<.05   ges
## 1 validity   2  98 24.163 2.95e-09     * 0.075

4.1.3 Plot the results

tbl_final %>% 
  ggbarplot(x = "validity", y = "mean_rt", ylab = "Mean RT (ms)", fill = "validity" , color = "validity", palette = c("#0d2240", "#00a8e1", "#f7a800"), add = "mean_se", ylim = c(0, 7000))

4.1.4 Pairwise comparisons

First, the analysis of space-based attentional selection

tbl_final %>% 
  filter(validity == "valid_same" | validity == "invalid_same") %>%
  with(t.test(mean_rt~validity,paired=TRUE))

## 
##  Paired t-test
## 
## data:  mean_rt by validity
## t = -6.2908, df = 49, p-value = 8.323e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -662.1096 -341.5089
## sample estimates:
## mean of the differences 
##               -501.8093

Second, the analysis of object-based attentional selection

tbl_final %>% 
  filter(validity == "invalid_different" | validity == "invalid_same") %>%
  with(t.test(mean_rt~validity,paired=TRUE))

## 
##  Paired t-test
## 
## data:  mean_rt by validity
## t = -0.24565, df = 49, p-value = 0.807
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -202.9306  158.7217
## sample estimates:
## mean of the differences 
##               -22.10443

4.2 Individual subjects

Of course, there is value in the group level effect for the population as a whole, but that result may not accurately represent the behavior of the sample or population. Bootstrapping individual subject data and looking at individual subject variation helps to augment the conclusions of group level effects. Object-based effects are notoriously “flakey” and have always been interpreted at the group level. For instance, the magnitude of the group level effects has been reported to be just a few ms (13 ms in the original Egly paper), and in some cases revsered. Here, I was interested in whether the majority or minority of subjects exhibited a significant object-based effect in the predicted direction (invalid-same faster than invalid-different), and the number of participants that exhibited a significant reversal of the effect (same-object cost, invalid-different faster than invalid-same).

4.2.1 Bootstrap data

This chunk of code is resampling data randomly with replacement for each cue and target location condition (4 cue locations and 3 change locations for a total of 12 conditions). The data in each condition of the bootstrapped data sets were resampled with replacement from the corresponding condition in the original data set so that the bootstrapped data sets contained the same number of trials in each condition as the original data. The data from 1 subject was used to generate 5 data sets (though, typically, you generate thousands more but I am experiencing computing issues with my laptop). Then, you basically treat each bootstrapped data set like the original group - average across validity and compute space-based effects (invalid-same minus valid) and object-based effects (invalid-different minus invalid-same).

tbl_resample <- tbl_good_catch_acc_all_main_acc_NA_subjs_removed_3_SDs_removed
tbl_resample$cue_change_validity <- paste(tbl_resample$cue_loc,tbl_resample$change_loc,tbl_resample$validity,sep="-")
tbl_resample_counts <- tbl_resample %>%
  group_by(workerId,cue_change_validity) %>%
  dplyr::summarize(counts = n()) %>%
  spread(cue_change_validity,counts)

vars <- unique(tbl_resample$cue_change_validity)

bs_df = data.frame()

for(part in good_subjs$workerId){
  for(var in vars){
    numtrials <- nrow(tbl_resample[tbl_resample$workerId == part & tbl_resample$cue_change_validity == var, ])
    RTs <- tbl_resample[tbl_resample$workerId == part & tbl_resample$cue_change_validity == var, "removed_RT"]
    for(iter in 1:5){
      r=sample(RTs, numtrials, replace=TRUE)
      bs_df <- rbind(bs_df, data.frame(workerId = part, iteration = iter, condition = var, mean = mean(r)))
    }
  }
}

bs_df_clean <- na.omit(bs_df)

bs_df_clean <- bs_df_clean %>% 
separate(condition,into=c('cue_loc','change_loc','validity'), sep = "([\\-])")

bs_df_means <- bs_df_clean %>% 
  group_by(workerId, iteration, validity) %>%
  dplyr::summarize(mean_RT = mean(mean, na.rm=TRUE))

bs_df_effects <- spread(bs_df_means, validity, mean_RT)
bs_df_effects$space_effect <- (bs_df_effects$invalid_same - bs_df_effects$valid_same)
bs_df_effects$object_effect <- (bs_df_effects$invalid_different - bs_df_effects$invalid_same)

4.2.2 Plot results

Here, the mean space-based and object-based effects for each bootstrapped data set is plotted with a 95% confidence interval. So, each subject has one space-based and one object-based effect. Results for individual subjects are significant when the 95% confidence interval does not cross 0.

Note: There are 2 plots for each effect. Ideally, they would be the ggerrorplots but ordered like the ggbarplots. I can not find a way to order the ggerrorplots by x value or to remove the bar in the ggbarplots, but I am still playing around with the plots. Additionally, the workerIds would not be included.

bs_df_effects %>% 
  ggbarplot(x = "workerId", y = "space_effect", ylab = "Mean Space-based Effect (ms)", fill = "#f7a800" , add = "mean_ci", sort.val = c("asc"), font.ytickslab = 4, orientation = "horiz", title = "Space-based Cueing Effects") + geom_hline(yintercept = 0, linetype = 1, color = "red", size=1)

bs_df_effects %>% 
  ggerrorplot(x = "workerId", y = "space_effect", ylab = "Mean Space-based Effect (ms)", desc_stat = "mean_ci", font.ytickslab = 4, orientation = "horiz", title = "Space-based Cueing Effects", size = 0.25) + geom_hline(yintercept = 0, linetype = 1, color = "red", size=1)

bs_df_effects %>% 
  ggbarplot(x = "workerId", y = "object_effect", ylab = "Mean Object-based Effect (ms)", fill = "#f7a800" , add = "mean_ci", sort.val = c("asc"), font.ytickslab = 4, orientation = "horiz", title = "Object-based Cueing Effects") + geom_hline(yintercept = 0, linetype = 1, color = "red", size=1)

bs_df_effects %>% 
  ggerrorplot(x = "workerId", y = "object_effect", ylab = "Mean Object-based Effect (ms)", desc_stat = "mean_ci", font.ytickslab = 4, orientation = "horiz", title = "Object-based Cueing Effects", size = 0.25) + geom_hline(yintercept = 0, linetype = 1, color = "red", size=1)

4.3 Remove fringe subject

Based on the plots above, subject A16Z47IWMSXYHP is a clear lone fringe subject. I just want to see what happens when this subject is removed.

4.3.1 Some summary statistics

tbl_final_subj_removed <- tbl_final %>% 
  filter(workerId != "A16Z47IWMSXYHP")
#tbl_final_subj_removed %>% head

4.3.2 Repeated-measures ANOVA