# before using the tidyverse, we need to import the libraries
# if the tidyverse is not installed, install it first, e.g. with install.packages('tidyverse')
library(tidyverse)
library(ez)
library(knitr)
opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

Pop-out visual search with a discrimination task

In this case study, we have a full data set from a visual search experiment in which participants had to discriminate a color or orientation pop-out target. A typical search display is shown in Figure 1 (an orientation target, tilted left or right). If it was a color target, the target was either green or red.

Figure 1. A typical pop-out search display

In this experiment, the proportion of color vs. orientation targets was systematically manipulated across three sections: 75% vs. 25%, 50% vs. 50%, and 25% vs. 75%. The task was to report, as quickly and accurately as possible, whether the target was a color or an orientation target, using the left and right mouse buttons, respectively. Each trial started with the presentation of a fixation dot for 700–900 ms, followed by the stimulus display, which remained on screen until the participant made a response.

There were three sections, each consisting of 10 blocks of 40 trials. Twelve participants took part in this experiment. All had normal or corrected-to-normal vision and were naive as to the purpose of the experiment. All participants gave informed consent before the experiment.

The research questions we had in mind are:

  1. How does the target ratio (color vs. orientation) in one block influence search performance?
  2. Does target ratio interact with the target dimension (color/orientation)?
  3. How does the target of the previous trial influence target detection in the current trial?

Data structure

All data were pooled together and stored in an R binary file (rt_raw.rds). Experimental data are usually stored in MATLAB, Excel, or text formats, which you can read into R with different packages: for example, readMat() from the R.matlab package for importing MATLAB .mat files, or read.csv() for reading CSV files.
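In case your own data come in one of those formats, here is a minimal sketch of those import routes (the file names below are hypothetical; the data for this case study are already provided as an .rds file):

library(R.matlab)                       # only needed for MATLAB files
mat_raw = readMat('data/rt_raw.mat')    # hypothetical .mat file
csv_raw = read.csv('data/rt_raw.csv')   # hypothetical .csv file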

The data can be downloaded here.

R project

For better portability, I highly recommend creating a new R project for analysing the data. To create one, simply click the project menu in the upper-right corner of RStudio and create a new project (from an existing folder). In the root folder of your project you will see a file ending in .Rproj; this is the project profile that manages your project. Please copy the downloaded file into that folder. Note that if you have multiple data files and multiple analysis scripts, I recommend creating subfolders and storing the different types of files separately. For example, I stored my data files in a subfolder called data.

An .rds file stores a single R object. The functions for saving and reading such objects are saveRDS() and readRDS(). The raw data in this example are already in .rds format, so let's read them into the workspace using the R chunk below:

raw = readRDS('data/rt_raw.rds')
# show the head of the raw data
kable(head(raw))
| BlkType | dimension | color | orientation | position | response | rt | rs | blkNo | sub | outlier | error | tno |
|---------|-----------|-------|-------------|----------|----------|-----------|-----------|-------|------|---------|-------|-----|
| 3:1 | Color | red | left | 11 | 3 | 1.2441734 | 0.8037465 | 1 | adrl | TRUE | FALSE | 1 |
| 3:1 | Color | turg | left | 13 | 3 | 0.7895251 | 1.2665842 | 1 | adrl | FALSE | FALSE | 2 |
| 3:1 | Color | turg | left | 15 | 3 | 0.7763107 | 1.2881440 | 1 | adrl | FALSE | FALSE | 3 |
| 3:1 | Color | turg | right | 13 | 3 | 0.7149473 | 1.3987046 | 1 | adrl | FALSE | FALSE | 4 |
| 3:1 | Color | turg | right | 15 | 3 | 0.6372978 | 1.5691253 | 1 | adrl | FALSE | FALSE | 5 |
| 3:1 | Color | red | left | 13 | 3 | 0.6825651 | 1.4650616 | 1 | adrl | FALSE | FALSE | 6 |

Using head() you can show the first few rows (the "head") of a table. The output above shows that the raw data have 13 columns, with 14400 observations (trials) in total. The variable raw is also listed in the Environment pane on the right side of RStudio, where you can open the table by double-clicking on it.

Your task: open and inspect the table, and see if you can guess the meaning of each column from its name.
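If you prefer to do this in code, one quick option is glimpse() from the tidyverse (str() is the base-R equivalent):

glimpse(raw)   # lists every column with its type and the first few values
# str(raw)     # base-R alternative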

In most cases, raw behavioural data contain more detailed information than we need. Here, the most relevant columns we will focus on are:

  1. dimension: the target-defining dimension (color or orientation)
  2. BlkType: the block type, i.e., the color:orientation ratio of targets
  3. rt: reaction time (in seconds)
  4. sub: subject identifier
  5. error: error response (TRUE when the response was incorrect)
  6. outlier: outliers based on the RTs and the first trial of each block; you can also calculate outliers yourself

Question: What outlier criteria should we set for a normal response time experiment?
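One common (though arbitrary) convention, shown as a sketch below, is to flag anticipatory responses and responses far above a participant's own mean; the cut-offs here are my own illustration, not necessarily the criteria behind the outlier column provided with the data:

raw %>%
  group_by(sub) %>%                                      # apply criteria per participant
  mutate(my_outlier = rt < 0.15 |                        # anticipatory responses (< 150 ms)
                      rt > mean(rt) + 2.5 * sd(rt)) %>%  # unusually slow responses
  ungroup()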

Workflow with Tidyverse

Recall the R introduction session: Figure 2 illustrates the data analysis workflow from the book 'R for Data Science'1:

Figure 2. Data Analysis flow from R for Data Science

Several key concepts:

  1. The pipe %>% passes data from one operation to the next.
  2. Functions simplify redundant code.
  3. Concentrate on data manipulation, not on programming.

We will apply those key concepts in our data analyses below.
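As a minimal illustration of the pipe with the data we just loaded, the chunk below counts the trials in each block type; it is equivalent to the nested call summarise(group_by(raw, BlkType), n_trials = n()), just easier to read:

raw %>%                        # start with the raw data
  group_by(BlkType) %>%        # split the trials by block type
  summarise(n_trials = n())    # count the trials per block type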

Error analysis

The first step of the data analysis is to remove the error trials. However, before doing this, we must make sure there is no speed-accuracy trade-off.

Question: Why should we care about the speed-accuracy trade-off (SATO)?

Your answer:

Step 1: Percentage of error trials for individual participants.

raw %>% group_by(sub) %>% # separate for each participant
  summarise(merror = mean(error)) %>% # calculate mean error
  ggplot(aes(x=sub, y = merror)) + geom_bar(stat = 'identity') # visualize with bar plot

By visual inspection, only one participant has an error rate greater than 5%, so the task is relatively easy to accomplish.
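If you want to confirm this numerically rather than by eye, one option is to filter the same summary:

raw %>% group_by(sub) %>%
  summarise(merror = mean(error)) %>%
  filter(merror > 0.05)        # participants with error rates above 5%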

Visualization

The tidyverse includes the visualization package ggplot2. You can go to the RStudio menu Help > Cheatsheets > 'Data Visualization with ggplot2' to download the useful cheatsheet.

A typical visualization grammar is like this:

ggplot(data = <DATA>) + geom_FUN(mapping = aes(<MAPPINGS>), stat = <STAT>, position = <POSITION>) + ...

geom_FUN defines types of drawing. For example, geom_point draws points, geom_bar bars, and geom_line lines.

<MAPPINGS> defines the x, y, group, and color aesthetics for your drawing. If all geom_FUN layers share the same mapping, you can move it into the ggplot() part. <STAT> indicates what statistical transformation to use for drawing. If the data are already the values to be plotted, use stat = 'identity'.
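For instance, the bar plot from Step 1 above fills in this template as follows (the same plot, repeated here only to annotate the grammar):

raw %>% group_by(sub) %>%
  summarise(merror = mean(error)) %>%
  ggplot(aes(x = sub, y = merror)) +    # <MAPPINGS> shared by all layers
  geom_bar(stat = 'identity')           # <STAT>: plot the values as they are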

Now we turn to the speed-accuracy trade-off analysis: we compare the RTs of correct trials to those of incorrect trials. If the incorrect trials had faster responses, there may be some speed-accuracy trade-off.

raw %>% filter(outlier == FALSE) %>%
  group_by(sub, error) %>%
  summarise(mrt = mean(rt)) %>% 
  ggplot(aes(x=sub, y = mrt, group= error, color = error)) + geom_line()

By visual inspection, the error trials seem to have faster responses, which may indicate a certain degree of speed-accuracy trade-off. However, this needs further statistical testing. Note that if SATOs occur equally in all conditions, they would not affect our conclusions, given that we use a within-subject full-factorial design. Thus, it is important to run statistical tests on the RTs and, if necessary, to analyse errors across the different experimental conditions.

One simple package for running ANOVAs is the ez package. If you haven't installed this package yet, install it first. The main parameters of ezANOVA() are:

  1. dv - dependent variable
  2. wid - random variable. For a within-subject design, we usually use the subject identifier as the random variable.
  3. within - within-subject factors
  4. between - between-subject factors

If you have more than one within- or between-subject factor, you need to use the operator .() to combine them. For example, to test the two within-subject factors dimension and ratio, we would specify within = .(dimension, ratio).

library(ez)
av1 = ezANOVA(raw %>% filter(outlier == FALSE), # data
        dv = rt, # dependent variable
        wid = sub, # random variable
        within = error) #within factors
kable(av1$ANOVA)
| | Effect | DFn | DFd | F | p | p<.05 | ges |
|---|--------|-----|-----|---------|-----------|-------|-----------|
| 2 | error | 1 | 11 | 2.38856 | 0.1504943 | | 0.0235189 |

Based on this statistical test, there is no significant evidence of a SATO for the discrimination task in general. But to be sure, we need to run a further condition-wise SATO analysis. Note the difference in the within = parameter.

library(ez)
av2 = ezANOVA(raw %>% filter(outlier == FALSE), # data
        dv = rt, # dependent variable
        wid = sub, # random variable
        within = .(BlkType, error)) #within factors
kable(av2$ANOVA)
| | Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---------------|-----|-----|----------|-----------|-------|-----------|
| 2 | BlkType | 2 | 22 | 0.992791 | 0.3865419 | | 0.0120681 |
| 3 | error | 1 | 11 | 1.849378 | 0.2010758 | | 0.0149802 |
| 4 | BlkType:error | 2 | 22 | 1.435543 | 0.2594222 | | 0.0103494 |
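The SATO discussion above also mentioned analysing errors across conditions when necessary. A sketch of what that could look like, assuming we first aggregate error rates per participant and condition (the names err_rates, perror, and av_err are my own):

err_rates = raw %>% filter(outlier == FALSE) %>%
  group_by(sub, BlkType, dimension) %>%
  summarise(perror = mean(error)) %>%   # error rate per participant and condition
  ungroup()
av_err = ezANOVA(err_rates, dv = perror, wid = sub, within = .(BlkType, dimension))
kable(av_err$ANOVA)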

Mean RTs

Given that there is no SATO, we will remove the error trials for all further analyses. First, we calculate mean RTs from the valid trials.

vdata = raw %>% filter(!outlier & !error) # filter outliers and errors
# complete the following code:

mrts = vdata %>% group_by(sub, BlkType, dimension) %>%
     summarise(mrt = mean(rt))
saveRDS(mrts, file = 'data/mrts.rds')

With the mean RTs, we can now visualise them using ggplot. Please inspect the code below and make sure you understand what each line does.

# first collapse data across participants
mrts = readRDS('data/mrts.rds')
gmrt = mrts %>% group_by(BlkType, dimension) %>%
  summarise(rt = mean(mrt), 
            n = n(), 
            se = sd(mrt)/sqrt(n-1)) 
# now visualize the grand mean RTs
gmrt %>%  ggplot( aes(x = BlkType, y = rt, group = dimension, color = dimension)) + 
  geom_line() + # line plot
  geom_errorbar(aes(ymin = rt - se, ymax = rt + se), width = 0.2) +
  geom_point() + 
  xlab('Color:Orientation') + 
  ylab('Mean RTs (secs)') + 
  theme_classic()

Question: Why should we first average RTs within each individual participant, and then average again across participants? Could we do it in one step?

The above pattern certainly suggests an interaction: when the frequency of the target dimension is lower, the mean RT is slower. But we want to analyse performance as a function of the frequency of each dimension, so we need to create a new column based on BlkType.

Let's first show the levels of BlkType:

print(unique(mrts$BlkType))
## [1] 1:3 1:1 3:1
## Levels: 1:3 < 1:1 < 3:1
print(as.numeric(unique(mrts$BlkType)))
## [1] 1 2 3

It is an ordinal factor variable encoding the color:orientation ratio with three levels, and as.numeric() converts it to its level index (1, 2, 3). Now we recode BlkType into the frequency of each target dimension: for Color the frequency is 0.25 times the level index (0.25, 0.50, 0.75), and for Orientation it is 1 minus that value.

mrts %>% mutate(frequency = as.numeric(dimension) -1 +
                  (-1)^(as.numeric(dimension)-1)*0.25 * 
                  as.numeric(BlkType) ) -> mrts
kable(head(mrts))
| sub | BlkType | dimension | mrt | frequency |
|------|---------|-------------|-----------|-----------|
| adrl | 1:3 | Color | 0.6332064 | 0.25 |
| adrl | 1:3 | Orientation | 0.5416341 | 0.75 |
| adrl | 1:1 | Color | 0.5728192 | 0.50 |
| adrl | 1:1 | Orientation | 0.6261353 | 0.50 |
| adrl | 3:1 | Color | 0.4979409 | 0.75 |
| adrl | 3:1 | Orientation | 0.6331310 | 0.25 |

Your task: please replot the mean RT figure with frequency on the x-axis and dimension as separate lines.

# Your code here

Finally, we should conduct a repeated-measures ANOVA.

Your task: Please discuss with your neighbours and complete the ANOVA analysis.

library(ez)
#ezANOVA(mrts,...)

  1. I highly recommend this online book by Hadley Wickham, who developed tidyr, ggplot2, and many other influential R packages. The book can be found here: http://r4ds.had.co.nz/data-import.html