# before using the tidyverse, we need to load the libraries
# if the tidyverse is not installed, install it first with install.packages('tidyverse')
library(tidyverse)
library(ez)
library(knitr)
opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
In this case study, we have one full data set from a visual search experiment, in which participants had to discriminate a colour or orientation pop-out target. A typical search display is shown in Figure 1 (an orientation target, tilted left or right). If it was a colour target, the target was either green or red.
Figure 1. A typical pop-out search display
In this experiment, the proportion of color vs. orientation targets was systematically manipulated across three sections: 75% vs. 25%, 50% vs. 50%, and 25% vs. 75%. The task was to report as quickly and accurately as possible whether the target was a color or an orientation target, using the left and right mouse buttons, respectively. Each trial started with the presentation of a fixation dot for 700–900 ms, followed by the stimulus display, which remained visible until the participant made a response.
There were three sections, each consisting of 10 blocks of 40 trials. Twelve participants took part in this experiment. All had normal or corrected-to-normal vision and were naive as to the purpose of the experiment. All participants gave informed consent before the experiment.
The research questions we had in mind are:
All data were pooled together and stored in an R binary file (rt_raw.rds). Experimental data are usually stored in Matlab, Excel, or text formats, and you can read them into R using different packages: for example, readMat() from the R.matlab package for importing Matlab .mat files, or read.csv() for reading csv files.
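For illustration, here is a minimal sketch of both import routes; the file names below are hypothetical and not part of this data set:
# hypothetical files, shown only to illustrate the import functions
library(R.matlab)                       # only needed for Matlab files
mat_data = readMat('data/exp1.mat')     # returns a named list of the Matlab variables
csv_data = read.csv('data/exp1.csv')    # returns a data frame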
The data can be downloaded here.
For better portability, I highly recommend creating a new R project to analyse the data. To create a new project, simply click the upper-right corner of RStudio and create a new project (from an existing folder). In the root folder of your project you will see a file with the extension .Rproj; this is the project profile that manages your project. Please copy the downloaded file into that folder. Note that if you have multiple data files and multiple analysis scripts, I recommend creating subfolders and storing the different types of files separately. For example, I stored my data files in a subfolder called data.
An .rds file stores a single R object. The functions for saving and reading such objects are saveRDS() and readRDS(). The raw data in this example are already in .rds format, so let's read them into the workspace using the R chunk below:
raw = readRDS('data/rt_raw.rds')
# show the head of the raw data
kable(head(raw))
| BlkType | dimension | color | orientation | position | response | rt | rs | blkNo | sub | outlier | error | tno |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3:1 | Color | red | left | 11 | 3 | 1.2441734 | 0.8037465 | 1 | adrl | TRUE | FALSE | 1 |
| 3:1 | Color | turg | left | 13 | 3 | 0.7895251 | 1.2665842 | 1 | adrl | FALSE | FALSE | 2 |
| 3:1 | Color | turg | left | 15 | 3 | 0.7763107 | 1.2881440 | 1 | adrl | FALSE | FALSE | 3 |
| 3:1 | Color | turg | right | 13 | 3 | 0.7149473 | 1.3987046 | 1 | adrl | FALSE | FALSE | 4 |
| 3:1 | Color | turg | right | 15 | 3 | 0.6372978 | 1.5691253 | 1 | adrl | FALSE | FALSE | 5 |
| 3:1 | Color | red | left | 13 | 3 | 0.6825651 | 1.4650616 | 1 | adrl | FALSE | FALSE | 6 |
Using head(yourTable) you can display the first few rows (the header) of a table. The above shows that the raw data have 13 columns and 14400 observations (trials). The variable raw is also listed in the Environment pane on the right side of RStudio, where you can open the full table by double-clicking it.
Your task: open the table, inspect it, and see if you can guess the meaning of each column from its name.
Behavioural raw data usually store more detailed information than we need. Here the most relevant columns we will focus on are:
Question: What outlier criteria should we set for a normal response time experiment?
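There is no single correct answer. As one illustration, a common approach combines fixed cut-offs with a per-participant SD criterion; the thresholds below are purely hypothetical and are not necessarily the ones used to create the outlier column in this data set:
# a sketch of a typical outlier rule (illustrative thresholds)
raw %>% group_by(sub) %>%
  mutate(my_outlier = rt < 0.2 | rt > 2 | abs(rt - mean(rt)) > 2.5 * sd(rt))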
Recall the R introduction session: Figure 2 illustrates the data analysis flow from the book ‘R for Data Science’1:
Figure 2. Data Analysis flow from R for Data Science
Several key concepts:
%>% (the pipe) passes the output of one step on as the input to the next, as sketched below. We will apply these key concepts in our data analyses below.
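As a quick illustration of the pipe (a sketch, not part of the analysis itself):
# the pipe passes the left-hand result as the first argument of the next call
raw %>% filter(error == FALSE) %>% nrow()   # equivalent to nrow(filter(raw, error == FALSE))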
The first step of the data analysis is to remove the error trials. However, before doing this, we must make sure there is no speed-accuracy trade-off.
Question: Why should we care about the speed-accuracy trade-off (SATO)?
Your answer:
Step 1: Percentage of error trials for individual participants.
raw %>% group_by(sub) %>% # separate for each participant
summarise(merror = mean(error)) %>% # calculate mean error
ggplot(aes(x=sub, y = merror)) + geom_bar(stat = 'identity') # visualize with bar plot
By visual inspection, only one participant has an error rate greater than 5%, so the task was relatively easy to accomplish.
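To check this numerically rather than by eye, a quick sketch:
raw %>% group_by(sub) %>%
  summarise(merror = mean(error)) %>%
  filter(merror > 0.05)        # participants with more than 5% errors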
Tidyverse includes the visualization package ggplot2. You can click the RStudio menu Help - Cheatsheets - Data Visualization with ggplot2 to download the useful cheatsheet.
A typical visualization grammar is like this:
ggplot(data = <DATA>) +
  geom_FUN(mapping = aes(<MAPPINGS>), stat = <STAT>, position = <POSITION>) +
  <COORDINATE_FUNCTION> + <FACET_FUNCTION> + <THEME>
geom_FUN defines the type of drawing. For example, geom_point draws points, geom_bar draws bars, and geom_line draws lines.
<MAPPINGS> defines the x, y, group, and color aesthetics for your drawing. If all geom_FUNs share the same mapping, you can move it into the ggplot() part. <STAT> indicates which statistical transformation should be applied before drawing. If the values are already the ones to be plotted (e.g., a single mean per bar), use stat = 'identity'.
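For example, the error bar plot above fills in this template roughly as follows (a sketch equivalent to the earlier code):
raw %>% group_by(sub) %>% summarise(merror = mean(error)) %>%  # <DATA>
  ggplot() +
  geom_bar(mapping = aes(x = sub, y = merror),                 # <MAPPINGS>
           stat = 'identity')                                  # <STAT>: values are plotted as given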
Now we turn to the speed-accuracy trade-off analysis: we compare the RTs from correct trials to those from incorrect trials. If the incorrect trials have faster responses, this suggests a speed-accuracy trade-off.
raw %>% filter(outlier == FALSE) %>%
group_by(sub, error) %>%
summarise(mrt = mean(rt)) %>%
ggplot(aes(x=sub, y = mrt, group= error, color = error)) + geom_line()
By visual inspection, the error trials appear to have faster responses, which may indicate a certain degree of speed-accuracy trade-off. However, this needs further statistical testing. Note that if SATOs occur equally in all conditions, they would not affect our conclusions, given that we use a within-subject full-factorial design. Thus, it is important to run statistical tests on the RTs and, if necessary, to analyse errors across the different experimental conditions.
One simple package for running ANOVAs is the ez package. If you haven't installed it, install it first. The main parameters of ezANOVA() are:
If you have more than one within- or between-subject factor, you need to use the .() operator to combine them. For example, to test the two within-subject factors dimension and ratio, we specify within = .(dimension, ratio).
library(ez)
av1 = ezANOVA(raw %>% filter(outlier == FALSE), # data
dv = rt, # dependent variable
wid = sub, # random variable
within = error) #within factors
kable(av1$ANOVA)
|  | Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|---|
| 2 | error | 1 | 11 | 2.38856 | 0.1504943 |  | 0.0235189 |
Based on this statistical test, there is no significant evidence of a SATO for the general discrimination task. But to be sure, we further run a condition-wise SATO analysis. Note the difference in the within = parameter.
library(ez)
av2 = ezANOVA(raw %>% filter(outlier == FALSE), # data
dv = rt, # dependent variable
wid = sub, # random variable
within = .(BlkType, error)) #within factors
kable(av2$ANOVA)
|  | Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|---|
| 2 | BlkType | 2 | 22 | 0.992791 | 0.3865419 |  | 0.0120681 |
| 3 | error | 1 | 11 | 1.849378 | 0.2010758 |  | 0.0149802 |
| 4 | BlkType:error | 2 | 22 | 1.435543 | 0.2594222 |  | 0.0103494 |
Given that there is no SATO, we will remove the error trials from further analysis. First, we calculate the mean RTs from the valid trials.
vdata = raw %>% filter(!outlier & !error) # filter outliers and errors
# compute mean RTs per participant, block type, and dimension
mrts = vdata %>% group_by(sub, BlkType, dimension) %>%
summarise(mrt = mean(rt))
saveRDS(mrts, file = 'data/mrts.rds')
With the mean RTs, we can now visualise them using ggplot. Please inspect the code, and let me know if anything is unclear.
# first collapse data across participants
mrts = readRDS('data/mrts.rds')
gmrt = mrts %>% group_by(BlkType, dimension) %>%
summarise(rt = mean(mrt),
n = n(),
se = sd(mrt)/sqrt(n-1))
# now visualize the grand mean RT
gmrt %>% ggplot( aes(x = BlkType, y = rt, group = dimension, color = dimension)) +
geom_line() + # line plot
geom_errorbar(aes(ymin = rt - se, ymax = rt + se), width = 0.2) +
geom_point() +
xlab('Color:Orientation') +
ylab('Mean RTs (secs)') +
theme_classic()
Question: Why should we first average RTs for each individual participant, and then average again across participants? Can we do it in one step?
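As a hint, here is a sketch of the one-step alternative for comparison: because it pools raw trials, participants with more valid trials carry more weight, and the variability mixes between- and within-subject variance.
# one-step: average all valid trials directly, ignoring participants
vdata %>% group_by(BlkType, dimension) %>% summarise(rt = mean(rt))
# two-step: first per participant (mrts), then across participants (gmrt), as above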
The above pattern clearly shows an interaction: when the frequency of a target dimension is lower, the mean RT is slower. But we want to analyse RTs as a function of the frequency of each dimension, so we need to create a new column based on BlkType.
Let’s first inspect BlkType:
print(unique(mrts$BlkType))
## [1] 1:3 1:1 3:1
## Levels: 1:3 < 1:1 < 3:1
print(as.numeric(unique(mrts$BlkType)))
## [1] 1 2 3
It is an ordinal factor variable encoding the color:orientation ratio with three levels. We can use as.numeric() to convert it to the numeric codes 1, 2, and 3. Now we recode BlkType into a target frequency for each dimension.
# recode BlkType (1, 2, 3) into the frequency of each dimension:
# Color gets 0.25/0.50/0.75 and Orientation 0.75/0.50/0.25 for BlkType 1:3/1:1/3:1
mrts %>% mutate(frequency = as.numeric(dimension) - 1 +
                  (-1)^(as.numeric(dimension) - 1) * 0.25 *
                  as.numeric(BlkType)) -> mrts
kable(head(mrts))
| sub | BlkType | dimension | mrt | frequency |
|---|---|---|---|---|
| adrl | 1:3 | Color | 0.6332064 | 0.25 |
| adrl | 1:3 | Orientation | 0.5416341 | 0.75 |
| adrl | 1:1 | Color | 0.5728192 | 0.50 |
| adrl | 1:1 | Orientation | 0.6261353 | 0.50 |
| adrl | 3:1 | Color | 0.4979409 | 0.75 |
| adrl | 3:1 | Orientation | 0.6331310 | 0.25 |
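If the arithmetic trick above feels opaque, an equivalent and more explicit recoding is sketched below; the column names color_freq and frequency2 are just illustrative, and frequency2 should match the frequency column:
mrts %>%
  mutate(color_freq = c(0.25, 0.50, 0.75)[as.numeric(BlkType)],          # color frequency per block type
         frequency2 = ifelse(dimension == 'Color',
                             color_freq, 1 - color_freq)) %>%            # orientation gets the complement
  head()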
Your task: please replot the mean RT figure with frequency on the x-axis and dimension as separate lines.
# Your code here
Finally, we should conduct a repeated-measures ANOVA.
Your task: please discuss with your neighbours and complete the ANOVA.
library(ez)
#ezANOVA(mrts,...)
I highly recommend this online book by Hadley Wickham, who developed tidyr, ggplot2, and many other influential R packages. The book can be found here: http://r4ds.had.co.nz/data-import.html