knitr::opts_chunk$set(include = TRUE, echo = TRUE, message = FALSE, fig.width = 4, fig.height = 4, fig.align = 'center')
library(knitr)
library(tidyverse)
library(broom)
library(ez)

In this tutorial, we will go through basic RT analyses for typical psychophysical experiments. The analyses are done in the R language, so in the first part we will quickly go through the basic analysis workflow in R. If you are already familiar with R, you can skip Part I. In the second part, we will analyze the data you just collected.

Part I: R Basics

You may have heard about R or RStudio. R is a programming language, while RStudio is an interface (an integrated development environment, IDE). RStudio helps you organize your code and data, and manage R packages (collections of specialized functions). Basically, you need both. They can be downloaded from the R Project website (https://www.r-project.org) and the RStudio website (https://www.rstudio.com).

They have already been installed on your computer. R packages extend the functionality of R by providing additional functions. Some packages we will use quite often, such as tidyverse and ez. In particular, the tidyverse package provides a grammar-like approach to data analysis, letting you focus more on your data and less on programming itself. There are many free online resources for learning R. I recommend the following three:

  1. Datacamp: Many excellent online interactive tutorials.
  2. An Introduction to Statistical and Data Sciences via R by Chester Ismay and Albert Y. Kim is an excellent book on using R for statistical analysis.
  3. R for Data Science by Hadley Wickham and Garrett Grolemund is an excellent book for intermediate or advanced readers who already know basic R.

Workflow in data analysis

To analyze data efficiently, you need a workflow. A typical data analysis involves the following tasks:

  1. Reading and cleaning data
  2. Manipulating data (e.g., basic summaries, linear estimation)
  3. Running statistical tests
  4. Visualizing results

This has become a standard routine in data science.

knitr::include_graphics('img/tidy_process.png')

Importing data

Importing experimental data is relatively easy. Several functions are available for this:

* Reading text or Excel data
  * `read.csv()`
  * `read.xlsx()` from the `xlsx` package
  * `fread()` from the `data.table` package
* Reading Matlab files
  * `readMat()` from the `R.matlab` package

For example:

dat = read.csv('data/stroop.csv')
head(dat, n = 3)
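If your data come as a very large text file or a Matlab .mat file instead, the calls look similar. This is just a sketch; the path 'data/example.mat' is hypothetical:

library(data.table)
dat2 = fread('data/stroop.csv')   # fast CSV reader; returns a data.table

library(R.matlab)
mat = readMat('data/example.mat') # returns a list of the matlab variables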

Data manipulation

Very often we work with data tables organized like the one above: each trial is stored in a row, and each column stores a single variable. This type of table is called tidy data. If your raw data are not in this format, you may need to convert them first.
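For instance, here is a minimal sketch of such a conversion, using a small made-up wide table (one RT column per condition) and tidyr's gather():

# made-up wide table: one RT column per condition
wide = tibble(sub = 1:3,
              static  = c(0.61, 0.58, 0.65),
              dynamic = c(0.72, 0.69, 0.75))
# gather() stacks the condition columns into tidy rows:
# one row per observation, with 'block' and 'rt' columns
tidy_dat = wide %>% gather(key = 'block', value = 'rt', static, dynamic)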

Experimental raw data contain detailed information about the experimental design and the observers' responses. Often we need to manipulate the data, for example by selecting variables or averaging across participants. The tidyverse package provides five key verbs for data manipulation:

* `select()`
* `filter()`
* `arrange()`
* `mutate()`
* `summarize()`

These are the key functions for data manipulation. You can find a cheatsheet for them in RStudio's Help - Cheatsheets sub-menu. Importantly, the tidyverse also provides the pipe operator (%>%) for connecting these functions, so you can combine them like a natural sentence.
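For instance, here is a minimal sketch chaining several of these verbs on the dat table (the columns sub, rt, and acc are the ones used in the example further below):

dat %>%
  select(sub, rt, acc) %>%      # keep only these three columns
  filter(acc == TRUE) %>%       # keep correct trials only
  mutate(rt_ms = rt * 1000) %>% # add a new column: RT in milliseconds
  arrange(rt_ms)                # sort rows by the new column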

knitr::include_graphics('figs/r_dplyr_pipes.pdf')

For example, suppose we want to take the dat table, remove the error trials, and then compute the mean reaction time for each participant. This analysis flow translates directly into R:

dat %>% 
  filter(acc == TRUE) %>% 
  group_by(sub) %>% 
  summarise(mRT = mean(rt)) %>% 
  spread(sub, mRT) %>% 
  kable(digits = 2)

Visualization

The tidyverse includes the visualization package ggplot2. Similar to data manipulation, ggplot2 implements a grammar of graphics: a plot consists of a data set, a set of geoms, and a coordinate system.

ggplot(data, aes(x,y)) + geom_point() + geom_smooth(method = 'lm')
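The line above is schematic (data, x, and y are placeholders). Here is a runnable sketch using the built-in mtcars data set:

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +               # scatter plot of the raw data
  geom_smooth(method = 'lm')   # overlay a fitted regression line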

Note, though, that ggplot2 chains its components with the plus (+) operator, not the pipe (%>%). Here are some basic plot components; a combined sketch follows the list:

  1. Bar plot

geom_bar(stat = 'identity', position = 'dodge')

  2. Line plot

geom_line()

  3. Error bars

geom_errorbar(aes(ymin, ymax))
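As a combined, runnable sketch with made-up summary data (two hypothetical condition means with their standard errors):

summ = tibble(cond = c('A', 'B'),
              mrt  = c(0.55, 0.62),
              se   = c(0.02, 0.03))
ggplot(summ, aes(x = cond, y = mrt)) +
  geom_bar(stat = 'identity', position = 'dodge') +
  geom_errorbar(aes(ymin = mrt - se, ymax = mrt + se), width = 0.3)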

We will see more of these in the next part.

Some R quirks

You may sometimes find R a bit strange. For example, there are three operators for assigning values: =, <-, and ->; the latter two are directional. To make a vector of numbers, you need the concatenate function c(), for example c(1, 4, 5). R code can span multiple lines, as long as the previous line ends with an operator such as +, %>%, or a comma. R does not require ';' at the end of a statement. In addition, you need to load packages at the very beginning of your script.
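A few of these quirks in one short sketch:

x = 5           # assignment with =
y <- 5          # assignment with <-
5 -> z          # rightward assignment with ->
v = c(1, 4, 5)  # c() concatenates numbers into a vector
# a statement continues on the next line after %>%, +, or a comma
v %>%
  mean()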

Part II: RT analyses

Import experimental raw data

The raw files generated by Matlab are comma-separated text files, with one individual file per participant. The first step is to read them into the workspace and combine them. The tidyverse provides a useful function for this, map_df(a_list, a_function): you give it a list of files, it applies the function to each element, and it binds the results into a single data frame.

# list of participants
subs = list.files(path = 'search', full.names = TRUE, pattern = '\\.txt$') # pattern is a regular expression
raw = map_df(subs, read.csv, header = FALSE)

head(raw)

Now we have the raw data, but we need to do some further preprocessing, such as renaming the column variables, adding the participant number, and computing accuracy.

# rename columns using names()
names(raw) = c('target','setSize','block','resp','rt')
# calculate accuracy
raw$acc = raw$target == raw$resp
# convert target from number to readable factors
raw$target = factor(raw$target, labels = c('present','absent'))
# convert set size from coding index to real items
raw$setSize = raw$setSize*4+4 # 8, 12, 16 items
# convert block from number to readable factors. 
raw$block = factor(raw$block, labels = c('dynamic','static'))
# add subject No. and factorize it
raw$sub = rep(1:length(subs), each = 420)
raw$sub = factor(raw$sub)

# show some data from the head
head(raw)

Error analysis

The first step is to remove the error trials. Before doing so, however, we must make sure there is no speed-accuracy trade-off.

Question: Why should we care about the speed-accuracy trade-off (SATO)?

Your answer:

We use the pipe operator %>% to chain the analysis steps.

# we use piping ...
# please read the following lines 
# and guess what columns are in the final table msuberrors
raw %>% group_by(sub, setSize, block) %>%
  summarise(merror = 1 - mean(acc)) -> msuberrors

# visualize the errors 
msuberrors %>% ggplot(aes(x = sub, y = merror)) + geom_bar(stat = 'identity', position = position_dodge())  + 
  facet_grid(setSize~block)

Visual inspection shows high error rates in the dynamic target-absent condition, which suggests there might be a speed-accuracy trade-off. Thus, we run a repeated-measures ANOVA on the error rates.

anova_err = ezANOVA(msuberrors,
        dv = merror,
        within = .(setSize, block),
        wid = sub)
## Warning: "setSize" will be treated as numeric.
## Warning: There is at least one numeric within variable, therefore aov()
## will be used for computation and no assumption checks will be obtained.
anova_err$ANOVA

Questions: Did the error rates differ across set sizes and blocks? Was there an interaction?

RT analysis

For the RT analysis, we usually include only the correct trials. Thus, we first need to filter out the errors (we did the error analysis above already), and then compute the mean RT for each condition and each subject. Below is an example output.

# Please start from `raw` data, average correct mean RTs, 
# and store to `msubrts'

msubrts = raw %>% filter(acc == TRUE) %>%
  group_by(sub,block, setSize, target) %>% 
  summarise(mrt = mean(rt))

head(msubrts)

To compare with the original study (Horowitz & Wolfe, 1998), we visualize the mean RTs as a function of set size (line plot).

# please complete the following codes
msubrts %>% 
  group_by(block, setSize, target) %>% # please fill the code here (not run)
  summarise(mmrt = mean(mrt)) %>%
  ggplot(aes(x = setSize, y = mmrt, 
             color = block, shape = target )) +
  geom_point(size=3) + geom_line()  + facet_wrap(~block)

As we can see, for the target-present condition the slopes for the static and dynamic displays are similar. But for the target-absent condition, the slope for the static display is clearly steeper than that for the dynamic display.

We use linear regression to estimate the search slopes.

  1. Let’s first build a simple linear regression.
# select one participant and one condition
# Task: please change the following subject name to yours. 
sub1 = msubrts %>% filter(sub == 1 & target == 'present')
# estimate a linear model
lm(mrt ~ setSize, data = sub1)
## 
## Call:
## lm(formula = mrt ~ setSize, data = sub1)
## 
## Coefficients:
## (Intercept)      setSize  
##     0.57807      0.02993

Question: What is the slope of the above linear regression?

  2. Build a function for linear estimation

We have to estimate multiple linear models, so we had better build a function for this. Linear regression in R is done with lm(). If you are not familiar with linear regression, please look it up in the Help.

linear_model <- function(df){
  lm(mrt ~ setSize, data = df)
}
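To check the function, we can apply it to the single-participant table sub1 from step 1; the coefficients should match the lm() call above.

linear_model(sub1)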
  3. Analysis for all combinations

Now we want to run the linear regression for each condition and each individual participant. In a standard program, you would need to write loops. In the tidyverse, you can use group_by() and nest() together with the map() function. The following example shows how you can get the slopes easily within one pipeline.

msubrts %>% group_by(sub, block, target) %>%
  nest() %>% 
  mutate(model = map(data, linear_model)) %>%
  mutate(glance = map(model, broom::tidy)) %>%
  unnest(glance, .drop = TRUE) -> linear_est

head(linear_est)
  4. Visualize mean search slopes for the static and dynamic displays

Once we get the linear estimation, we now can do visualization, statistical analysis etc. Let’s first visualize the mean slopes.

linear_est %>% group_by(block) %>%
  filter(term == 'setSize') %>%
  summarise(slope = mean(estimate), n = n(), se = sd(estimate)/sqrt(n)) %>%
  ggplot(aes(block, slope, fill = block, color = block)) + 
  geom_bar(stat = 'identity') + 
  geom_errorbar(aes(ymin = slope - se, ymax = slope + se), width = 0.5)

  5. Statistical analysis

One easy way to run a repeated-measures ANOVA is to use the ez package and its ezANOVA() function. Here is an example of how to do this for the slope comparison:

linear_est %>% filter(term == 'setSize') %>%
  ezANOVA(data = ., 
          dv = estimate, 
          within = .(target, block), 
          wid =sub)
## $ANOVA
##         Effect DFn DFd          F           p p<.05         ges
## 2       target   1   9  0.1539633 0.703906815       0.005363524
## 3        block   1   9 14.1458892 0.004477398     * 0.127172076
## 4 target:block   1   9  9.7402785 0.012299938     * 0.156355355