Intro

This is problem set #2, in which we hope you will practice the visualization package ggplot2, as well as hone your knowledge of the packages tidyr and dplyr.

Sklar et al. (2012) claim evidence for unconscious arithmetic processing. We’re going to do a reanalysis of their Experiment 6, which is the primary piece of evidence for that claim. The data are generously contributed by Asael Sklar.

First let’s set up a few preliminaries.

rm(list=ls())
library(ggplot2)
library(tidyr)
library(dplyr)
library(lme4)

sem <- function(x) {sd(x, na.rm=TRUE) / sqrt(sum(!is.na(x)))}
ci95 <- function(x) {sem(x) * 1.96}
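
The denominator of a standard error matters once a vector contains missing values. The following is a minimal sketch with made-up numbers; the names sem_all_n and sem_complete are hypothetical and exist only for this comparison.

```r
# Toy reaction times with one missing value (made-up numbers)
x <- c(500, 600, 700, NA)

# Divides by the full length of x, so the NA is still counted in n
sem_all_n <- function(x) {sd(x, na.rm=TRUE) / sqrt(length(x))}

# Divides by the number of non-missing values only
sem_complete <- function(x) {sd(x, na.rm=TRUE) / sqrt(sum(!is.na(x)))}

sem_all_n(x)     # 100 / sqrt(4) = 50
sem_complete(x)  # 100 / sqrt(3), about 57.7
```

Counting the NA in n makes the standard error look smaller than the observed data support.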

Data Prep

First read in two data files and subject info. A and B refer to different trial order counterbalances.

subinfo <- read.csv("http://langcog.stanford.edu/sklar_expt6_subinfo_corrected.csv")
d.a <- read.csv("http://langcog.stanford.edu/sklar_expt6a_corrected.csv")
d.b <- read.csv("http://langcog.stanford.edu/sklar_expt6b_corrected.csv")

Gather these datasets into long form and get rid of the Xs in the headers.

d.a.wide <- d.a %>%
  gather(subid, measurement, X1:X21) %>%
  mutate(subid=as.numeric(gsub("X", "", subid)))
d.b.wide <- d.b %>%
  gather(subid, measurement, X22:X42) %>%
  mutate(subid=as.numeric(gsub("X", "", subid)))
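
To see what the reshaping step does, here is a small sketch on a made-up data frame (the toy values and the names toy, toy_long, and toy_long2 are assumptions, not part of the real data). It also shows pivot_longer, which recent tidyr versions document as the successor to gather.

```r
library(tidyr)
library(dplyr)

# Made-up wide data: one row per trial, one column per subject (X1, X2)
toy <- data.frame(trial = 1:2, X1 = c(510, 620), X2 = c(480, NA))

# Same pattern as above: stack the subject columns, then strip the "X"
toy_long <- toy %>%
  gather(subid, measurement, X1:X2) %>%
  mutate(subid = as.numeric(gsub("X", "", subid)))

# Equivalent reshaping with pivot_longer (row order differs from gather)
toy_long2 <- toy %>%
  pivot_longer(X1:X2, names_to = "subid", values_to = "measurement",
               names_prefix = "X") %>%
  mutate(subid = as.numeric(subid))

toy_long
```

Each trial-by-subject cell becomes its own row, so two trials and two subjects give four rows.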

Bind these together. Check out bind_rows.

d.wide <- bind_rows(d.a.wide, d.b.wide)

Merge these with subject info. You will need to look into merge and its relatives, left_ and right_join. Call this dataframe d, by convention.

d <- left_join(d.wide, subinfo)
## Joining by: "subid"
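
A join is worth sanity-checking, since left_join keeps every row of its first argument and fills in NA where no match exists. The sketch below uses made-up tables (toy_trials and toy_subinfo are hypothetical) and names the key explicitly with by, which also silences the "Joining by" message.

```r
library(dplyr)

# Made-up trial data: subject 3 has no entry in the subject table
toy_trials <- data.frame(subid = c(1, 1, 2, 3),
                         measurement = c(510, 620, 480, 700))
toy_subinfo <- data.frame(subid = c(1, 2),
                          presentation.time = c(1700, 2000))

# Every trial row is kept; subject 3 gets NA for presentation.time
toy_d <- left_join(toy_trials, toy_subinfo, by = "subid")

# Rows of toy_trials with no matching subject info
unmatched <- anti_join(toy_trials, toy_subinfo, by = "subid")

nrow(toy_d)      # 4
nrow(unmatched)  # 1
```

An empty anti_join result on the real data would indicate that every subject has matching subject info.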

Clean up the factor structure.

d$presentation.time <- factor(d$presentation.time)
d$operand <- factor(d$operand, labels=c("addition","subtraction"))

Data Analysis Preliminaries

Examine the basic properties of the dataset. First, take a histogram.

qplot(d$measurement, binwidth=40)
## Warning: Removed 237 rows containing non-finite values (stat_bin).

Here we see a typical distribution for reaction times, but notice that there are some really short reaction times. Perhaps those are guesses or misfires? We also see that reaction times are centered around 600 ms.

ggplot(d, aes(x=measurement)) +
  geom_histogram(stat="bin", binwidth=40) +
  facet_wrap(operand ~ congruent)
## Warning: Removed 237 rows containing non-finite values (stat_bin).

The distribution also seems consistent across operand and congruency.

ggplot(d, aes(y=measurement, x=factor(subid))) +
  geom_boxplot()
## Warning: Removed 237 rows containing non-finite values (stat_boxplot).

These boxplots allow us to see the distribution of reaction times for each participant. The medians seem broadly consistent, with some variability across participants.

ggplot(d, aes(y=measurement, x=distance, color=factor(subid))) +
  geom_point(position=position_dodge(width=0.5))
## Warning: Removed 237 rows containing missing values (geom_point).

There doesn’t seem to be a difference in reaction times across distances.

prelim <- d %>%
  group_by(congruent, operand) %>%
  summarise(mean=mean(measurement, na.rm=TRUE))

ggplot(prelim, aes(x=congruent, y=mean)) +
  geom_bar(stat="identity", position="dodge") +
  facet_wrap(~operand)

The means also show no difference between addition and subtraction reaction times.

Challenge question: what is the sample rate of the input device they are using to gather RTs?
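
One way to approach a question like this is to look at the gaps between the distinct recorded values: a device that samples at fixed intervals can only report times on a grid. This is a sketch on made-up reaction times from a hypothetical device that reports multiples of 8 ms; it is not the answer for the real data.

```r
# Made-up RTs (ms) from a hypothetical device with an 8 ms grid
rt <- c(512, 600, 608, 656, 720, 600, NA, 544)

# Gaps between the sorted distinct values; sort() drops the NA by default
steps <- diff(sort(unique(rt)))

steps
min(steps)  # smallest gap observed between distinct values
```

On real data, a table of these gaps would show whether they cluster on multiples of one value.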

Sklar et al. did two manipulation checks: a subjective one (asking participants whether they saw the primes) and an objective one (asking them to report the parity of the primes, even or odd, to find out whether they could actually read the primes when they tried). Examine both the subjective and objective manipulation checks. What do you see? Are they related to one another?

ggplot(d, aes(x=prime.result, y=measurement)) +
  geom_point(aes(color=objective.test)) +
  facet_wrap(~subjective.test)
## Warning: Removed 237 rows containing missing values (geom_point).

There doesn’t appear to be a relationship between the prime result and reaction time in either subjective-test group. On the other hand, there might be a relationship between the subjective and objective tests.

boxplot(objective.test ~ subjective.test, data = d)

It appears that there might be a relationship between the subjective and objective tests. People who reported seeing the primes (subjective test) performed better on the objective test than those who did not.
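
Because both manipulation checks are measured once per participant, a trial-level data frame repeats each participant's scores on every row. A hedged sketch of collapsing to one row per subject before comparing groups, using made-up scores (toy and per_subject are hypothetical names):

```r
library(dplyr)

# Made-up trial-level data: each subject's test scores repeat on every trial
toy <- data.frame(
  subid = rep(1:4, each = 3),
  subjective.test = rep(c(0, 0, 1, 1), each = 3),
  objective.test = rep(c(0.45, 0.55, 0.70, 0.80), each = 3)
)

# One row per subject
per_subject <- toy %>%
  distinct(subid, subjective.test, objective.test)

# Mean objective score by subjective report, with subject counts
check_summary <- per_subject %>%
  group_by(subjective.test) %>%
  summarise(mean_objective = mean(objective.test), n = n())

check_summary
```

Working at the subject level keeps participants with more trials from carrying extra weight in the comparison.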

OK, let’s turn back to the measure and implement Sklar et al.’s exclusion criterion. You need to have said you couldn’t see (subjective test) and also be not significantly above chance on the objective test (< .6 correct). Call your new data frame ds.

ds <- d %>%
  filter(subjective.test == 0 &
         objective.test < 0.6)
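
After an exclusion step it helps to count how many participants remain. A minimal sketch with made-up subject-level scores (toy and kept are hypothetical names), applying the same two conditions:

```r
library(dplyr)

# Made-up subject-level scores
toy <- data.frame(subid = 1:5,
                  subjective.test = c(0, 0, 0, 1, 1),
                  objective.test = c(0.45, 0.58, 0.65, 0.50, 0.90))

# Keep subjects who reported not seeing the primes AND scored below 0.6
kept <- toy %>%
  filter(subjective.test == 0 & objective.test < 0.6)

n_distinct(toy$subid)   # 5 subjects before exclusion
n_distinct(kept$subid)  # 2 subjects after exclusion
```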

Sklar et al.’s analysis

Sklar et al. show a plot of a “facilitation effect”: the amount faster you are for prime-congruent naming compared with prime-incongruent naming. They then plot this difference score for the subtraction condition and for the two prime times they tested. Try to reproduce this analysis.

HINT: first take averages within subjects, then compute your error bars across participants, using the sem function (defined above).

ms <- ds %>%
  group_by(operand, congruent, presentation.time, subid) %>%
  summarise(mean = mean(measurement, na.rm=TRUE)) %>%
  spread(congruent, mean) %>%
  mutate(facilitation=no-yes) %>%
  group_by(operand, presentation.time) %>%
  summarise(ci = sem(facilitation), facilitation=mean(facilitation))
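
To trace the steps, here is the same kind of pipeline on a made-up data set small enough to check by hand. The values, the names toy and toy_ms, and the local sem definition (which divides by the number of non-missing values) are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# Standard error over non-missing values (local definition for this sketch)
sem <- function(x) {sd(x, na.rm=TRUE) / sqrt(sum(!is.na(x)))}

toy <- data.frame(
  subid = rep(1:3, each = 4),
  congruent = rep(c("yes", "yes", "no", "no"), times = 3),
  measurement = c(600, 620, 640, 660,   # subject 1: 650 - 610 = 40
                  500, 520, 530, 530,   # subject 2: 530 - 510 = 20
                  700, 700, 690, 710)   # subject 3: 700 - 700 = 0
)

toy_ms <- toy %>%
  group_by(subid, congruent) %>%
  summarise(mean = mean(measurement, na.rm=TRUE), .groups = "drop") %>%
  spread(congruent, mean) %>%              # one row per subject: no, yes
  mutate(facilitation = no - yes) %>%      # per-subject difference score
  summarise(se = sem(facilitation),        # spread across subjects
            facilitation = mean(facilitation))

toy_ms  # facilitation = 20, se = 20 / sqrt(3)
```

The standard error is computed over the three per-subject difference scores, not over the twelve individual trials.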

Now plot this summary, giving more or less the bar plot that Sklar et al. gave (though I would keep operation as a variable here). Make sure you get some error bars on there (e.g. geom_errorbar or geom_linerange).

ggplot(ms, aes(x=operand, y=facilitation,
               fill=presentation.time)) + 
    geom_bar(position="dodge", stat="identity") +
    geom_errorbar(aes(ymin=facilitation-ci, ymax=facilitation+ci),
                  width=.25,
                  position=position_dodge(.9))

What do you see here? How close is it to what Sklar et al. report? Do the error bars match? How do you interpret these data?

The error bars do not match: those in the paper appear to be about half the size of ours (they seem to have used half a standard deviation instead of an entire standard deviation above and below the mean). In the paper, they showed no significant effect of priming for the addition case, though it is interesting to see that there might even be a negative effect for the shorter presentation time.

Challenge problem: verify Sklar et al.’s claim about the relationship between RT and the objective manipulation check.

Your own analysis

Show us what you would do with these data, operating from first principles. What’s the fairest plot showing a test of Sklar et al.’s original hypothesis?

ms <- d %>%
  group_by(operand, congruent, presentation.time, subid) %>%
  summarise(mean = mean(measurement, na.rm=TRUE)) %>%
  spread(congruent, mean) %>%
  mutate(facilitation=no-yes) %>%
  group_by(operand, presentation.time) %>%
  summarise(ci = sem(facilitation), facilitation=mean(facilitation))

ggplot(ms, aes(x=operand, y=facilitation,
               fill=presentation.time)) + 
    geom_bar(position="dodge", stat="identity") +
    geom_errorbar(aes(ymin=facilitation-ci, ymax=facilitation+ci),
                  width=.25,
                  position=position_dodge(.9))

Without filtering any data, there is a negative priming effect in the subtraction case.

ds <- d %>%
  filter(subjective.test == 0 &
         objective.test < 0.5)

ms <- ds %>%
  group_by(operand, congruent, presentation.time, subid) %>%
  summarise(mean = mean(measurement, na.rm=TRUE)) %>%
  spread(congruent, mean) %>%
  mutate(facilitation=no-yes) %>%
  group_by(operand, presentation.time) %>%
  summarise(ci = sem(facilitation), facilitation=mean(facilitation))

ggplot(ms, aes(x=operand, y=facilitation,
               fill=presentation.time)) + 
    geom_bar(position="dodge", stat="identity") +
    geom_errorbar(aes(ymin=facilitation-ci, ymax=facilitation+ci),
                  width=.25,
                  position=position_dodge(.9))

It seems strange that the cutoff for “not above chance” on the objective test is 0.6 when chance is 0.5. Updating the cutoff to 0.5, we see much larger error bars, though the plot still shows a priming effect (and the addition condition also shows a negative priming effect).

ds <- d %>%
  filter(subjective.test == 0 &
         objective.test < 0.6)

ms <- d %>%
  group_by(operand, prime.result, congruent, presentation.time, subid) %>%
  summarise(mean = mean(measurement, na.rm=TRUE)) %>%
  spread(congruent, mean) %>%
  mutate(facilitation=no-yes) %>%
  group_by(operand, prime.result, presentation.time) %>%
  summarise(ci = sem(facilitation), facilitation=mean(facilitation))

ggplot(ms, aes(x=operand, y=facilitation,
               fill=presentation.time)) + 
    geom_bar(position="dodge", stat="identity") +
    geom_errorbar(aes(ymin=facilitation-ci, ymax=facilitation+ci),
                  width=.25,
                  position=position_dodge(.9)) +
    facet_wrap(operand ~ prime.result)
## Warning: Removed 1 rows containing missing values (geom_errorbar).

Repeating Sklar et al.’s analysis faceted by the result of the prime reveals even more mystery. The results vary widely depending on the prime result (they reproduce only for a result of 0 in the subtraction case).

ggplot(ds, aes(x=measurement, fill=presentation.time)) +
  geom_histogram(stat="bin", binwidth=20) +
  facet_wrap(operand~prime.result)
## Warning: Removed 87 rows containing non-finite values (stat_bin).

In the subtraction case, there are simply more trials where the prime result is 0, and far fewer where it is 5 or 6. It looks like there might be other confounds here. It is still unclear, though, why there is an inverse effect where the result is 3.

Challenge problem: Do you find any statistical support for Sklar et al.’s findings?
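
One simple starting point, sketched here on made-up per-subject facilitation scores rather than the real data, is a one-sample t-test asking whether the mean facilitation differs from zero. The vector below is invented for illustration; on the real data it would hold one difference score per participant within a single condition.

```r
# Made-up per-subject facilitation scores (incongruent minus congruent, ms)
facilitation <- c(12, -5, 20, 8, -3, 15, 6, 10)

# Two-sided one-sample t-test against a true mean of zero
res <- t.test(facilitation, mu = 0)

res$estimate   # sample mean of the difference scores
res$parameter  # degrees of freedom: number of subjects minus one
res$p.value
```

A mixed-effects model fit with lme4 (loaded above) on the trial-level data, with a random effect for participant, would be another option; it is not sketched here.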