Intro

This is problem set #2, in which we hope you will practice using the visualization package ggplot2, as well as hone your knowledge of the tidyr and dplyr packages.

Sklar et al. (2012) claim evidence for unconscious arithmetic processing. We’re going to do a reanalysis of their Experiment 6, which is the primary piece of evidence for that claim. The data are generously contributed by Asael Sklar.

First let’s set up a few preliminaries.

library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats
library(lme4)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following object is masked from 'package:tidyr':
## 
##     expand
# Standard error of the mean; count only non-NA values to match sd(na.rm = TRUE).
sem <- function(x) {sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))}
# Half-width of an approximate 95% confidence interval.
ci95 <- function(x) {sem(x) * 1.96}

Data Prep

First read in two data files and subject info. A and B refer to different trial order counterbalances.

subinfo <- read.csv("http://langcog.stanford.edu/sklar_expt6_subinfo_corrected.csv")
d.a <- read.csv("http://langcog.stanford.edu/sklar_expt6a_corrected.csv")
d.b <- read.csv("http://langcog.stanford.edu/sklar_expt6b_corrected.csv")
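
Before reshaping, it can be worth a quick look at what came in (a minimal sanity check; glimpse is loaded with the tidyverse):

# Inspect the structure of the subject info and the two wide data files.
glimpse(subinfo)
glimpse(d.a)
glimpse(d.b)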

Gather and tidy the data

Gather these datasets into long form and get rid of the Xs in the headers.

d.a.long <- d.a %>%
  gather(subid,RT,starts_with("X")) %>%
  mutate(subid=as.numeric(gsub("X", "",subid)))

d.b.long <- d.b %>%
  gather(subid,RT, starts_with("X")) %>%
  mutate(subid=as.numeric(gsub("X", "",subid)))
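
Aside: gather is superseded in current tidyr. If you are on tidyr >= 1.0, an equivalent reshape (a sketch, using the same column names as above) would be:

# pivot_longer equivalent of the gather calls above.
d.a.long <- d.a %>%
  pivot_longer(starts_with("X"), names_to = "subid", values_to = "RT") %>%
  mutate(subid = as.numeric(gsub("X", "", subid)))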

Bind these together. Check out bind_rows.

d <- bind_rows(d.a.long,d.b.long)
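
A quick consistency check, if you like: the bound frame should have exactly as many rows as the two long frames combined.

# Sanity check: no rows gained or lost by bind_rows.
nrow(d) == nrow(d.a.long) + nrow(d.b.long)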

Merge these with subject info. You will need to look into merge and its relatives, left_join and right_join. Call this data frame d, by convention.

d <- d %>%
  left_join(subinfo)
## Joining, by = "subid"
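
To silence that message and make the key explicit, the same join can be written with by (assuming, as the message shows, that subid is the shared key):

# Alternative to the call above: state the join key explicitly.
# (Use this in place of the earlier left_join, not in addition to it.)
d <- bind_rows(d.a.long, d.b.long) %>%
  left_join(subinfo, by = "subid")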

Clean up the factor structure.

d$presentation.time <- factor(d$presentation.time)
levels(d$operand) <- c("addition","subtraction")
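
One caveat: assigning levels() directly relabels whatever levels already exist, in their current (alphabetical) order, without checking them. A defensive habit is to look at the coding first; the cross-tab below is just an illustration, assuming the relabeling above is what was intended.

# Check how operand is coded against presentation time before trusting the relabel.
table(d$operand, d$presentation.time, useNA = "ifany")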

Data Analysis Preliminaries

Examine the basic properties of the dataset. First, make a histogram of the RTs.

d %>% ggplot(aes(x=RT)) +
  geom_histogram() +
  ggthemes::theme_few()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 237 rows containing non-finite values (stat_bin).

Challenge question: what is the sample rate of the input device they are using to gather RTs?
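
One way to poke at this (a sketch, not the answer key): look at the spacing between adjacent distinct RT values. If the input device is polled at a fixed rate, raw RTs should sit on a grid whose step is the sampling period in milliseconds.

# Gaps between neighboring unique RT values; a dominant step hints at the sampling period.
d %>%
  filter(!is.na(RT)) %>%
  distinct(RT) %>%
  arrange(RT) %>%
  mutate(step = RT - lag(RT)) %>%
  count(step, sort = TRUE)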

Sklar et al. did two manipulation checks: a subjective check, asking participants whether they saw the primes, and an objective check, asking them to report the parity of the primes (even or odd) to find out whether they could actually read the primes when they tried. Examine both manipulation checks. What do you see? Are they related to one another?

subinfo %>%
  ggplot(aes(x=subjective.test)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

subinfo %>%
  ggplot(aes(x=objective.test)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

cor.test(subinfo$subjective.test, subinfo$objective.test)
## 
##  Pearson's product-moment correlation
## 
## data:  subinfo$subjective.test and subinfo$objective.test
## t = 4.4873, df = 40, p-value = 5.966e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3333494 0.7505802
## sample estimates:
##       cor 
## 0.5786542
subinfo %>%
  ggplot(aes(x = subjective.test, y = objective.test)) +
  geom_point() +
  geom_smooth(method='lm')+
  ggthemes::theme_few()

OK, let’s turn back to the measure and implement Sklar et al.’s exclusion criterion. To be included, participants need to have said they couldn’t see the primes (subjective test = 0) and also not be significantly above chance on the objective test (< .6 correct). Call your new data frame ds.

ds <- d %>% filter(subjective.test==0 & objective.test<.6)
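
It is worth checking how many participants survive the exclusion, since that is the n behind the error bars below.

# Number of participants remaining after exclusion.
n_distinct(ds$subid)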

Sklar et al.’s analysis

Sklar et al. show a plot of a “facilitation effect” - the amount faster you are at naming prime-congruent targets compared with prime-incongruent ones. They then plot this difference score for the subtraction condition, for each of the two prime presentation times they tested. Try to reproduce this analysis.

HINT: first take averages within subjects, then compute your error bars across participants, using the sem function (defined above).

ave <- ds %>%
  group_by(subid, congruent, operand, presentation.time) %>%
  summarise(meanRT = mean(RT, na.rm = TRUE)) %>%
  group_by(subid, presentation.time, operand, add = FALSE) %>%
  summarise(RTdiff = mean(meanRT[congruent == "no"] - meanRT[congruent == "yes"], na.rm = TRUE)) %>%
  group_by(presentation.time, operand, add = FALSE) %>%
  summarise(meanDiff = mean(RTdiff), sem = sem(RTdiff))

Now plot this summary, giving more or less the bar plot that Sklar et al. gave (though I would keep operation as a variable here). Make sure you get some error bars on there (e.g. geom_errorbar or geom_linerange).

ave %>% ggplot(aes(x=presentation.time,y=meanDiff)) +
  geom_bar(stat="identity",position="dodge",aes(fill=operand)) +
  geom_linerange(aes(ymin=meanDiff-sem,ymax=meanDiff+sem)) +
  facet_grid(.~ operand) +
  ggthemes::theme_few()

What do you see here? How close is it to what Sklar et al. report? Do the error bars match? How do you interpret these data?

Challenge problem: verify Sklar et al.’s claim about the relationship between RT and the objective manipulation check.

sum <- d %>%
  group_by(objective.test, operand, presentation.time, congruent, subid) %>%
  summarise(rt = mean(RT, na.rm = TRUE)) %>%
  group_by(objective.test, operand, presentation.time, subid, add = FALSE) %>%
  summarise(rtdiff = mean(rt[congruent == "no"] - rt[congruent == "yes"], na.rm = TRUE))

ggplot(sum, aes(x=objective.test,y=rtdiff,color=presentation.time)) +
  geom_point() +
  facet_grid(.~ operand) + 
  geom_smooth(method="lm") +
  ggthemes::theme_few()

Your own analysis

Show us what you would do with these data, operating from first principles. What’s the fairest plot showing a test of Sklar et al.’s original hypothesis?

sss <- d %>%
  group_by(congruent,operand,subid,presentation.time) %>%
  summarise(rt = mean(RT,na.rm=TRUE))

sss %>% ggplot(aes(x=presentation.time,y=rt,color=congruent)) +
  geom_boxplot() +
  geom_jitter(width=.4) +
  facet_grid(.~ operand)+
  ggthemes::theme_few() 

sss <- d %>%
  group_by(congruent, operand, subid, presentation.time) %>%
  summarise(rt = mean(RT, na.rm = TRUE)) %>%
  group_by(congruent, operand, presentation.time, add = FALSE) %>%
  summarise(meanrt = mean(rt), sem = sem(rt), ci = ci95(rt))

sss %>% ggplot(aes(x=presentation.time,y=meanrt,fill=congruent)) +
  geom_bar(stat="identity",position="dodge") +
  geom_linerange(aes(ymin=meanrt-sem,ymax=meanrt+sem),position=position_dodge(.9)) +
  facet_grid(.~ operand) +
  ggthemes::theme_few() 

# Same difference-score analysis as above, but against the subjective test.
sum2 <- d %>%
  group_by(subjective.test, operand, presentation.time, congruent, subid) %>%
  summarise(rt = mean(RT, na.rm = TRUE)) %>%
  group_by(subjective.test, operand, presentation.time, subid, add = FALSE) %>%
  summarise(rtdiff = mean(rt[congruent == "no"] - rt[congruent == "yes"], na.rm = TRUE))

ggplot(sum2, aes(as.numeric(subjective.test),rtdiff,color=presentation.time)) +
  geom_point() +
  facet_grid(.~ operand) +
  geom_smooth(method="lm") +
  ggthemes::theme_few()

Challenge problem: Do you find any statistical support for Sklar et al.’s findings?

summary(lmer(RT ~ congruent*presentation.time*operand +(congruent|subid), data=d))
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## RT ~ congruent * presentation.time * operand + (congruent | subid)
##    Data: d
## 
## REML criterion at convergence: 76888.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.0771 -0.6144 -0.0978  0.5159 14.1845 
## 
## Random effects:
##  Groups   Name         Variance  Std.Dev. Corr 
##  subid    (Intercept)   5213.330  72.203       
##           congruentyes     1.746   1.321  -1.00
##  Residual              13134.147 114.604       
## Number of obs: 6231, groups:  subid, 42
## 
## Fixed effects:
##                                                         Estimate
## (Intercept)                                           701.881801
## congruentyes                                            7.945612
## presentation.time2000                                 -49.975343
## operandsubtraction                                    -11.461118
## congruentyes:presentation.time2000                     -8.801151
## congruentyes:operandsubtraction                       -14.326757
## presentation.time2000:operandsubtraction                0.007592
## congruentyes:presentation.time2000:operandsubtraction   0.405341
##                                                       Std. Error t value
## (Intercept)                                            16.667266   42.11
## congruentyes                                            5.848933    1.36
## presentation.time2000                                  23.024223   -2.17
## operandsubtraction                                      5.958232   -1.92
## congruentyes:presentation.time2000                      8.074887   -1.09
## congruentyes:operandsubtraction                         8.430542   -1.70
## presentation.time2000:operandsubtraction                8.222365    0.00
## congruentyes:presentation.time2000:operandsubtraction  11.639271    0.03
## 
## Correlation of Fixed Effects:
##             (Intr) cngrnt pr.2000 oprnds cn:.2000 cngrn: p.2000:
## congruentys -0.225                                              
## prsntt.2000 -0.724  0.163                                       
## oprndsbtrct -0.172  0.491  0.125                                
## cngrn:.2000  0.163 -0.724 -0.224  -0.356                        
## cngrntys:pr  0.122 -0.692 -0.088  -0.707  0.501                 
## prsnt.2000:  0.125 -0.356 -0.172  -0.725  0.489    0.512        
## cngr:.2000: -0.088  0.501  0.121   0.512 -0.692   -0.724 -0.706
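
A possible follow-up, not part of the original writeup: refit the same model on the post-exclusion data frame ds and test the congruence terms with a likelihood-ratio test. The model names below (m_full, m_nocong) are just illustrative.

# Refit on the excluded dataset with ML and compare nested models.
m_full <- lmer(RT ~ congruent * presentation.time * operand + (congruent | subid),
               data = ds, REML = FALSE)
m_nocong <- lmer(RT ~ presentation.time * operand + (congruent | subid),
                 data = ds, REML = FALSE)
anova(m_nocong, m_full)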