This is problem set #2, in which we hope you will practice the visualization package ggplot2, as well as hone your knowledge of the packages tidyr and dplyr.
Sklar et al. (2012) claims evidence for unconscious arithmetic processing. We’re going to do a reanalysis of their Experiment 6, which is the primary piece of evidence for that claim. The data are generously contributed by Asael Sklar.
First let’s set up a few preliminaries.
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
library(lme4)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following object is masked from 'package:tidyr':
##
## expand
sem <- function(x) {sd(x, na.rm=TRUE) / sqrt(length(x))}
ci95 <- function(x) {sem(x) * 1.96}
First read in two data files and subject info. A and B refer to different trial order counterbalances.
subinfo <- read.csv("http://langcog.stanford.edu/sklar_expt6_subinfo_corrected.csv")
d.a <- read.csv("http://langcog.stanford.edu/sklar_expt6a_corrected.csv")
d.b <- read.csv("http://langcog.stanford.edu/sklar_expt6b_corrected.csv")
Gather, tidy iris
Gather these datasets into long form and get rid of the Xs in the headers.
d.a.long <- d.a %>%
gather(subid,RT,starts_with("X")) %>%
mutate(subid=as.numeric(gsub("X", "",subid)))
d.b.long <- d.b %>%
gather(subid,RT, starts_with("X")) %>%
mutate(subid=as.numeric(gsub("X", "",subid)))
Bind these together. Check out bind_rows.
d <- bind_rows(d.a.long,d.b.long)
Merge these with subject info. You will need to look into merge and its relatives, left_ and right_join. Call this dataframe d, by convention.
d <- d %>%
left_join(subinfo)
## Joining, by = "subid"
Clean up the factor structure.
d$presentation.time <- factor(d$presentation.time)
levels(d$operand) <- c("addition","subtraction")
Examine the basic properties of the dataset. First, take a histogram.
d %>% ggplot(aes(x=RT)) +
geom_histogram() +
ggthemes::theme_few()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 237 rows containing non-finite values (stat_bin).
Challenge question: what is the sample rate of the input device they are using to gather RTs?
Sklar et al. did two manipulation checks. Subjective - asking participants whether they saw the primes - and objective - asking them to report the parity of the primes (even or odd) to find out if they could actually read the primes when they tried. Examine both the unconscious and conscious manipulation checks. What do you see? Are they related to one another?
subinfo %>%
ggplot(aes(x=subjective.test)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
subinfo %>%
ggplot(aes(x=objective.test)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
cor.test(subinfo$subjective.test, subinfo$objective.test)
##
## Pearson's product-moment correlation
##
## data: subinfo$subjective.test and subinfo$objective.test
## t = 4.4873, df = 40, p-value = 5.966e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3333494 0.7505802
## sample estimates:
## cor
## 0.5786542
subinfo %>%
ggplot((aes(x=subjective.test,y=objective.test))) +
geom_point() +
geom_smooth(method='lm')+
ggthemes::theme_few()
OK, let’s turn back to the measure and implement Sklar et al.’s exclusion criterion. You need to have said you couldn’t see (subjective test) and also be not significantly above chance on the objective test (< .6 correct). Call your new data frame ds.
ds <- d %>% filter(subjective.test==0 & objective.test<.6)
Sklar et al. show a plot of a “facilitation effect” - the amount faster you are for prime-congruent naming compared with prime-incongruent naming. They then show plot this difference score for the subtraction condition and for the two prime times they tested. Try to reproduce this analysis.
HINT: first take averages within subjects, then compute your error bars across participants, using the sem function (defined above).
ave<- ds %>%
group_by(subid,congruent,operand, presentation.time) %>%
summarise(meanRT=mean(RT,na.rm = TRUE)) %>%
group_by(subid,presentation.time,operand, add=FALSE) %>%
summarise(RTdiff=mean(meanRT[congruent=="no"]-meanRT[congruent=="yes"],na.rm=TRUE)) %>% group_by(presentation.time,operand,add=FALSE) %>%
summarise(meanDiff= mean(RTdiff), sem = sem(RTdiff))
Now plot this summary, giving more or less the bar plot that Sklar et al. gave (though I would keep operation as a variable here. Make sure you get some error bars on there (e.g. geom_errorbar or geom_linerange).
ave %>% ggplot(aes(x=presentation.time,y=meanDiff)) +
geom_bar(stat="identity",position="dodge",aes(fill=operand)) +
geom_linerange(aes(ymin=meanDiff-sem,ymax=meanDiff+sem)) +
facet_grid(.~ operand) +
ggthemes::theme_few()
What do you see here? How close is it to what Sklar et al. report? Do the error bars match? How do you interpret these data?
Challenge problem: verify Sklar et al.’s claim about the relationship between RT and the objective manipulation check.
sum <- d %>%
group_by(objective.test,operand,presentation.time,congruent,subid) %>%
summarise(rt = mean(RT,na.rm=TRUE)) %>%
group_by(objective.test,operand,presentation.time,subid,add=FALSE)%>%
summarise(rtdiff = mean(rt[congruent=="no"]-rt[congruent=="yes"],na.rm=TRUE))
ggplot(sum, aes(x=objective.test,y=rtdiff,color=presentation.time)) +
geom_point() +
facet_grid(.~ operand) +
geom_smooth(method="lm") +
ggthemes::theme_few()
Show us what you would do with these data, operating from first principles. What’s the fairest plot showing a test of Sklar et al.’s original hypothesis??????????
sss <- d %>%
group_by(congruent,operand,subid,presentation.time) %>%
summarise(rt = mean(RT,na.rm=TRUE))
sss %>% ggplot(aes(x=presentation.time,y=rt,color=congruent)) +
geom_boxplot() +
geom_jitter(width=.4) +
facet_grid(.~ operand)+
ggthemes::theme_few()
sss <- d %>%
group_by(congruent,operand,subid,presentation.time) %>%
summarise(rt = mean(RT,na.rm=TRUE))%>%
group_by(congruent,operand,presentation.time,add=FALSE)%>%
summarise(meanrt = mean(rt), sem = sem(rt), ci = ci95(rt))
sss %>% ggplot(aes(x=presentation.time,y=meanrt,fill=congruent)) +
geom_bar(stat="identity",position="dodge") +
geom_linerange(aes(ymin=meanrt-sem,ymax=meanrt+sem),position=position_dodge(.9)) +
facet_grid(.~ operand) +
ggthemes::theme_few()
############
sum2 <- d %>%
group_by(subjective.test,operand,presentation.time,congruent,subid) %>%
summarise(rt = mean(RT,na.rm=TRUE)) %>%
group_by(subjective.test,operand,presentation.time,subid,add=FALSE)%>%
summarise(rtdiff = mean(rt[congruent=="no"]-rt[congruent=="yes"],na.rm=TRUE))
ggplot(sum2, aes(as.numeric(subjective.test),rtdiff,color=presentation.time)) +
geom_point() +
facet_grid(.~ operand) +
geom_smooth(method="lm") +
ggthemes::theme_few()
Challenge problem: Do you find any statistical support for Sklar et al.’s findings?
summary(lmer(RT ~ congruent*presentation.time*operand +(congruent|subid), data=d))
## Linear mixed model fit by REML ['lmerMod']
## Formula:
## RT ~ congruent * presentation.time * operand + (congruent | subid)
## Data: d
##
## REML criterion at convergence: 76888.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.0771 -0.6144 -0.0978 0.5159 14.1845
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subid (Intercept) 5213.330 72.203
## congruentyes 1.746 1.321 -1.00
## Residual 13134.147 114.604
## Number of obs: 6231, groups: subid, 42
##
## Fixed effects:
## Estimate
## (Intercept) 701.881801
## congruentyes 7.945612
## presentation.time2000 -49.975343
## operandsubtraction -11.461118
## congruentyes:presentation.time2000 -8.801151
## congruentyes:operandsubtraction -14.326757
## presentation.time2000:operandsubtraction 0.007592
## congruentyes:presentation.time2000:operandsubtraction 0.405341
## Std. Error t value
## (Intercept) 16.667266 42.11
## congruentyes 5.848933 1.36
## presentation.time2000 23.024223 -2.17
## operandsubtraction 5.958232 -1.92
## congruentyes:presentation.time2000 8.074887 -1.09
## congruentyes:operandsubtraction 8.430542 -1.70
## presentation.time2000:operandsubtraction 8.222365 0.00
## congruentyes:presentation.time2000:operandsubtraction 11.639271 0.03
##
## Correlation of Fixed Effects:
## (Intr) cngrnt pr.2000 oprnds cn:.2000 cngrn: p.2000:
## congruentys -0.225
## prsntt.2000 -0.724 0.163
## oprndsbtrct -0.172 0.491 0.125
## cngrn:.2000 0.163 -0.724 -0.224 -0.356
## cngrntys:pr 0.122 -0.692 -0.088 -0.707 0.501
## prsnt.2000: 0.125 -0.356 -0.172 -0.725 0.489 0.512
## cngr:.2000: -0.088 0.501 0.121 0.512 -0.692 -0.724 -0.706