Intro

This is problem set #2, in which we hope you will practice using the visualization package ggplot2, as well as hone your knowledge of the packages tidyr and dplyr.

Sklar et al. (2012) claim evidence for unconscious arithmetic processing. We're going to reanalyze their Experiment 6, which is the primary piece of evidence for that claim. The data were generously contributed by Asael Sklar.

First let’s set up a few preliminaries.
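
Here's a minimal sketch of that setup chunk. The sem and ci95 helpers aren't shown in this writeup, so these definitions are assumptions reconstructed from how they're used below (sem in the Sklar et al. reanalysis, ci95 in my own analysis).

library(tidyr)
library(dplyr)
library(ggplot2)

# Standard error of the mean, ignoring NAs (assumed definition).
sem <- function(x) {
  sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))
}

# Half-width of a 95% confidence interval via the t distribution (assumed definition).
ci95 <- function(x) {
  sem(x) * qt(.975, sum(!is.na(x)) - 1)
}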

Data Prep

First read in two data files and subject info. A and B refer to different trial order counterbalances.
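
A sketch of the read-in, with hypothetical file paths standing in for wherever the data actually live:

d.a <- read.csv("data/sklar_expt6a.csv")      # counterbalance A (path is a placeholder)
d.b <- read.csv("data/sklar_expt6b.csv")      # counterbalance B (path is a placeholder)
subinfo <- read.csv("data/sklar_subinfo.csv") # subject info (path is a placeholder)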

Gather these datasets into long form and get rid of the Xs in the headers.

d.tidya = d.a %>%
    gather(PID, RT, num_range("X", 1:21)) %>%  # wide -> long over columns X1:X21
    separate(PID, c("X", "subid"), sep = 1)    # split "X1" into "X" and "1"

d.tidyb = d.b %>%
    gather(PID, RT, num_range("X", 22:42)) %>%  # wide -> long over columns X22:X42
    separate(PID, c("X", "subid"), sep = 1)

Bind these together. Check out bind_rows.

d.tidy = bind_rows(d.tidya, d.tidyb) %>%
    select(prime, prime.result, target, congruent, operand,
           distance, counterbalance, subid, RT)

Merge these with the subject info. You will need to look into merge and its relatives, left_join and right_join. Call this data frame d, by convention.

d = merge(d.tidy, subinfo, by = "subid", all.x = TRUE)  # all.x = TRUE keeps every trial (a left join)
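
The dplyr relative mentioned above does the same thing, assuming subid has the same type in both data frames:

d = left_join(d.tidy, subinfo, by = "subid")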

Clean up the factor structure.

d$presentation.time <- factor(d$presentation.time)
levels(d$operand) <- c("addition", "subtraction")  # relabel levels in their existing order

Data Analysis Preliminaries

Examine the basic properties of the dataset. First, take a histogram.

qplot(RT, data=d)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 237 rows containing non-finite values (stat_bin).

qplot(RT, data = d,
      facets = presentation.time ~ operand, 
      geom="histogram")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 237 rows containing non-finite values (stat_bin).

qplot(RT, data = d,
      facets = distance ~ operand, 
      geom="histogram")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 237 rows containing non-finite values (stat_bin).

Answer: The RT distributions look approximately normal and quite similar across presentation times, operands, and distances.

Challenge question: what is the sample rate of the input device they are using to gather RTs?

qplot(RT, binwidth = 1, xlim = c(500, 600), data=d)
## Warning: Removed 4674 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).

qplot(RT, binwidth = 1, xlim = c(700, 800), data=d)
## Warning: Removed 5216 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).

Looking at the spacing between distinct RT values, the sampling interval seems to be about 30 ms. Let's check by comparing a few binwidths.

qplot(RT, binwidth = 30, xlim = c(700, 1000), data=d)
## Warning: Removed 4246 rows containing non-finite values (stat_bin).
## Warning: Removed 3 rows containing missing values (geom_bar).

qplot(RT, binwidth = 40, xlim = c(700, 1000), data=d)
## Warning: Removed 4246 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).

qplot(RT, binwidth = 20, xlim = c(700, 1000), data=d)
## Warning: Removed 4246 rows containing non-finite values (stat_bin).
## Warning: Removed 2 rows containing missing values (geom_bar).

The 30 ms bins line up with the gaps better than the 20 or 40 ms bins do, so I would guess a sampling interval of about 30 ms (roughly a 33 Hz device).
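
A more direct first-principles check (a sketch): tabulate the gaps between successive distinct RT values; the most common gap should approximate the sampling interval.

gaps = diff(sort(unique(na.omit(d$RT))))
head(sort(table(gaps), decreasing = TRUE))  # most frequent spacings, in ms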

Sklar et al. did two manipulation checks. The subjective check asked participants whether they saw the primes; the objective check asked them to report the parity of the primes (even or odd), to find out whether they could actually read the primes when they tried. Examine both the subjective and objective manipulation checks. What do you see? Are they related to one another?

qplot(objective.test, data=d) #50% would suggest chance levels.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

qplot(subjective.test, data=d) # looks like around half said they could see the primes
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

qplot(objective.test, fill= presentation.time, data=d) #more people could see it when there was more time
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

qplot(subjective.test, fill= presentation.time, data=d) #presentation time didn't matter.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(d, aes(x = objective.test, y = subjective.test))+
  geom_point() +
  stat_smooth(method="glm", method.args=list(family="binomial"), se=FALSE)

Answer: It looks like there is a relationship between the two: if you performed well on the objective test, you were more likely to report seeing the primes.
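
A quick numeric check of that relationship (a sketch). Both tests are subject-level measures repeated across trials, so collapse to one row per participant first:

sub.tests = d %>% distinct(subid, subjective.test, objective.test)
with(sub.tests, cor(objective.test, subjective.test))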

OK, let's turn back to the measure and implement Sklar et al.'s exclusion criterion. You need to have said you couldn't see the primes (subjective test) and also not be significantly above chance on the objective test (< .6 correct). Call your new data frame ds.

ds = d[d$subjective.test == 0 & d$objective.test < .6,]
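
As an optional sanity check, count how many participants survive the exclusion:

length(unique(ds$subid))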

Sklar et al.’s analysis

Sklar et al. show a plot of a "facilitation effect": how much faster you are for prime-congruent naming compared with prime-incongruent naming. They then plot this difference score for the subtraction condition and for the two prime durations they tested. Try to reproduce this analysis.

HINT: first take averages within subjects, then compute your error bars across participants, using the sem function (defined above).

d.graph = ds %>% 
  group_by(congruent, presentation.time, operand, subid) %>%
  summarise(RT = mean(RT, na.rm = TRUE)) %>%  # per-subject condition means
  spread(congruent, RT) %>%                   # one column per congruency level
  mutate(diff = no - yes) %>%                 # facilitation: incongruent minus congruent RT
  group_by(operand, presentation.time) %>%
  summarise(mean = mean(diff, na.rm = TRUE),  # across-subject mean and SEM
            sem = sem(diff))

Now plot this summary, giving more or less the bar plot that Sklar et al. gave (though I would keep operation as a variable here). Make sure you get some error bars on there (e.g. geom_errorbar or geom_linerange).

ggplot(d.graph, aes(x = presentation.time, y = mean)) + 
    geom_bar(position=position_dodge(), stat="identity") +
    geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),
                  width=.2,                    
                  position=position_dodge(.9)) +
    facet_wrap(~ operand)

What do you see here? How close is it to what Sklar et al. report? Do the error bars match? How do you interpret these data?

Answer: The means in Sklar et al. look identical, but their error bars are around half the size of the ones I computed. To me, it looks like the differences are not significantly different from zero.

Your own analysis

Show us what you would do with these data, operating from first principles. What’s the fairest plot showing a test of Sklar et al.’s original hypothesis?

d.mel = ds %>% 
  group_by(congruent, presentation.time, operand, subid) %>%
  summarise(RT = mean(RT, na.rm = TRUE)) %>%  # per-subject condition means
  group_by(operand, presentation.time, congruent) %>%
  summarise(mean = mean(RT, na.rm = TRUE),    # condition means with 95% CIs
            ci95 = ci95(RT))

ggplot(d.mel, aes(x = congruent, y = mean, fill = congruent)) + 
    geom_bar(position=position_dodge(), stat="identity") +
    geom_errorbar(aes(ymin=mean-ci95, ymax=mean+ci95),
                  width=.2,                    
                  position=position_dodge(.9)) +
    facet_wrap(operand ~ presentation.time) + 
    ggtitle("RT in Subtracion and Addition Trials Based on\nPresentation Time and Congruence of Primes") +
    theme(plot.title = element_text(lineheight=.8, face="bold", size= 14))

I think it is fairer to look at the raw RTs for congruent and incongruent primes than at difference scores. For both presentation times (1700 ms and 2000 ms), the difference between congruent and incongruent primes looks far from significant (judging by the confidence intervals). While this doesn't completely invalidate the results, it does make the previous graph look a bit misleading.

In general, difference scores "hide" relevant data: two difference scores can differ from each other, but that doesn't mean either difference is reliable or important on its own.

Also, I would include addition as well, though I understand why they left it out, presumably for space.