library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
library(ggthemes)
theme_set(theme_few())
sem <- function(x) { sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x))) }
ci <- function(x) { sem(x) * 1.96 } # normal-approximation 95% CI half-width
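A quick sanity check of these helpers on toy values (the numbers are illustrative only):
x <- c(1, 2, 3, 4, NA)
sem(x) # sd of the four non-missing values divided by sqrt(4)
ci(x)  # about 1.96 standard errors: a normal-approximation 95% CI half-width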
This is problem set #4, in which we hope you will practice the visualization package ggplot2, as well as hone your knowledge of the packages tidyr and dplyr. You’ll look at two different datasets here.
First, data on children’s looking at social targets from Frank, Vul, Saxe (2011, Infancy).
Second, data from Sklar et al. (2012) on the unconscious processing of arithmetic stimuli.
In both of these cases, the goal is to poke around the data and make some plots to reveal the structure of the dataset.
This part is a warmup; it should be relatively straightforward ggplot2 practice.
Load data from Frank, Vul, Saxe (2011, Infancy), a study in which we measured infants’ looking to hands in moving scenes. There were infants from 3 months all the way to about two years, and there were two movie conditions (Faces_Medium, in which kids played on a white background, and Faces_Plus, in which the backgrounds were more complex and the people in the videos were both kids and adults). An eye-tracker measured children’s attention to faces. This version of the dataset only gives two conditions and only shows the amount of looking at hands (other variables were measured as well).
fvs <- read_csv("data/FVS2011-hands.csv")
## Parsed with column specification:
## cols(
## subid = col_integer(),
## age = col_double(),
## condition = col_character(),
## hand.look = col_double()
## )
First, use ggplot to plot a histogram of the ages of children in the study. NOTE: this is a repeated measures design, so you can’t just take a histogram of every measurement.
fvs %>%
  distinct(subid, age) %>% # one row per child; each child was measured in both conditions
  ggplot(aes(x = age)) +
  geom_histogram(binwidth = .5, col = "black", fill = "purple") +
  labs(title = "Histogram of participant ages",
       x = "Age (in months)", y = "Number of children") +
  theme(plot.title = element_text(hjust = 0.5))
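As an illustrative check of the repeated-measures structure, each child should contribute exactly one row per condition:
fvs %>%
  count(subid) %>% # number of rows per child
  count(n)         # expect a single value of n = 2 (one row per condition)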
Second, make a scatter plot showing the difference in hand looking by age and condition. Add appropriate smoothing lines. Take the time to fix the axis labels and make the plot look nice.
fvs %>%
ggplot(aes(x=age, y = hand.look, col = condition)) +
geom_point(alpha = .5) +
geom_smooth(method="lm") +
  labs(title = "Infant hand looking",
       x = "Age (in months)", y = "Proportion of looking at hands", col = "Condition") +
scale_color_manual(values=c("red", "blue")) +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5))
What do you conclude from this pattern of data?
Generally, the proportion of hand looking increases with the age of the child. There also appears to be an interaction between condition and age: the difference in hand looking between the two conditions grows with age (i.e., the slope for Faces_Plus seems steeper than that for Faces_Medium). In younger children the difference between conditions is negligible (potentially due to a floor effect), but the conditions begin to differentiate as age increases, with older children looking more at hands in the Faces_Plus condition than in the Faces_Medium condition.
What statistical analyses would you perform here to quantify these differences?
I would fit a linear mixed-effects model with hand looking as the dependent variable and age, condition, and their interaction as fixed effects. Because each child is measured twice (once in each of the Faces_Medium and Faces_Plus conditions), I would also include a random intercept for subject.
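A minimal sketch of that model using lme4 (assuming the package is installed):
library(lme4)
# Hand looking as a function of age, condition, and their interaction,
# with a random intercept per child to account for the repeated measures.
model <- lmer(hand.look ~ age * condition + (1 | subid), data = fvs)
summary(model)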
Sklar et al. (2012) claim evidence for unconscious arithmetic processing: they prime participants with arithmetic problems and claim that participants are faster to repeat the answers. We're going to do a reanalysis of their Experiment 6, which is the primary piece of evidence for that claim. The data are generously shared by Asael Sklar. (You may recall these data from the tidyverse tutorial earlier in the quarter.)
First read in two data files and subject info. A and B refer to different trial order counterbalances.
subinfo <- read_csv("data/sklar_expt6_subinfo_corrected.csv")
## Parsed with column specification:
## cols(
## subid = col_integer(),
## presentation.time = col_integer(),
## subjective.test = col_integer(),
## objective.test = col_double()
## )
d_a <- read_csv("data/sklar_expt6a_corrected.csv")
## Parsed with column specification:
## cols(
## .default = col_integer(),
## prime = col_character(),
## congruent = col_character(),
## operand = col_character()
## )
## See spec(...) for full column specifications.
d_b <- read_csv("data/sklar_expt6b_corrected.csv")
## Parsed with column specification:
## cols(
## .default = col_integer(),
## prime = col_character(),
## congruent = col_character(),
## operand = col_character()
## )
## See spec(...) for full column specifications.
Gather these datasets into long ("tidy data") form. If you need to review tidying, here's the link to R4DS (bookmark it!). Remember that you can use select_helpers to help in your gathering.
Once you’ve tidied, bind all the data together. Check out bind_rows.
The resulting tidy dataset should look like this:
prime prime.result target congruent operand distance counterbalance subid rt
<chr> <int> <int> <chr> <chr> <int> <int> <dbl> <int>
1 =1+2+5 8 9 no A -1 1 1 597
2 =1+3+5 9 11 no A -2 1 1 699
3 =1+4+3 8 12 no A -4 1 1 700
4 =1+6+3 10 12 no A -2 1 1 628
5 =1+9+2 12 11 no A 1 1 1 768
6 =1+9+3 13 12 no A 1 1 1 595
d_a.tidy <- d_a %>%
  gather(subid, rt, 8:28) # columns 8:28 hold one RT column per subject
d_b.tidy <- d_b %>%
  gather(subid, rt, 8:28)
d.tidy <- bind_rows(d_a.tidy, d_b.tidy)
head(d.tidy)
## # A tibble: 6 x 9
## prime prime.result target congruent operand distance counterbalance
## <chr> <int> <int> <chr> <chr> <int> <int>
## 1 =1+2+5 8 9 no A -1 1
## 2 =1+3+5 9 11 no A -2 1
## 3 =1+4+3 8 12 no A -4 1
## 4 =1+6+3 10 12 no A -2 1
## 5 =1+9+2 12 11 no A 1 1
## 6 =1+9+3 13 12 no A 1 1
## # ... with 2 more variables: subid <chr>, rt <int>
Merge these with subject info. You will need to look into merge and its relatives, left_ and right_join. Call this dataframe d, by convention.
d.tidy$subid <- as.numeric(d.tidy$subid) # recast to allow for join
d <- right_join(d.tidy, subinfo, by = "subid")
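As an illustrative sanity check on the join, every trial row should now carry subject info, and no subjects should have been lost:
sum(is.na(d$presentation.time))      # expect 0: all trial rows matched a subject
n_distinct(d$subid) == nrow(subinfo) # expect TRUE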
Clean up the factor structure (just to make life easier). No need to, but if you want, you can make this more tidyverse-ish.
d$presentation.time <- factor(d$presentation.time)
# operand is a character vector, so assigning to levels() would not relabel its
# values; convert it to a labelled factor instead.
d$operand <- factor(d$operand, levels = c("A", "S"), labels = c("Addition", "Subtraction"))
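A quick illustrative check that the cleanup took effect:
levels(d$presentation.time)
table(d$operand, useNA = "ifany")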
Examine the basic properties of the dataset. First, show a histogram of reaction times.
d %>%
  ggplot(aes(x = rt)) +
  geom_histogram(binwidth = 1, col = "blue", fill = "blue")
## Warning: Removed 237 rows containing non-finite values (stat_bin).
d %>%
  ggplot(aes(x = rt)) +
  geom_histogram(binwidth = 1, col = "blue", fill = "blue") +
  scale_x_continuous(limits = c(550, 600), breaks = seq(550, 600, 2)) # zoom in to see the discrete RT values
## Warning: Removed 5073 rows containing non-finite values (stat_bin).
Challenge question: what is the sample rate of the input device they are using to gather RTs?
Judging from the clustering of RTs at discrete values in the zoomed-in histogram, I'd say the sampling rate is around 36 kHz.
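One way to check this directly (an illustrative aside) is to look at the gaps between successive distinct RT values, since the smallest recurring gap reflects the timer's resolution:
d %>%
  filter(!is.na(rt)) %>%
  distinct(rt) %>%
  arrange(rt) %>%
  mutate(gap = rt - lag(rt)) %>%
  count(gap, sort = TRUE)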
Sklar et al. did two manipulation checks. Subjective - asking participants whether they saw the primes - and objective - asking them to report the parity of the primes (even or odd) to find out if they could actually read the primes when they tried. Examine both the unconscious and conscious manipulation checks. What do you see? Are they related to one another?
d %>%
  distinct(subid, subjective.test, objective.test) %>% # one point per subject, not per trial
  ggplot(aes(x = subjective.test, y = objective.test)) +
  geom_point(alpha = .5) +
  geom_smooth(method = "lm")
The two appear to be related: participants who claimed they saw the primes (subjective test) also tended to be more accurate at reporting the parity of the primes (objective test), and vice versa.
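To quantify this relationship (an illustrative aside, using the subject-level data in subinfo):
cor.test(subinfo$subjective.test, subinfo$objective.test)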
OK, let’s turn back to the measure and implement Sklar et al.’s exclusion criterion. You need to have said you couldn’t see (subjective test) and also be not significantly above chance on the objective test (< .6 correct). Call your new data frame ds.
ds <- d %>%
filter(subjective.test == 0 & objective.test < 0.6)
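It is worth checking how many participants survive this exclusion (this matters for the interpretation at the end); an illustrative count per presentation time:
ds %>%
  distinct(subid, presentation.time) %>%
  count(presentation.time)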
Sklar et al. show a plot of a "facilitation effect" - the amount faster you are for prime-congruent naming compared with prime-incongruent naming. They then plot this difference score for the subtraction condition and for the two prime times they tested. Try to reproduce this analysis.
HINT: first take averages within subjects, then compute your error bars across participants, using the ci function (defined above). Sklar et al. use SEM (and do it incorrectly, actually), but CI is more useful for “inference by eye” as discussed in class.
HINT 2: remember that in class, we reviewed the common need to group_by and summarise twice, the first time to get means for each subject, the second time to compute statistics across subjects.
HINT 3: The final summary dataset should have 4 rows and 5 columns (2 columns for the two condition variables (operation and presentation time) and 3 columns for the outcome: the facilitation effect, ci, and n).
# Reproduce Sklar plot (with subtraction condition only)
facilitation.sklar <- ds %>%
  filter(operand == "Subtraction") %>% # subtraction trials only, to match the paper's plot
group_by(congruent,presentation.time,subid) %>%
summarise(avg_rt = mean(rt, na.rm=T)) %>%
spread(congruent, avg_rt) %>%
mutate(difference = no-yes) %>%
group_by(presentation.time) %>%
summarise(effect = mean(difference), c_intervals = ci(difference), n = length(unique(subid)))
# Include both addition and subtraction conditions
facilitation.full <- ds %>%
group_by(congruent,operand,presentation.time,subid) %>%
summarise(avg_rt = mean(rt, na.rm=T)) %>%
spread(congruent, avg_rt) %>%
mutate(difference = no-yes) %>%
group_by(operand,presentation.time) %>%
summarise(effect = mean(difference), c_intervals = ci(difference), n = length(unique(subid)))
# Combine the two conditions
facilitation.combined <- ds %>%
group_by(congruent,presentation.time,subid) %>%
summarise(avg_rt = mean(rt, na.rm=T)) %>%
spread(congruent, avg_rt) %>%
mutate(difference = no-yes) %>%
group_by(presentation.time) %>%
summarise(effect = mean(difference), c_intervals = ci(difference), n = length(unique(subid)))
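Per HINT 3, a quick illustrative shape check on the full summary:
dim(facilitation.full) # expect 4 rows (2 operations x 2 presentation times) and 5 columns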
Now plot this summary, giving more or less the bar plot that Sklar et al. gave (though I would keep operation as a variable here). Make sure you get some error bars on there (e.g. geom_errorbar or geom_linerange).
# Sklar plot - to match plot from paper
facilitation.sklar %>%
ggplot(aes(x=presentation.time, y = effect)) +
geom_bar(stat="identity", width=.5, fill = "blue") +
geom_errorbar(aes(ymin=effect-c_intervals, ymax=effect+c_intervals), width=.1) +
labs(title = "Facilitation effect for subtraction",
x = "Presentation duration", y = "Facilitation effect (ms)") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5))
# Full plot
facilitation.full %>%
ggplot(aes(x=presentation.time, y = effect, fill = operand)) +
geom_bar(position = "dodge", stat="identity")+
geom_errorbar(aes(ymin=effect-c_intervals, ymax=effect+c_intervals),
position = position_dodge(width = .9), width = .2) +
labs(title = "Overall facilitation effects",
x = "Presentation duration", y = "Facilitation effect (ms)", fill = "Operation") +
scale_fill_brewer(palette = 'Set1') + # get rid of salmon
theme_classic() +
theme(plot.title = element_text(hjust = 0.5))
# Plot combining conditions
facilitation.combined %>%
ggplot(aes(x=presentation.time, y = effect)) +
geom_bar(stat="identity", width=.5, fill = "purple") +
geom_errorbar(aes(ymin=effect-c_intervals, ymax=effect+c_intervals), width=.1) +
labs(title = "Facilitation effect for both operations combined",
x = "Presentation duration", y = "Facilitation effect (ms)") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5))
What do you see here? How close is it to what Sklar et al. report? How do you interpret these data?
The subtraction effect looks similar to what Sklar et al. report (although their error bars are somewhat misleading). The values match those reported by Sklar et al., and because the confidence intervals do not overlap zero, I conclude that the effect is significant, as reported in the paper. However, in the addition condition there appears to be no facilitation effect; Sklar et al. report this in the paper itself but do not plot it. Similarly, if you combine across operations, the facilitation effect disappears at the 1700 ms presentation duration and only just survives at the 2000 ms presentation duration. As an aside, while I understand their exclusion criteria, they leave only 8 or 9 subjects per condition, which is a very small N.
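A hedged sketch of one way to back up the significance claim: one-sample t-tests of the subject-level subtraction difference scores against zero, one per presentation time (this recomputes the difference scores rather than reusing the summary table):
ds %>%
  filter(operand == "Subtraction") %>%
  group_by(presentation.time, subid) %>%
  summarise(incong = mean(rt[congruent == "no"], na.rm = TRUE),
            cong = mean(rt[congruent == "yes"], na.rm = TRUE)) %>%
  mutate(difference = incong - cong) %>% # facilitation effect per subject
  group_by(presentation.time) %>%
  summarise(p.value = t.test(difference, mu = 0)$p.value)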