In their 2016 Psychological Science paper, Zhao, Bülthoff and Bülthoff showed that nonface objects, specifically line patterns with salient Gestalt information, can elicit holistic processing, even in the absence of expertise. In this paper, holistic processing was measured by composite tasks. In a composite task, participants are shown two images in sequence (separated by a mask). Participants are then asked whether the top half of the second image is the same as, or different from, the top half of the first image. Holistic processing makes it harder to recognize that the top halves of two sequentially presented stimuli are the same when they are aligned with different bottom halves than when they are misaligned with those bottom halves. This effect has traditionally been shown with faces.
Here, the dependent variable was response sensitivity (d’), and a significant interaction between congruency and alignment in the line composite task (driven by a much larger congruency effect in the aligned conditions than in the misaligned conditions) was taken as evidence of holistic processing for the line pattern stimuli. I aim to replicate Experiment 1a of this paper, where participants are presented with a line composite task and a face composite task in a single session. In that experiment, the authors found evidence for holistic processing of both stimulus types (line and face).
The effect I am attempting to replicate is the interaction between congruency and alignment in the line pattern task. This was tested with a 2 (congruency) x 2 (alignment) repeated measures ANOVA, and the effect size of the interaction (ηp2) was 0.52. The original experiment had 22 participants, giving 99.9% a posteriori power with an alpha level of 0.05. A power analysis based on this effect size and alpha level indicates that I would need 8 participants for 80% power, 8 participants for 90% power, and 10 participants for 95% power. Power calculations were run in G*Power with the ‘Options’ changed so that the effect size specification was set ‘as in SPSS’.
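For reference, the conversion from partial eta squared to Cohen's f, the effect size that G*Power takes as input, can be sketched as follows. This shows only the standard conversion f = sqrt(ηp2 / (1 - ηp2)); it does not reproduce G*Power's power computation, and the variable names are illustrative.

```r
# Partial eta squared of the congruency x alignment interaction, as reported above
partial_eta_sq <- 0.52

# Standard conversion from partial eta squared to Cohen's f
cohens_f <- sqrt(partial_eta_sq / (1 - partial_eta_sq))

print(round(cohens_f, 2))
```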
Given the large effect size and relatively small original sample size, I planned to use the same sample size as the original authors (N = 22). However, for reasons of cost, I then reduced the planned sample size to 15. I plan to exclude participants whose overall accuracy does not exceed 55%. This cutoff is based on previous work testing the composite effect on Mechanical Turk (Susilo, Rezlescu & Duchaine 2013).
The materials in the original article were as follows:
“Composite faces. Face stimuli were created from 20 Caucasian faces (10 males, 10 females; gray-scale images) in the face database of the Max Planck Institute for Biological Cybernetics (Blanz & Vetter, 1999; Troje & Bülthoff, 1996). Twenty face pairs were formed by using each face as a target face once and as a gender-matched paired face once. Each pair was a unique combination of two faces (i.e., any two pairs differed in at least one face). Each face image was cut into a top part and a bottom part (each 270 × 135 pixels). Within the pairs, tops and bottoms were combined to create composite faces (Fig. 1a). For each of the 20 pairs of faces, 8 pairs of composite faces were created following the design illustrated in Figure 1c. Thus, there were 160 pairs of composite faces in total. A 1-pixel black line was added to each composite face to clearly separate the top and bottom parts. Stimuli used for practice trials were created using the same method with additional faces from the database.”
“Composite patterns. Twenty pairs of line patterns were created; within each pair, one pattern served as the target (Fig. 1b). Each line-pattern stimulus was cut into a top part and a bottom part (each 270 × 135 pixels). Within each pair of line patterns, both the two top parts and the two bottom parts differed from each other, but they could be swapped without disrupting the Gestalt information connecting the top and bottom parts (i.e., connectedness, closure, and continuity between lines). Aligning the top part of one line pattern with the bottom part of the paired line pattern formed a new line pattern, changing the appearance of the top part (i.e., emergent features were exhibited; Fig. 1b). The composite-pattern stimuli were created using the same method as for the faces. For each of the 20 pairs of line patterns, we created 8 pairs of composite patterns following the design illustrated in Figure 1c, so there were 160 pairs of test stimuli in total. Stimuli used for practice trials were created the same way with additional pairs of line patterns (see Fig. S1 in the Supplemental Material for all line-pattern stimuli).”
For the replication, I obtained the line patterns from the original authors, so those stimuli were identical to the ones used in the original study. However, due to copyright issues, I was not able to obtain the original face stimuli. Instead, I used the face stimuli available from the Rossion lab (Rossion, 2013). There were only enough of these stimuli for 10 face pairs, so each pair was repeated to match the original number of stimuli.
The original procedure was as follows:
“Participants performed two composite tasks, one with faces and one with line patterns (order counterbalanced across participants). On each trial, we presented 1 of the 160 pairs of faces or line patterns sequentially with an intervening mask (Figs. 1a and 1b). The target face or line pattern was always presented as the first stimulus. Participants made a same/different judgment about the top parts of the two faces or line patterns. Trials were presented in random order in each task, with an intertrial interval of 1 s (blank screen). Participants were instructed to attend to the top parts only and to ignore the bottom parts. For each task, participants completed eight practice trials before the experimental trials.”
“We used a complete design of the composite task to measure holistic processing (Richler, Cheung, & Gauthier, 2011). With this design (Fig. 1c), the first stimulus in a trial (i.e., the target face or line pattern) was always aligned. The second was either aligned (aligned condition) or misaligned (misaligned condition). For misaligned faces and line patterns, we shifted the top part to the right and the bottom part to the left by 33 pixels each. The top parts of the two stimuli (i.e., targets) in a trial were either the same (same condition) or different from each other (different condition). Finally, the irrelevant bottom parts were also manipulated. In the congruent condition, the bottom parts were the same in the same condition and were different in the different condition. In the incongruent condition, they were different in the same condition and were the same in the different condition. This design yielded 160 trials per stimulus type (2 alignment conditions × 2 congruency conditions × 2 same/different conditions × 20 exemplars of target stimuli).”
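The trial count in the quoted design can be checked by enumerating the factorial structure. This is a minimal sketch; the column names are illustrative and are not taken from the original materials.

```r
# Enumerate the complete design: 2 alignment x 2 congruency x 2 same/different x 20 exemplars
design <- expand.grid(
  alignment = c("aligned", "misaligned"),
  congruency = c("congruent", "incongruent"),
  correct_answer = c("same", "different"),
  exemplar = 1:20,
  stringsAsFactors = FALSE
)

# Total number of trials per stimulus type
n_trials <- nrow(design)

# Number of trials in each alignment x congruency cell
cell_counts <- table(design$alignment, design$congruency)

print(n_trials)
print(cell_counts)
```

Each alignment by congruency cell therefore contains 40 trials, half of them "same" trials and half "different" trials, which is what the sensitivity calculation later relies on.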
The replication procedure is the same with two exceptions: 1) The order of the tasks is not counterbalanced across participants. The line task is always presented first in order to avoid biasing participants towards processing the line patterns holistically, as they would faces. 2) There are only four practice trials for the face task. This change was made because I have far fewer face stimuli than were available for the original experiment.
The online experiment can be accessed here.
Due to a technical issue with the registration process, the official preregistration was not completed. Instead, a GitHub commit of the analysis script made prior to data collection (last pre-data collection commit) serves as the preregistration.
Data from 15 participants were collected through Mechanical Turk. One participant was excluded as their overall accuracy was less than 55%, leaving a sample size of 14.
Minor changes were made to the confirmatory analysis to clean up the appearance of tables and figures.
Data preparation following the analysis plan.
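#### Load packages (assumed set; the chunk that originally loaded them is not shown)
library(Rmisc)      # summarySEwithin; loaded first so its plyr dependency does not mask dplyr verbs
library(dplyr)      # %>%, mutate, group_by, summarise
library(ggplot2)    # plots
library(ggthemes)   # theme_few
library(knitr)      # kable
library(kableExtra) # kable_styling, column_spec, add_header_above
library(ez)         # ezANOVA
library(brms)       # brm, bridge_sampler
####Import data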
path <- "../Zhao2016/"
files <- dir(paste0(path,"anonymized-results/"),
pattern = "*.json")
d.raw <- data.frame()
for (f in files) {
jf <- paste0(path, "anonymized-results/",f)
jd <- jsonlite::fromJSON(paste(readLines(jf)), flatten=TRUE)
worker_id <- jd$WorkerId
trial_id <- jd$answers$data$trial_id
trial_index <- jd$answers$data$trial_index
correct <- jd$answers$data$correct
alignment <- jd$answers$data$alignment
congruency <- jd$answers$data$congruency
response <- jd$answers$data$response
rt <- jd$answers$data$rt
id <- cbind(trial_id,trial_index,correct,alignment,congruency,response,rt)
sub_data <- data.frame(id, worker_id)
d.raw <- rbind(d.raw,sub_data)
}
#### Data exclusion / filtering
row.has.na <- apply(d.raw, 1, function(x){any(is.na(x))}) # get rid of non response data
d <- d.raw[!row.has.na,] %>%
mutate(trial_id = ifelse(trial_id == "response_line","line", "face")) %>%
mutate(correct = ifelse(correct == "TRUE", 1, 0)) %>%
mutate(answer = ifelse((response=="same"&correct==1)|
(response=="different"&correct==0),"same","different"))
# Exclude subjects with less than 55% accuracy overall
d <- d %>%
group_by(worker_id) %>%
mutate(avg = mean(correct)) %>%
filter(avg > .55)
# Compile sample size and exclusion values and print
total_n <- length(unique(d.raw$worker_id))
filtered_n <- length(unique(d$worker_id))
num_excluded <- total_n - filtered_n
sample_info <- data.frame(c(total_n, num_excluded, filtered_n))
rownames(sample_info) <- c("Total participants", "Excluded participants", "Final sample")
colnames(sample_info) <- "N"
kable(sample_info) %>%
kable_styling(bootstrap_options = "striped", full_width = F)
| | N |
|---|---|
| Total participants | 15 |
| Excluded participants | 1 |
| Final sample | 14 |
# recast variables (imported as lists)
d$rt <- as.numeric(d$rt)
d$trial_index <- as.numeric(d$trial_index)
d$trial_id <- as.factor(d$trial_id)
d$alignment <- as.character(d$alignment)
d$congruency <- as.character(d$congruency)
d$response <- as.character(d$response)
# view to make sure everything is normal
kable(head(d))
| trial_id | trial_index | correct | alignment | congruency | response | rt | worker_id | answer | avg |
|---|---|---|---|---|---|---|---|---|---|
| line | 46 | 0 | misaligned | congruent | same | 188 | anon0 | different | 0.696875 |
| line | 50 | 1 | misaligned | congruent | different | 257 | anon0 | different | 0.696875 |
| line | 54 | 1 | misaligned | congruent | different | 275 | anon0 | different | 0.696875 |
| line | 58 | 1 | misaligned | congruent | same | 185 | anon0 | same | 0.696875 |
| line | 62 | 0 | misaligned | incongruent | different | 139 | anon0 | same | 0.696875 |
| line | 66 | 0 | misaligned | incongruent | same | 307 | anon0 | different | 0.696875 |
d %>% ggplot(aes(x=rt)) +
geom_histogram(binwidth=10, col = "black") +
labs(title = "Histogram of reaction time (all subjects collapsed)",
x = "Reaction time", y = "Count") +
theme(plot.title = element_text(hjust = 0.5)) +
xlim(0,1500)
In order to run statistical analyses on the data, I first calculate the hit rates and false alarm rates by condition and task. I then calculate d prime by subtracting the z-transform of the false alarm rate from the z-transform of the hit rate. To avoid infinite values, any false alarm rate of 0 is replaced by 1/(2 x the max number of false alarms) and any hit rate of 1 is replaced by 1-1/(2 x the max number of hits).
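As a point of reference, the conventional signal detection form of this calculation can be sketched as below. This is a minimal illustration and not the analysis script itself: the helper name and its count arguments are hypothetical, rates are conditioned on the true trial type, and the 1/(2N) correction is applied to both extremes of both rates, which is slightly broader than the rule described above.

```r
# Hypothetical helper: d' from raw counts, with rates conditioned on the true trial type.
# n_hits:   signal trials answered correctly;   n_signal: number of signal trials
# n_fas:    noise trials answered incorrectly;  n_noise:  number of noise trials
d_prime_from_counts <- function(n_hits, n_signal, n_fas, n_noise) {
  hit_rate <- n_hits / n_signal
  fa_rate <- n_fas / n_noise
  # 1/(2N) correction: only rates of exactly 0 or 1 are changed,
  # so qnorm() never returns -Inf or Inf
  hit_rate <- min(max(hit_rate, 1 / (2 * n_signal)), 1 - 1 / (2 * n_signal))
  fa_rate <- min(max(fa_rate, 1 / (2 * n_noise)), 1 - 1 / (2 * n_noise))
  qnorm(hit_rate) - qnorm(fa_rate)
}

# Perfect performance on 20 signal and 20 noise trials gives a finite d'
perfect <- d_prime_from_counts(20, 20, 0, 20)

# Chance performance gives d' = 0
chance <- d_prime_from_counts(10, 20, 10, 20)

print(c(perfect = perfect, chance = chance))
```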
#### Prepare data for analysis
d.results <- d %>%
group_by(worker_id,trial_id,alignment,congruency) %>%
summarize(rt = mean(rt),
avg = mean(correct),
false_alarm = sum(response=="same"&answer=="different")/sum(response=="same"),
hit_rate = sum(response=="different"&answer=="different")/sum(response=="different"),
max_fa = sum(response=="same"),
max_hr = sum(response=="different")) %>%
ungroup() %>%
mutate(false_alarm = ifelse(false_alarm == 0, 1/(2*max_fa), false_alarm)) %>%
mutate(hit_rate = ifelse(hit_rate == 1, 1-1/(2*max_hr), hit_rate)) %>%
group_by(worker_id,trial_id,alignment,congruency) %>%
mutate(d_prime = qnorm(hit_rate)-qnorm(false_alarm))
First, I want to see if there is an overall effect of congruency, and an interaction between congruency and alignment, as in Zhao et al. (2016). The original authors also found no significant three-way interaction of alignment, congruency and task, which they state suggests that the line patterns are processed as holistically as the human faces. Here, I am running a 2 (task) x 2 (congruency) x 2 (alignment) repeated measures ANOVA to see if I find the same results.
overall_anova <- ezANOVA(data = d.results,
dv = d_prime,
wid = worker_id,
within = .(trial_id,congruency,alignment),
detailed = T,
return_aov = T)
kable(overall_anova$ANOVA[1:7], digits = 3, padding=10, caption = "Overall ANOVA") %>%
kable_styling(full_width = F, position = "center", bootstrap_options = c("hover")) %>%
column_spec(1, bold = T, border_right = T)
| Effect | DFn | DFd | SSn | SSd | F | p |
|---|---|---|---|---|---|---|
| (Intercept) | 1 | 13 | 369.821 | 36.567 | 131.477 | 0.000 |
| trial_id | 1 | 13 | 10.189 | 17.460 | 7.586 | 0.016 |
| congruency | 1 | 13 | 20.112 | 11.991 | 21.804 | 0.000 |
| alignment | 1 | 13 | 0.072 | 3.096 | 0.302 | 0.592 |
| trial_id:congruency | 1 | 13 | 2.649 | 17.864 | 1.928 | 0.188 |
| trial_id:alignment | 1 | 13 | 0.146 | 2.261 | 0.841 | 0.376 |
| congruency:alignment | 1 | 13 | 2.474 | 2.080 | 15.466 | 0.002 |
| trial_id:congruency:alignment | 1 | 13 | 2.273 | 1.976 | 14.957 | 0.002 |
Second, I want to see if there is a significant effect of congruency and a significant interaction of congruency and alignment in the line task alone. The interaction is the key finding I am attempting to replicate. To test this, I am running a 2 (congruency) x 2 (alignment) repeated measures ANOVA for the line task.
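In a fully within-subject 2 x 2 design, the interaction test is equivalent to a one-sample t test on each participant's difference of differences, with F equal to t squared. The sketch below illustrates that contrast on made-up d' values for four hypothetical participants; it is not the replication data and does not replace the ezANOVA call.

```r
# Made-up d' values for four hypothetical participants (one row per participant)
toy <- data.frame(
  aligned_congruent      = c(2.5, 2.8, 2.1, 2.6),
  aligned_incongruent    = c(1.4, 1.9, 1.5, 1.2),
  misaligned_congruent   = c(2.2, 2.4, 2.0, 2.3),
  misaligned_incongruent = c(2.0, 2.1, 1.9, 2.2)
)

# Congruency effect when aligned minus congruency effect when misaligned
contrast <- (toy$aligned_congruent - toy$aligned_incongruent) -
  (toy$misaligned_congruent - toy$misaligned_incongruent)

# One-sample t test of the contrast against zero
tt <- t.test(contrast, mu = 0)

# The interaction F(1, n - 1) equals the squared t statistic
f_equivalent <- unname(tt$statistic)^2
df_error <- unname(tt$parameter)

print(c(F = f_equivalent, df_error = df_error))
```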
d.line <- d.results %>%
filter(trial_id == "line")
line_anova <- ezANOVA(data = d.line,
dv = d_prime,
wid = worker_id,
within = .(congruency,alignment),
detailed = T,
return_aov = T)
kable(line_anova$ANOVA[1:7], digits = 3, padding=10, caption = "Line task ANOVA") %>%
kable_styling(full_width = F, position = "center", bootstrap_options = c("hover")) %>%
column_spec(1, bold = T, border_right = T)
| Effect | DFn | DFd | SSn | SSd | F | p |
|---|---|---|---|---|---|---|
| (Intercept) | 1 | 13 | 128.621 | 20.202 | 82.767 | 0.000 |
| congruency | 1 | 13 | 18.679 | 26.076 | 9.312 | 0.009 |
| alignment | 1 | 13 | 0.212 | 3.445 | 0.799 | 0.388 |
| congruency:alignment | 1 | 13 | 4.745 | 2.068 | 29.830 | 0.000 |
Furthermore, the original authors showed that the interaction was driven by a significant effect of congruency in the aligned condition, but not in the misaligned condition. To investigate this, I am running post hoc t tests.
num_tests_line = 2 #2 post hoc t tests (1 for aligned and 1 for misaligned)
# Test effect of congruency in aligned condition
d.line.aligned <- d.line %>%
filter(alignment == "aligned") %>%
dplyr::select(worker_id, congruency, d_prime)
x1 <- d.line.aligned %>%
filter(congruency == "congruent")
x2 <- d.line.aligned %>%
filter(congruency == "incongruent")
t.line.aligned <- t.test(x1$d_prime, x2$d_prime, paired = T)
t.line.aligned$p.value <- t.line.aligned$p.value*num_tests_line # correct p value for mult comp
kable(t(c(t.line.aligned$statistic, t.line.aligned$parameter, t.line.aligned$p.value)),
digits=3, caption = "Effect of congruency in the aligned condition",
col.names = c("t-statistic", "df", "p value")) %>%
kable_styling(full_width = F, position = "center")
| t-statistic | df | p value |
|---|---|---|
| 4.478 | 13 | 0.001 |
# Test effect of congruency in misaligned condition
d.line.misaligned <- d.line %>%
filter(alignment == "misaligned") %>%
dplyr::select(worker_id, congruency, d_prime)
x1 <- d.line.misaligned %>%
filter(congruency == "congruent")
x2 <- d.line.misaligned %>%
filter(congruency == "incongruent")
t.line.misaligned <- t.test(x1$d_prime, x2$d_prime, paired = T)
t.line.misaligned$p.value <- t.line.misaligned$p.value*num_tests_line # correct p value for mult comp
kable(t(c(t.line.misaligned$statistic, t.line.misaligned$parameter, t.line.misaligned$p.value)),
digits=3, caption = "Effect of congruency in the misaligned condition",
col.names = c("t-statistic", "df", "p value")) %>%
kable_styling(full_width = F, position = "center")
| t-statistic | df | p value |
|---|---|---|
| 1.438 | 13 | 0.348 |
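Multiplying each p value by the number of tests, as done above, is a Bonferroni correction. Base R's `p.adjust` applies the same multiplication and additionally caps the result at 1. The p values in this sketch are made up for illustration and are not the replication's uncorrected values.

```r
# Made-up uncorrected p values for two post hoc tests
raw_p <- c(aligned = 0.03, misaligned = 0.60)

# Bonferroni: multiply by the number of tests, capped at 1
adjusted_p <- p.adjust(raw_p, method = "bonferroni")

print(adjusted_p)
```

The cap matters only when an uncorrected p value exceeds 1 divided by the number of tests, as in the second made-up value here.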
Next, I am going to repeat these tests for the face task. The original authors also found a significant effect of congruency and a significant interaction of congruency and alignment in the face task. I am running a 2 (congruency) x 2 (alignment) repeated measures ANOVA for the face task.
d.face <- d.results %>%
filter(trial_id == "face")
face_anova <- ezANOVA(data = d.face,
dv = d_prime,
wid = worker_id,
within = .(congruency,alignment),
detailed = T,
return_aov = T)
kable(face_anova$ANOVA[1:7], digits = 3, padding=10, caption = "Face task ANOVA") %>%
kable_styling(full_width = F, position = "center", bootstrap_options = c("hover")) %>%
column_spec(1, bold = T, border_right = T)
| Effect | DFn | DFd | SSn | SSd | F | p |
|---|---|---|---|---|---|---|
| (Intercept) | 1 | 13 | 251.388 | 33.825 | 96.617 | 0.000 |
| congruency | 1 | 13 | 4.082 | 3.779 | 14.040 | 0.002 |
| alignment | 1 | 13 | 0.007 | 1.912 | 0.044 | 0.836 |
| congruency:alignment | 1 | 13 | 0.002 | 1.987 | 0.014 | 0.908 |
As with the line task, the original authors showed that the interaction was driven by a significant effect of congruency in the aligned condition, but not in the misaligned condition. To investigate this, I am again running post hoc t tests.
num_tests_face = 2 #2 post hoc t tests (1 for aligned and 1 for misaligned)
# test effect of congruency in aligned condition
d.face.aligned <- d.face %>%
filter(alignment == "aligned") %>%
dplyr::select(worker_id, congruency, d_prime)
x1 <- d.face.aligned %>%
filter(congruency == "congruent")
x2 <- d.face.aligned %>%
filter(congruency == "incongruent")
t.face.aligned <- t.test(x1$d_prime, x2$d_prime, paired = T)
t.face.aligned$p.value <- t.face.aligned$p.value*num_tests_face # correct p value for mult comp
kable(t(c(t.face.aligned$statistic, t.face.aligned$parameter, t.face.aligned$p.value)),
digits=3, caption = "Effect of congruency in the aligned condition",
col.names = c("t-statistic", "df", "p value")) %>%
kable_styling(full_width = F, position = "center")
| t-statistic | df | p value |
|---|---|---|
| 2.707 | 13 | 0.036 |
# test effect of congruency in misaligned condition
d.face.misaligned <- d.face %>%
filter(alignment == "misaligned") %>%
dplyr::select(worker_id, congruency, d_prime)
x1 <- d.face.misaligned %>%
filter(congruency == "congruent")
x2 <- d.face.misaligned %>%
filter(congruency == "incongruent")
t.face.misaligned <- t.test(x1$d_prime, x2$d_prime, paired = T)
t.face.misaligned$p.value <- t.face.misaligned$p.value*num_tests_face # correct p value for mult comp
kable(t(c(t.face.misaligned$statistic, t.face.misaligned$parameter, t.face.misaligned$p.value)),
digits=3, caption = "Effect of congruency in the misaligned condition",
col.names = c("t-statistic", "df", "p value")) %>%
kable_styling(full_width = F, position = "center")
| t-statistic | df | p value |
|---|---|---|
| 3.577 | 13 | 0.007 |
Here are the original results, as plotted in the paper. Exact values were not reported and had to be approximated from the original figure. Because of this, error bars are not included.
# use data frame structure of d.results
orig <- d.results[1:8,]
orig <- orig %>%
ungroup() %>%
select(trial_id,alignment,congruency,d_prime)
# replace d prime values with values from paper
orig$d_prime <- c(2.6,1.9,2.3,2,2.7,1.5,2,1.9)
# plot face and line data
task_names <- c("face" = "Face task", "line" = "Line task")
orig %>%
ggplot(aes(x= alignment, y = d_prime)) +
facet_grid(.~ trial_id, labeller = as_labeller(task_names)) +
geom_point(aes(col=congruency)) +
geom_line(aes(group = congruency,col=congruency)) +
labs(title = "Original plot") +
ylim(0,3) + ylab("Sensitivity (d')") + xlab("") +
scale_color_manual(values=c("red", "blue")) +
theme_few() +
theme(aspect.ratio=1) +
theme(plot.title = element_text(hjust = 0.5))
Here are the replication results, plotted in the same format. Error bars represent the standard error of the mean, within subjects.
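For readers unfamiliar with within-subject error bars, the sketch below shows the usual normalization approach on toy data: each participant's scores are re-centered on the grand mean before the standard error is computed, and a correction factor for the number of within-subject conditions is applied. It assumes that `summarySEwithin` is the Rmisc (Cookbook for R) implementation, which follows this approach; the toy data and variable names are illustrative only.

```r
# Toy data: three hypothetical participants, two within-subject conditions
toy <- data.frame(
  subject = rep(c("s1", "s2", "s3"), each = 2),
  condition = rep(c("A", "B"), times = 3),
  score = c(1, 2, 2, 4, 3, 3)
)

# Remove between-subject variability: subtract each subject's mean, add the grand mean
subject_mean <- ave(toy$score, toy$subject)
toy$normed <- toy$score - subject_mean + mean(toy$score)

# Correction factor for the number of within-subject conditions
n_cond <- length(unique(toy$condition))
correction <- sqrt(n_cond / (n_cond - 1))

# Standard error of the normalized scores per condition, with the correction applied
se_within <- tapply(toy$normed, toy$condition,
                    function(x) sd(x) / sqrt(length(x))) * correction

print(se_within)
```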
# summarize for within subject error bars
d.plot <- summarySEwithin(d.results, measurevar="d_prime",
withinvars=c("trial_id","alignment","congruency"),
idvar="worker_id", na.rm=FALSE, conf.interval=.95)
task_names <- c("face" = "Face task", "line" = "Line task")
d.plot %>%
ggplot(aes(x= alignment, y = d_prime)) +
facet_grid(.~ trial_id, labeller = as_labeller(task_names)) +
geom_point(aes(col=congruency)) +
geom_errorbar(aes(ymin=d_prime-se, ymax=d_prime+se), width=.1) +
geom_line(aes(group = congruency,col=congruency)) +
labs(title = "Replication plot") +
ylim(0,3) + ylab("Sensitivity (d')") + xlab("") +
scale_color_manual(values=c("red", "blue")) +
theme_few() +
theme(aspect.ratio=1) +
theme(plot.title = element_text(hjust = 0.5))
Interestingly, this replication attempt revealed a significant composite effect in the line task (as measured by a significant interaction between congruency and alignment in that task) but no composite effect in the face task. The following exploratory analyses focus mainly on determining why this might be the case.
I am first summarizing accuracy and reaction time (RT) for the two tasks. Mean accuracy is calculated as the group mean of each individual’s mean accuracy across all conditions by task. Mean RT is instead calculated as the group mean of each individual’s median RT across all conditions by task, due to the heavy skew inherent in RT data.
As the face task was always presented second in my implementation, one possible explanation for the lack of a composite effect in this task may be participant fatigue. However, these simple summary statistics suggest that this is probably not the case, as participants were more accurate in the face task (83%) than in the line task (74%). This possibility will be investigated further in later analyses.
# Compile basic stats about worker performance after exclusion
d_basics <- d %>%
group_by(worker_id, trial_id) %>%
summarise(mean_acc = mean(correct), median_rt = median(rt, na.rm = T))
basics <- d_basics %>%
group_by(trial_id) %>%
summarise(accuracy = mean(mean_acc), RT = mean(median_rt),
acc_sd = sd(mean_acc), RT_sd = sd(median_rt))
basics <- basics[,2:5]
rownames(basics) <- c("Face task", "Line task")
kable(basics, digits = 2, col.names = c("Accuracy", "RT", "Accuracy", "RT")) %>%
kable_styling(bootstrap_options = c("hover"), full_width = F) %>%
add_header_above(c(" " = 1, "Means" = 2, "Standard deviation" = 2)) %>%
column_spec(1, bold = T, border_right = T)
| | Accuracy (mean) | RT (mean) | Accuracy (SD) | RT (SD) |
|---|---|---|---|---|
| Face task | 0.83 | 241.07 | 0.1 | 76.03 |
| Line task | 0.74 | 285.64 | 0.1 | 74.93 |
Effect sizes were not included in the pre-registered confirmatory analysis but are important to calculate. Here, I am calculating the partial eta squared for each effect in the overall ANOVA, the line task ANOVA and the face task ANOVA.
# Effect sizes
# Overall anova
unlisted <- unlist(overall_anova$ANOVA)
overall_aov_stats <- tibble(
effect = c("trial_id", "congruency", "alignment",
"trial_id:congruency", "trial_id:alignment", "congruency:alignment",
"trial_id:congruency:alignment"),
F = c(as.double(unlisted["F2"]), as.double(unlisted["F3"]), as.double(unlisted["F4"]),
as.double(unlisted["F5"]), as.double(unlisted["F6"]), as.double(unlisted["F7"]),
as.double(unlisted["F8"])),
p = c(as.double(unlisted["p2"]), as.double(unlisted["p3"]), as.double(unlisted["p4"]),
as.double(unlisted["p5"]), as.double(unlisted["p6"]), as.double(unlisted["p7"]),
as.double(unlisted["p8"])),
SSn = c(as.double(unlisted["SSn2"]), as.double(unlisted["SSn3"]),
as.double(unlisted["SSn4"]), as.double(unlisted["SSn5"]),
as.double(unlisted["SSn6"]), as.double(unlisted["SSn7"]),
as.double(unlisted["SSn8"])),
SSd = c(as.double(unlisted["SSd2"]), as.double(unlisted["SSd3"]),
as.double(unlisted["SSd4"]), as.double(unlisted["SSd5"]),
as.double(unlisted["SSd6"]), as.double(unlisted["SSd7"]),
as.double(unlisted["SSd8"]))) %>%
mutate(partial_eta_squared = SSn / (SSn + SSd)) %>%
select(-SSn, -SSd)
kable(overall_aov_stats, digits = 2, col.names = c("Effect", "F", "p", "Partial eta squared"),
caption = "Overall ANOVA effect sizes") %>%
kable_styling(full_width = F, position = "center", bootstrap_options = c("hover")) %>%
column_spec(1, bold = T, border_right = T)
| Effect | F | p | Partial eta squared |
|---|---|---|---|
| trial_id | 7.59 | 0.02 | 0.37 |
| congruency | 21.80 | 0.00 | 0.63 |
| alignment | 0.30 | 0.59 | 0.02 |
| trial_id:congruency | 1.93 | 0.19 | 0.13 |
| trial_id:alignment | 0.84 | 0.38 | 0.06 |
| congruency:alignment | 15.47 | 0.00 | 0.54 |
| trial_id:congruency:alignment | 14.96 | 0.00 | 0.54 |
Here we see that the largest effect is the main effect of congruency (ηp2 = 0.63), although the congruency by alignment interaction, and the congruency by alignment by task interaction are also very large effects (ηp2 = 0.54 for both).
# Line anova
unlisted <- unlist(line_anova$ANOVA)
line_aov_stats <- tibble(
effect = c("congruency", "alignment", "congruency:alignment"),
F = c(as.double(unlisted["F2"]), as.double(unlisted["F3"]),
as.double(unlisted["F4"])),
p = c(as.double(unlisted["p2"]), as.double(unlisted["p3"]),
as.double(unlisted["p4"])),
SSn = c(as.double(unlisted["SSn2"]), as.double(unlisted["SSn3"]),
as.double(unlisted["SSn4"])),
SSd = c(as.double(unlisted["SSd2"]), as.double(unlisted["SSd3"]),
as.double(unlisted["SSd4"]))) %>%
mutate(partial_eta_squared = SSn / (SSn + SSd)) %>%
select(-SSn, -SSd)
kable(line_aov_stats, digits = 2, col.names = c("Effect", "F", "p", "Partial eta squared"),
caption = "Line task ANOVA effect sizes") %>%
kable_styling(full_width = F, position = "center", bootstrap_options = c("hover")) %>%
column_spec(1, bold = T, border_right = T)
| Effect | F | p | Partial eta squared |
|---|---|---|---|
| congruency | 9.31 | 0.01 | 0.42 |
| alignment | 0.80 | 0.39 | 0.06 |
| congruency:alignment | 29.83 | 0.00 | 0.70 |
In the line task there is again a large main effect of congruency (ηp2 = 0.42). The congruency by alignment interaction (the key replication statistic) is even larger at ηp2 = 0.70. Note that this is also greater than the original effect size of 0.52.
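The partial eta squared for the key interaction can be checked by hand from the line task ANOVA table, either from the sums of squares or from the F statistic and its degrees of freedom. The sketch below uses the rounded values printed in that table, so it agrees with the reported value only to two decimal places.

```r
# Values for congruency:alignment from the line task ANOVA table
ss_effect <- 4.745
ss_error <- 2.068
f_value <- 29.830
df_effect <- 1
df_error <- 13

# Partial eta squared from sums of squares
pes_from_ss <- ss_effect / (ss_effect + ss_error)

# Equivalent form using F and the degrees of freedom
pes_from_f <- (f_value * df_effect) / (f_value * df_effect + df_error)

print(round(c(from_ss = pes_from_ss, from_f = pes_from_f), 2))
```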
# face anova
unlisted <- unlist(face_anova$ANOVA)
face_aov_stats <- tibble(
effect = c("congruency", "alignment", "congruency:alignment"),
F = c(as.double(unlisted["F2"]), as.double(unlisted["F3"]),
as.double(unlisted["F4"])),
p = c(as.double(unlisted["p2"]), as.double(unlisted["p3"]),
as.double(unlisted["p4"])),
SSn = c(as.double(unlisted["SSn2"]), as.double(unlisted["SSn3"]),
as.double(unlisted["SSn4"])),
SSd = c(as.double(unlisted["SSd2"]), as.double(unlisted["SSd3"]),
as.double(unlisted["SSd4"]))) %>%
mutate(partial_eta_squared = SSn / (SSn + SSd)) %>%
select(-SSn, -SSd)
kable(face_aov_stats, digits = 2, col.names = c("Effect", "F", "p", "Partial eta squared"),
caption = "Face task ANOVA effect sizes") %>%
kable_styling(full_width = F, position = "center", bootstrap_options = c("hover")) %>%
column_spec(1, bold = T, border_right = T)
| Effect | F | p | Partial eta squared |
|---|---|---|---|
| congruency | 14.04 | 0.00 | 0.52 |
| alignment | 0.04 | 0.84 | 0.00 |
| congruency:alignment | 0.01 | 0.91 | 0.00 |
Effect sizes for the face task ANOVA confirm that the effect of congruency is large (ηp2 = 0.52).
Here, I am running a Bayesian analysis of task similarity. In the original paper, the authors took the nonsignificant three-way interaction between congruency, alignment and task as showing that the composite effect did not differ by task. However, a nonsignificant interaction is not in itself evidence of similarity. A more appropriate analysis is to first model the data using congruency and alignment (Model 1), then model the data a second time incorporating task as an effect (Model 2), and then compare the two models. The models can be compared using bridge sampling (which estimates the log marginal likelihood of each fit) and calculating a Bayes factor (BF). Here, BF was calculated as Model 2 vs Model 1 using the function ‘bf’ from the bridgesampling package. A BF greater than 1 is evidence for Model 2 (indicating that task has predictive power, i.e. task dissimilarity). A BF close to 1 indicates that the data do not discriminate between the models, and a BF less than 1 is evidence for Model 1, indicating that incorporating task into the model did not improve it (i.e. task similarity).
# Bayesian task similarity analysis
# Model 1: congruency and alignment
fit1 <- brm(d_prime ~ congruency + alignment + congruency:alignment, d.results,
save_all_pars = T)
summary(fit1)
bridge_1 <- brms::bridge_sampler(fit1)
# Model 2: congruency, alignment and task
# To test whether adding task (trial_id) to the model improves fit
fit2 <- brm(d_prime ~ trial_id + congruency + alignment + trial_id:congruency
+ trial_id:alignment + congruency:alignment + trial_id:congruency:alignment,
d.results, save_all_pars = T)
summary(fit2)
bridge_2 <- brms::bridge_sampler(fit2)
# calculate BF in favor of Model 2 (set as x1) over Model 1 (set as x2)
bridgesampling::bf(bridge_2,bridge_1)
## The estimated Bayes factor in favor of x1 over x2 is equal to: 2125.526
This is very strong evidence for Model 2, indicating task dissimilarity. This is unsurprising given my pattern of results; the analysis would have been more informative had the composite effects for the two tasks looked more similar and the three-way (task x congruency x alignment) interaction been non-significant.
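The Bayes factor reported by `bridgesampling::bf` is the ratio of the two models' marginal likelihoods, that is, the exponential of the difference between their log marginal likelihoods. The sketch below illustrates the arithmetic; the two log marginal likelihood values are hypothetical, and only the Bayes factor of 2125.526 comes from the output above.

```r
# Hypothetical log marginal likelihoods (not the values from the fitted models)
logml_model1 <- -250
logml_model2 <- -245

# Bayes factor in favor of Model 2 over Model 1
bf_21 <- exp(logml_model2 - logml_model1)

# Conversely, the reported Bayes factor implies this difference in log marginal likelihood
reported_bf <- 2125.526
implied_log_difference <- log(reported_bf)

print(bf_21)
print(implied_log_difference)
```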
Below, I am looking further into whether a fatigue effect with the face task could be contributing to the lack of a composite effect in that task. To do this, I am dividing the results into the first and second halves of the task.
# Plot the results of the first half of the face task vs the second
d.full.face <- d %>%
filter(trial_id == "face")
# calculate the trial index at halfway
half <- min(d.full.face$trial_index) +
(max(d.full.face$trial_index) - min(d.full.face$trial_index))/2
# use this to divide the results into halves
d.full.face <- d.full.face %>%
mutate(trial_index = ifelse(trial_index <= half, 0, 1))
d.ff.results <- d.full.face %>%
group_by(worker_id,trial_index,alignment,congruency) %>%
summarize(rt = mean(rt),
avg = mean(correct),
false_alarm = sum(response=="same"&answer=="different")/sum(response=="same"),
hit_rate = sum(response=="different"&answer=="different")/sum(response=="different"),
max_fa = sum(response=="same"),
max_hr = sum(response=="different")) %>%
ungroup() %>%
mutate(false_alarm = ifelse(false_alarm == 0, 1/(2*max_fa), false_alarm)) %>%
mutate(hit_rate = ifelse(hit_rate == 1, 1-1/(2*max_hr), hit_rate)) %>%
group_by(worker_id,trial_index, alignment,congruency) %>%
mutate(d_prime = qnorm(hit_rate)-qnorm(false_alarm))
d.ff.plot <- summarySEwithin(d.ff.results, measurevar="d_prime",
withinvars=c("trial_index","alignment","congruency"),
idvar="worker_id", na.rm=FALSE, conf.interval=.95)
names <- c("0" = "First half", "1" = "Second half")
d.ff.plot %>%
ggplot(aes(x= alignment, y = d_prime)) +
facet_grid(.~ trial_index, labeller = as_labeller(names)) +
geom_point(aes(col=congruency)) +
geom_errorbar(aes(ymin=d_prime-se, ymax=d_prime+se), width=.1) +
geom_line(aes(group = congruency,col=congruency)) +
labs(title = "Face task by half") +
ylim(0,3) + ylab("Sensitivity (d')") + xlab("") +
scale_color_manual(values=c("red", "blue")) +
theme_few() +
theme(aspect.ratio=1) +
theme(plot.title = element_text(hjust = 0.5))
As far as I can tell, splitting the task into two halves does not improve matters and performance looks roughly comparable. It is unlikely that fatigue effects are the root of the lack of a composite effect in this task.
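The split used above divides trials at the midpoint of the trial index range rather than at the median trial. A minimal sketch with made-up index values shows how trials are assigned under that rule.

```r
# Made-up trial indices for illustration
trial_index <- c(10, 20, 30, 40, 50)

# Midpoint of the index range
midpoint <- min(trial_index) + (max(trial_index) - min(trial_index)) / 2

# 0 = first half, 1 = second half; a trial exactly at the midpoint falls in the first half
half <- ifelse(trial_index <= midpoint, 0, 1)

print(data.frame(trial_index, half))
```

If trial indices are evenly spaced, this yields halves of nearly equal size; with unevenly spaced indices the two halves could contain different numbers of trials.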
One thing I noticed when looking at the overall RT histogram in the confirmatory analyses, and at group mean RTs by task in the exploratory analyses, was that RTs were very short. I think an RT below 150 ms is implausible, so I will rerun the analysis after trimming RTs below this threshold, on the assumption that they reflect trials where participants pressed buttons randomly or accidentally.
# Look more closely at RT by task
d %>%
ggplot(aes(x=rt,fill=trial_id)) +
geom_histogram(binwidth=30, position="jitter", alpha = 0.8) +
labs(title = "Histogram of reaction time (divided by task)",
x = "Reaction time", y = "Count") +
xlim(0,2000) +
scale_fill_manual(values = c("#9999CC", "#66CC99")) +
theme_few() +
theme(plot.title = element_text(hjust = 0.5))
# Trim implausible RTs (i.e. keep only trials with an RT above 150 ms)
d.trimmed <- d %>%
filter(rt > 150)
d.trimmed.results <- d.trimmed %>%
group_by(worker_id,trial_id,alignment,congruency) %>%
summarize(rt = mean(rt),
avg = mean(correct),
false_alarm = sum(response=="same"&answer=="different")/sum(response=="same"),
hit_rate = sum(response=="different"&answer=="different")/sum(response=="different"),
max_fa = sum(response=="same"),
max_hr = sum(response=="different")) %>%
ungroup() %>%
mutate(false_alarm = ifelse(false_alarm == 0, 1/(2*max_fa), false_alarm)) %>%
mutate(hit_rate = ifelse(hit_rate == 1, 1-1/(2*max_hr), hit_rate)) %>%
mutate(hit_rate = ifelse(hit_rate == 0, 1/(2*max_hr), hit_rate)) %>% #account for anon0
group_by(worker_id,trial_id, alignment,congruency) %>%
mutate(d_prime = qnorm(hit_rate)-qnorm(false_alarm))
# Replot after trimming
d.plot <- summarySEwithin(d.trimmed.results, measurevar="d_prime",
withinvars=c("trial_id","alignment","congruency"),
idvar="worker_id", na.rm=FALSE, conf.interval=.95)
d.plot %>%
ggplot(aes(x= alignment, y = d_prime)) +
facet_grid(.~ trial_id, labeller = as_labeller(task_names)) +
geom_point(aes(col=congruency)) +
geom_errorbar(aes(ymin=d_prime-se, ymax=d_prime+se), width=.1) +
geom_line(aes(group = congruency,col=congruency)) +
labs(title = "Trimmed results") +
ylim(0,3) + ylab("Sensitivity (d')") + xlab("") +
scale_color_manual(values=c("red", "blue")) +
theme_few() +
theme(aspect.ratio=1) +
theme(plot.title = element_text(hjust = 0.5))
Again, this does not appear to be the issue. After trimming, the composite effect in the line task remains, while there is still no effect in the face task (if anything, the relationship is now going in the opposite direction).
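When trimming on RT, it can also help to check how many trials each participant loses, since heavy trimming for a few participants would change the cell counts behind their d' values. The following is a rough base-R sketch on made-up data; the data frame `toy` and its values are purely illustrative and are not taken from this experiment.

```r
# Illustrative data only: two hypothetical participants, five trials each
toy <- data.frame(
  worker_id = rep(c("a", "b"), each = 5),
  rt = c(90, 420, 610, 130, 505, 300, 310, 149, 151, 800)
)

# Flag trials that a filter(rt > 150) step would drop
toy$too_fast <- toy$rt <= 150

# Proportion of dropped trials per participant
trim_summary <- aggregate(too_fast ~ worker_id, data = toy, FUN = mean)
trim_summary
```

A participant with a large proportion of flagged trials may be worth inspecting separately before interpreting the trimmed results.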
To probe this further, I will now look at individual subject results.
# Plot individual face task results
d.plot.face <- d.results %>%
filter(trial_id == "face")
d.plot.face %>%
ggplot(aes(x= alignment, y = d_prime)) +
facet_wrap(~ worker_id, ncol = 5) +
geom_point(aes(col=congruency)) +
geom_errorbar(aes(ymin=d_prime, ymax=d_prime), width=.1) +
geom_line(aes(group = congruency,col=congruency)) +
scale_color_manual(values=c("red", "blue")) +
labs(title = "Individual results for the face task") +
ylab("Sensitivity (d')") + xlab("Alignment") +
theme_few() +
theme(plot.title = element_text(hjust = 0.5))
# Plot individual line task results
d.plot.line <- d.results %>%
filter(trial_id == "line")
d.plot.line %>%
ggplot(aes(x= alignment, y = d_prime)) +
facet_wrap(~ worker_id, ncol = 5) +
geom_point(aes(col=congruency)) +
geom_errorbar(aes(ymin=d_prime, ymax=d_prime), width=.1) +
geom_line(aes(group = congruency,col=congruency)) +
scale_color_manual(values=c("red", "blue")) +
labs(title = "Individual results for the line task") +
ylab("Sensitivity (d')") + xlab("Alignment") +
theme_few() +
theme(plot.title = element_text(hjust = 0.5))
# remove anon0 because of weird performance on line task and replot
d.filt <- d.results %>%
filter(worker_id != "anon0")
d.filt <- summarySEwithin(d.filt, measurevar="d_prime",
withinvars=c("trial_id","alignment","congruency"),
idvar="worker_id", na.rm=FALSE, conf.interval=.95)
task_names <- c("face" = "Face task", "line" = "Line task")
d.filt %>%
ggplot(aes(x= alignment, y = d_prime)) +
facet_grid(.~ trial_id, labeller = as_labeller(task_names)) +
geom_point(aes(col=congruency)) +
geom_errorbar(aes(ymin=d_prime-se, ymax=d_prime+se), width=.1) +
geom_line(aes(group = congruency,col=congruency)) +
labs(title = "Replication plot") +
ylim(0,3) + ylab("Sensitivity (d')") + xlab("") +
scale_color_manual(values=c("red", "blue")) +
theme_few() +
theme(aspect.ratio=1) +
theme(plot.title = element_text(hjust = 0.5))
It doesn’t appear that an outlier participant is driving the unusual face task results. However, one participant did show very poor performance in the incongruent condition of the line task (d’ less than -2!). I removed this participant and replotted the overall results. There still appears to be a large composite effect in the line task and none in the face task.
Finally, let’s reanalyze the data in the classic or ‘partial’ composite design manner. In this version of the composite task, the face composite effect (FCE) in accuracy is calculated by subtracting the mean accuracy on aligned/same/incongruent trials from that on misaligned/same/incongruent trials, with a significant positive value considered to be a composite effect. Note that neither the line nor the face task was designed for the partial composite design (both included numerous extra trials), so this analysis is not definitive.
For further discussion of the differences, strengths and weaknesses of the two designs, see Rossion (2013); cf. Richler & Gauthier (2013).
d.partial.results <- d %>%
group_by(worker_id,trial_id,alignment,congruency,response) %>%
summarize(accuracy = mean(correct)) %>%
filter(response == "same" & congruency == "incongruent")
d.partial.fce <- d %>%
group_by(worker_id,trial_id,alignment,congruency,response) %>%
summarize(accuracy = mean(correct)) %>%
filter(response == "same" & congruency == "incongruent") %>%
spread(alignment, accuracy) %>%
mutate(fce = misaligned-aligned) %>%
group_by(trial_id) %>%
summarise(effect = mean(fce))
d.partial.fce <- d.partial.fce[,2]
rownames(d.partial.fce) <- c("Face task", "Line task")
kable(d.partial.fce, digits = 2, col.names = c("Face composite effect")) %>%
kable_styling(bootstrap_options = c("hover"), full_width = F) %>%
column_spec(1, bold = T, border_right = T)
| | Face composite effect |
|---|---|
| Face task | 0.00 |
| Line task | -0.02 |
d.partial.plot <- summarySEwithin(d.partial.results, measurevar="accuracy",
withinvars=c("trial_id","alignment"),
idvar="worker_id", na.rm=FALSE, conf.interval=.95)
d.partial.plot %>%
ggplot(aes(x= trial_id, y = accuracy, fill=alignment)) +
geom_bar(stat="identity", position="dodge") +
geom_errorbar(aes(ymin=accuracy-se, ymax=accuracy+se),
width=.2,
position=position_dodge(.9)) +
scale_fill_manual(values=c("cyan", "blue")) +
labs(title = "Partial design results") +
ylab("Accuracy") + xlab("Task") +
theme_few() +
theme(aspect.ratio=1) +
theme(plot.title = element_text(hjust = 0.5)) +
ylim(0,1)
Neither task shows a composite effect when analyzed according to the partial design.
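To make the partial-design calculation concrete, here is a minimal base-R sketch of the FCE as described above (misaligned accuracy minus aligned accuracy on same/incongruent trials), followed by a one-sample test against zero. The accuracy values are invented for illustration, and the one-sided `t.test` is my assumption about how a "significant positive value" might be assessed, not a procedure taken from the original paper.

```r
# Illustrative accuracies on same/incongruent trials for four
# hypothetical participants (not data from this replication)
acc <- data.frame(
  worker_id  = c("p1", "p2", "p3", "p4"),
  aligned    = c(0.70, 0.65, 0.80, 0.75),
  misaligned = c(0.85, 0.80, 0.85, 0.90)
)

# FCE per participant: positive values mean better accuracy when misaligned
acc$fce <- acc$misaligned - acc$aligned
mean_fce <- mean(acc$fce)

# Assumed test: is the mean FCE reliably greater than zero?
fce_test <- t.test(acc$fce, mu = 0, alternative = "greater")
```

In this toy example every participant is more accurate on misaligned trials, so the mean FCE is positive; the values reported in the table above are close to zero for both tasks.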
The primary finding that I was attempting to replicate was an interaction between congruency and alignment in the line pattern task. In the original study, the authors found a significant congruency by alignment interaction (F(1, 21) = 22.80, p < .001, ηp2 = .52), driven by a significant congruency effect found in the aligned condition (t(21) = 5.95, p < .001), but not the misaligned condition (t(21) = 0.96, p = .348). In my replication, I also observed a significant congruency by alignment interaction in the line pattern task (F(1,13) = 29.83, p < .001, ηp2 = .70). As in the original study, this was driven by a significant effect of congruency in the aligned condition (t(13) = 4.48, p = .001) but not the misaligned condition (t(13) = 1.44, p = 0.348). However, while the key statistic I identified successfully replicated, there was no corresponding composite effect for the face task. In the face task, there was a significant effect of congruency (F(1,13) = 14.04, p = .002), but no congruency by alignment interaction. This seriously complicates interpretation of this replication attempt. A key point of the original paper was that line patterns could be processed holistically, like faces are known to be processed. The failure here to reproduce a well-known effect for face stimuli throws the replication into doubt more generally. As such, I would consider this a partially successful replication.
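As a sanity check on the effect sizes quoted above, partial eta squared for a single effect can be recovered from its F statistic and degrees of freedom as F × df_effect / (F × df_effect + df_error). The short base-R sketch below applies this standard identity to the two interaction terms; the function and variable names are my own.

```r
# Partial eta squared from an F statistic and its degrees of freedom
partial_eta_sq <- function(f, df_effect, df_error) {
  (f * df_effect) / (f * df_effect + df_error)
}

# Congruency x alignment interaction in the line task
eta_original    <- partial_eta_sq(22.80, df_effect = 1, df_error = 21)
eta_replication <- partial_eta_sq(29.83, df_effect = 1, df_error = 13)

round(c(original = eta_original, replication = eta_replication), 2)
```

Both values round to the effect sizes reported in the text (.52 for the original study and .70 for the replication).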
There are a number of reasons why I may have failed to see a composite effect in the face task. For one, the composite effect for the face task in the original paper was smaller than that for the line task. I powered my replication attempt based on the line task result, and thus may have been underpowered to detect the effect in the face task. Additionally, I used different stimuli than the original authors in the face task, but not the line task. The face stimuli I used came from the Rossion lab (Rossion, 2013) and were designed for a slightly different version of the composite task, sometimes referred to as the ‘partial design’. I had to adapt these stimuli to create the extra conditions included in the ‘full design’ used in Zhao et al. (2016). There were also fewer face stimuli available in this stimulus set, allowing for only 80 distinct trials. In order to match the 160 unique trials in the line task, I repeated each trial once. This may have contributed to a reduction in the composite effect.
Furthermore, instead of counterbalancing the two tasks, I presented each participant first with the line pattern task and then with the face task. This design was chosen so as to avoid influencing participants to use holistic processing by presenting the face task first. However, it may have induced fatigue effects for the face task. I believe this is unlikely because accuracy was higher for the face task (83%) than the line task (74%), but I ran a number of exploratory analyses in order to check. I looked at the first and second halves of the face task separately and did not see evidence for a composite effect in either half of the task. I also trimmed RTs less than 150 ms, thinking that trials with such short RTs may reflect random or mistaken button presses. However, this actually led to a greater difference by congruency in the misaligned condition than the aligned condition. Finally, looking at individual participant results did not suggest any outlier participants that could have been driving the unusual face task results. These subsequent data explorations lead me to conclude that fatigue effects are probably not behind the lack of a composite effect in the face task.
I also analyzed the results according to the analysis for the partial composite design. In this analysis, only incongruent, same trials are considered and the comparison is accuracy (rather than response sensitivity) between the aligned and misaligned conditions. Greater accuracy for the misaligned than the aligned condition is taken as evidence for the composite effect. Using this method, I did not observe a composite effect for either the face task or the line pattern task. It would be interesting to know what such an analysis would yield in the original Zhao et al. (2016) dataset, given the current divide in the literature about which design is preferable. I should note, though, that the original study, and thus also this replication, were designed based on the full design, and consequently analyses based on the partial design should be interpreted with caution.
Finally, I would also like to note that the original authors kindly provided feedback on my paradigm after data collection (but before analysis). They noticed two key differences. First, the images in my online implementation ended up being larger than those presented in the original study. Second, in the original study, the critical top parts in the target stimuli were shifted in location by a set number of pixels relative to the central fixation point in order to avoid the potential confounding factor of spatial attention. This was not included in my implementation. While I’m not sure why these factors would differentially affect the line and face tasks, it is possible that they influenced these findings.
Richler, J. J., & Gauthier, I. (2013). When intuition fails to align with data: A reply to Rossion (2013). Visual Cognition, 21(2), 254-276.
Rossion, B. (2013). The composite face illusion: A whole window into our understanding of holistic face perception. Visual Cognition, 21(2), 139-253.
Susilo, T., Rezlescu, C., & Duchaine, B. (2013). The composite effect for inverted faces is reliable at large sample sizes and requires the basic face configuration. Journal of Vision, 13(13), 14-14.
Zhao, M., Bülthoff, H. H., & Bülthoff, I. (2016). Beyond faces and expertise: Facelike holistic processing of nonface objects in the absence of expertise. Psychological Science, 27(2), 213-222.