Replication of ‘Two faces of holistic face processing: Facilitation and interference underlying part-whole and composite effects’ by Jin, Hayward, & Cheung (2024, Journal of Vision)
Author
Seojin Lee (seojinl@stanford.edu)
Published
November 30, 2025
Introduction
Justification
My research interests lie in human visual perception, particularly in how the brain integrates low-level information from complex visual input to achieve a higher-level understanding of objects. Faces are a good example: recognizing an identity requires integrating smaller visual fragments such as the eyes, nose, and mouth. The complete composite task from Jin, Hayward, & Cheung (2024) offers a well-defined behavioral paradigm for studying this integration process. Specifically, it separates two components of holistic processing, facilitation and interference, rather than treating holistic processing as a single phenomenon. Replicating these effects will help me better understand the mechanisms of holistic face perception and will inform my own research on visual integration.
Stimuli and Procedure
This project replicates the complete composite task online, implemented in jsPsych and run on Prolific. Each trial presents a fixation cross (500 ms), a study composite (500 ms), a mask (500 ms), and then a test stimulus. Participants judged whether the cued half (top or bottom) of the test face matched the same half of the study face, while ignoring the other half. The design manipulates:
- Congruency (congruent vs. incongruent)
- Alignment (aligned vs. misaligned)
- Correct response (same vs. different)
- Cue location (top vs. bottom)
In the original study, isolated top and bottom halves provide a baseline for measuring facilitation (congruent - isolated) and interference (incongruent - isolated). The original trial structure comprised 400 composite trials (a 2 x 2 x 2 x 2 design) plus isolated-half trials. For my replication, after piloting, I removed the isolated blocks, because the composite congruency x alignment effects I aim to replicate do not depend on isolated trials.
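For reference, these quantities can be written in signal-detection terms. This is a sketch that assumes "same" trials are treated as signal, so H is the hit rate (responding "same" on same trials), FA is the false-alarm rate (responding "same" on different trials), and z is the inverse of the standard normal CDF:

```latex
d' = z(H) - z(FA), \qquad
\text{Facilitation} = d'_{\text{congruent}} - d'_{\text{isolated}}, \qquad
\text{Interference} = d'_{\text{incongruent}} - d'_{\text{isolated}}
```

Under these definitions facilitation is positive and interference is negative, which matches the signs of the effects reported in the original paper (+0.45 and -0.66).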
Stimuli were grayscale composite faces from the Chicago Face Database. I generated aligned and misaligned versions and added cue brackets to indicate the relevant half.
Original paper: https://jov.arvojournals.org/article.aspx?articleid=2802147
Methods
Power Analysis
The original paper reported large congruency and alignment effects in the complete composite task (Δd’ = +0.45 for facilitation; Δd’ = -0.66 for interference). These effects were estimated with 455 participants because the authors examined reliability and correlations across three holistic tasks, not because the composite task itself required such a large sample. In my replication, I focus on the congruency x alignment interaction in the complete composite task, which has been shown to be large and reliable. Assuming a conservative within-subject effect size of Cohen’s dz = 0.40-0.50 for the congruency effect, 60-80 participants provide roughly 85% power or more (two-sided test, α = .05).
Planned N = 72, which balances high power, online data quality, and feasibility.
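The power figures above can be checked with base R's `power.t.test()`. This sketch assumes the congruency effect is tested as a two-sided paired t-test on per-participant d′ differences, so `delta` equals Cohen's dz when `sd = 1`; the object names are illustrative. It reports power at N = 72 and the sample sizes needed for 80%, 90%, and 95% power. Under this assumption, N = 72 clears 80% power at dz = 0.40 but not at dz = 0.30.

```r
# Sketch: within-subject power analysis using stats::power.t.test().
# Assumption: two-sided paired t-test on per-participant differences,
# so delta / sd (with sd = 1) is Cohen's dz.
dz_values <- c(0.30, 0.40, 0.50)

# Power achieved by the planned sample of N = 72 at each effect size
power_at_72 <- sapply(dz_values, function(dz) {
  power.t.test(n = 72, delta = dz, sd = 1, sig.level = 0.05,
               type = "paired", alternative = "two.sided")$power
})
names(power_at_72) <- paste0("dz=", dz_values)

# Participants needed for 80%, 90%, and 95% power at each effect size
target_power <- c(0.80, 0.90, 0.95)
n_required <- outer(dz_values, target_power, Vectorize(function(dz, pw) {
  ceiling(power.t.test(delta = dz, sd = 1, power = pw, sig.level = 0.05,
                       type = "paired")$n)
}))
dimnames(n_required) <- list(paste0("dz=", dz_values),
                             paste0(target_power * 100, "%"))

print(round(power_at_72, 3))
print(n_required)
```

Solving for `n` with `power` supplied (and `n` left unset) is the standard way to ask `power.t.test()` for a required sample size.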
Stopping rule: stop once 72 valid submissions are collected, and accept 72-80 usable datasets after exclusions.
Exclusions:
RT < 200 ms or > 5000 ms
Incomplete responses
Participants who fail browser/attention checks
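The exclusion rules above can be sketched as two small base R helpers. The function names are hypothetical, and the sketch assumes that `rt` is in milliseconds, that a missing `rt` or `Correct` value marks a trial without a valid response, and that "incomplete" means fewer experimental trials than a full session contains. Browser and attention checks are not covered here.

```r
# Hypothetical helpers implementing the exclusion rules listed above.
# Assumptions: `rt` is in ms; NA in `rt` or `Correct` = no valid response;
# `expected_n` = number of experimental trials in a complete session.

# Participant level: keep only participants with a complete set of trials.
# Apply this before trial-level trimming, which lowers the counts.
drop_incomplete_subjects <- function(trials, expected_n) {
  counts <- table(trials$Subject)
  complete_ids <- names(counts)[counts >= expected_n]
  trials[trials$Subject %in% complete_ids, , drop = FALSE]
}

# Trial level: keep valid responses with 200 ms <= RT <= 5000 ms.
apply_trial_exclusions <- function(trials, rt_min = 200, rt_max = 5000) {
  has_response <- !is.na(trials$rt) & !is.na(trials$Correct)
  keep <- has_response & trials$rt >= rt_min & trials$rt <= rt_max
  trials[keep, , drop = FALSE]
}

# Toy usage: s2 is incomplete; s1 loses its too-fast and too-slow trials.
toy <- data.frame(
  Subject = c("s1", "s1", "s1", "s2"),
  rt      = c(150, 640, 5200, 800),
  Correct = c(TRUE, TRUE, FALSE, NA)
)
kept <- apply_trial_exclusions(drop_incomplete_subjects(toy, expected_n = 3))
```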
Materials
To match stimulus properties as closely as possible, I contacted the original author (Dr. Haiyang Jin, Oct 27, 2025). Because of copyright restrictions, the authors could not share their exact composite images, but they confirmed that the stimuli were constructed from the Chicago Face Database. Following their guidance, I downloaded CFD images, generated aligned and misaligned composites, initially created isolated halves, and added cue brackets.
Procedure
The experiment followed the procedure described in Jin, Hayward, & Cheung (2024): “Each trial began with a fixation cross (500 ms), followed by a composite study face (500 ms), a mask (500 ms), and then a composite test face that remained onscreen until response.” On each trial, participants judged whether the cued half of the test face matched the study face. The factorial structure included:
- Cue (top/bottom)
- Congruency (congruent/incongruent)
- Alignment (aligned/misaligned)
- Same/different
Participants completed only composite trials in this replication (see Differences from Original Study below).
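The factorial crossing listed above can be enumerated with `expand.grid()`. This minimal sketch assumes the level labels shown (the jsPsych export may use different strings) and uses the original count of 400 composite trials, which works out to 25 repetitions of each of the 16 cells. The shuffling step is illustrative, not the original randomization scheme.

```r
# Sketch of the 2 x 2 x 2 x 2 composite design.
# Level labels are assumptions; the experiment's logged strings may differ.
design <- expand.grid(
  Cue           = c("top", "bottom"),
  Congruency    = c("congruent", "incongruent"),
  Alignment     = c("aligned", "misaligned"),
  SameDifferent = c("same", "different"),
  stringsAsFactors = FALSE
)
n_cells <- nrow(design)          # 16 condition cells
reps_original <- 400 / n_cells   # repetitions per cell in the original task

# Repeat every cell, then shuffle trial order (illustrative only)
set.seed(1)
trial_list <- design[rep(seq_len(n_cells), each = reps_original), ]
trial_list <- trial_list[sample(nrow(trial_list)), ]
```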
Analysis Plan
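- Outcomes: sensitivity d’ (primary) and RT on correct trials.
- Primary tests: the congruency effect (congruent > incongruent) on aligned composites, and the Congruency x Alignment interaction.
- Facilitation (aligned-congruent vs. isolated) and interference (aligned-incongruent vs. isolated) follow the original analysis but require the isolated-half baseline, which was removed from the final design; they are therefore not estimable in this replication.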
Models: GLMMs following the original strategy (logistic for accuracy/d’ indexing; gamma or log-normal for RT)
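Because the original models index d′ through fixed effects, it may help to see why that works. The sketch below swaps in a probit link (the plan above names a logistic link, whose coefficients are on the logit scale rather than the d′ scale) and uses base R `glm()` on simulated data without random effects. It assumes responses are coded as saying "same" (1) versus "different" (0) and that the same/different factor is coded +0.5/-0.5; `same_contrast` and the simulated values are hypothetical.

```r
# Sketch: signal-detection GLM in which a fixed effect equals d'.
# With a probit link, P(say "same") = pnorm(b0 + b1 * x), x = +0.5 / -0.5,
# so z(H) - z(FA) = (b0 + 0.5*b1) - (b0 - 0.5*b1) = b1 (d'),
# and the criterion c = -(z(H) + z(FA)) / 2 = -b0.
set.seed(2024)
n_trials    <- 20000
true_dprime <- 1.5
true_c      <- 0.2

sim <- data.frame(same_contrast = rep(c(0.5, -0.5), length.out = n_trials))
sim$say_same <- rbinom(n_trials, 1,
                       pnorm(-true_c + true_dprime * sim$same_contrast))

fit <- glm(say_same ~ same_contrast,
           family = binomial(link = "probit"), data = sim)
est_dprime <- unname(coef(fit)["same_contrast"])
est_c      <- -unname(coef(fit)["(Intercept)"])
```

In the full analysis, `same_contrast` would be crossed with Congruency and Alignment so that its interaction terms estimate d′ differences between conditions; adding a `(1 | Subject)` term and fitting with `lme4::glmer()` gives the mixed-effects version.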
Differences from Original Study
Several differences between my replication and Jin, Hayward, & Cheung (2024) exist because of the practical constraints of online testing.
1. Task scope
The original study administered three holistic processing tasks (part-whole, standard composite, complete composite) within the same session, using a large sample to estimate reliability and correlations across tasks. In contrast, my replication tests only the complete composite task, because my goal is to replicate the within-task effects (the congruency x alignment pattern underlying facilitation and interference), not between-task relationships.
Anticipated impact: The complete composite task is self-contained and does not rely on the other tasks, so removing the part-whole and standard composite tasks should not substantially affect replication of the key effects.
2. Sample size
The original study recruited N = 455 online participants, a number driven by the goal of estimating cross-task reliability and between-task correlations. My replication aims to recruit N = 72 Prolific participants, which should be sufficient to detect the large within-subject d’ effects reported in the original paper (facilitation: +0.45; interference: -0.66).
Anticipated impact: A paired-samples power analysis indicates >80% power with N = 72 for a conservative effect size of dz = 0.40, although power falls to roughly 70% if the true effect is as small as dz = 0.30. Because I am not estimating cross-task correlations, the reduced sample size is appropriate and should be enough to replicate the core effects of the complete composite task.
3. Trial count and session duration
The original task contained 400 composite trials plus 80 isolated trials (~480 trials in total) and took ~40 minutes. My first pilot implementation (Pilot A) unintentionally produced a much longer structure (~704 trials). Based on pilot feedback and Prolific feasibility considerations, I reduced the number of trials per cell by approximately half relative to Pilot A and removed the isolated-half baseline conditions, since my project does not analyze part-based performance. The final design preserves the full 2 x 2 x 2 x 2 composite structure (cue x alignment x congruency x same/different), but with fewer repetitions per cell.
Anticipated impact: The original session is long, which increases the risk of fatigue, dropout, and noisier responses in an online setting. Composite congruency and alignment effects are large and highly reliable, and prior studies indicate that they do not require very high trial counts to emerge. Therefore, even without isolated-half baselines and with fewer repetitions per condition, the reduced design should remain a valid and sensitive test of holistic face processing.
4. Analysis plan
I follow the analysis approach described in the paper: logistic GLMMs for accuracy-derived sensitivity (d’ indexed by fixed effects), with congruency, alignment, cue, and their interactions entered as predictors. I apply the same trial-level exclusion rules (extreme RTs, invalid responses). Because the isolated-half baselines were removed, facilitation and interference cannot be computed as differences from isolated baselines; the confirmatory test is instead the congruency x alignment interaction on composite trials.
Anticipated impact: Apart from the baseline comparisons, my analysis plan matches the original closely, and differences in sample size or trial count do not alter the modeling structure.
Methods Addendum (Post Data Collection)
You can comment this section out prior to final report with data collection.
Actual Sample
Sample size, demographics, data exclusions based on rules spelled out in analysis plan
Differences from pre-data collection methods plan
Any differences from what was described as the original plan, or “none”.
Results
Pilot A (Preliminary Implementation)
Pilot A was conducted on October 26, 2025, using an early version of the jsPsych composite task.
This version unintentionally included a larger number of trials (~704 total), which made the session longer and helped identify design adjustments for the final study.
Two participants completed Pilot A.
Data Loading and Preprocessing
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.0 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 705 Columns: 36
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (18): trial_frame, trial_type, plugin_version, Subject, Exp_code, Exp_na...
dbl (16): item_width_mm, item_height_mm, item_width_px, px2mm, view_dist_mm,...
lgl (2): isPavlovia, Correct
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 705 Columns: 36
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (18): trial_frame, trial_type, plugin_version, Subject, Exp_code, Exp_na...
dbl (16): item_width_mm, item_height_mm, item_width_px, px2mm, view_dist_mm,...
lgl (2): isPavlovia, Correct
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot(summary_df, aes(x = is_congruent, y = mean_acc, fill = is_aligned)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  facet_wrap(~ Participant) +
  labs(title = "Pilot A: Accuracy by Congruency and Alignment",
       x = "Congruency", y = "Mean Accuracy", fill = "Alignment") +
  theme_minimal()
Reaction Time by Condition
ggplot(summary_df, aes(x = is_congruent, y = mean_rt, fill = is_aligned)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  facet_wrap(~ Participant) +
  labs(title = "Pilot A: Reaction Time by Congruency and Alignment",
       x = "Congruency", y = "Mean RT (ms)", fill = "Alignment") +
  theme_minimal()
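The construction of `summary_df` is not shown. One way to compute per-participant cell means like those plotted above is sketched below in base R. The function name `summarise_cells` is hypothetical; it assumes columns `Subject`, `Congruency`, `Alignment`, `Correct` (logical), and `rt` (ms), and it groups by those raw columns rather than the `Participant`, `is_congruent`, and `is_aligned` columns used in the plots. Following the analysis plan, RT is averaged over correct trials only.

```r
# Sketch: per-participant cell means for accuracy (all trials) and
# RT (correct trials only). Column names are assumptions.
summarise_cells <- function(trials) {
  acc <- aggregate(Correct ~ Subject + Congruency + Alignment,
                   data = trials, FUN = mean)
  names(acc)[names(acc) == "Correct"] <- "mean_acc"

  correct_trials <- trials[trials$Correct %in% TRUE, ]
  rt <- aggregate(rt ~ Subject + Congruency + Alignment,
                  data = correct_trials, FUN = mean)
  names(rt)[names(rt) == "rt"] <- "mean_rt"

  # Keep cells with no correct trials (their mean_rt becomes NA)
  merge(acc, rt, by = c("Subject", "Congruency", "Alignment"), all.x = TRUE)
}
```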
Notes on Pilot A
Pilot A revealed several issues that informed updates for Pilot B:
- The session duration was too long (~704 trials) and risked fatigue. This motivated reducing the number of repetitions per cell.
- The overall alignment × congruency pattern was visible but noisy due to only two participants.
- Minor issues with file naming and duplicate RT columns were corrected before Pilot B.
Pilot B (Finalized Implementation)
Pilot B was collected on November 29, 2025, using the finalized version of the jsPsych composite task.
This version implemented the full 2 × 2 × 2 × 2 composite design while reducing repetitions per condition to keep the task feasible for online data collection.
Two participants completed Pilot B. The purpose of Pilot B was to (a) verify the corrected trial structure, (b) ensure timing and branching logic worked as intended, and (c) confirm that the expected Congruency × Alignment pattern emerged in a clean dataset.
Data Import and Initial Inspection
library(tidyverse)
library(lme4)
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (18): trial_frame, trial_type, plugin_version, Subject, Exp_code, Exp_na...
dbl (16): item_width_mm, item_height_mm, item_width_px, px2mm, view_dist_mm,...
lgl (2): isPavlovia, Correct
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 417 Columns: 36
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (18): trial_frame, trial_type, plugin_version, Subject, Exp_code, Exp_na...
dbl (16): item_width_mm, item_height_mm, item_width_px, px2mm, view_dist_mm,...
lgl (2): isPavlovia, Correct
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
raw <- bind_rows(d1, d2)
dim(raw)
[1] 834 36
head(raw)
# A tibble: 6 × 36
trial_frame item_width_mm item_height_mm item_width_px px2mm view_dist_mm
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 virtual_chinrest 85.6 54.0 422 4.93 720.
2 test_face NA NA NA NA NA
3 test_face NA NA NA NA NA
4 test_face NA NA NA NA NA
5 test_face NA NA NA NA NA
6 test_face NA NA NA NA NA
# ℹ 30 more variables: rt <dbl>, item_width_deg <dbl>, px2deg <dbl>,
# win_width_deg <dbl>, win_height_deg <dbl>, trial_type <chr>,
# trial_index <dbl>, plugin_version <chr>, time_elapsed <dbl>, Subject <chr>,
# Exp_code <chr>, Exp_name <chr>, CFVersion <chr>, isPavlovia <lgl>,
# Browser <chr>, Prolific_id <chr>, Trial_num <dbl>, Cue <chr>,
# Congruency <chr>, Alignment <chr>, SameDifferent <chr>, StimGroup <chr>,
# StudyFace <chr>, TestFace <chr>, Correct_response <dbl>, MaskFace <chr>, …
Filtering to Experimental Trials
df <- raw %>%
  filter(trial_type == "image-keyboard-response",
         !is.na(SameDifferent),
         !is.na(Congruency),
         !is.na(Alignment),
         !is.na(Cue))
dim(df)
[1] 832 36
head(df)
# A tibble: 6 × 36
trial_frame item_width_mm item_height_mm item_width_px px2mm view_dist_mm
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 test_face NA NA NA NA NA
2 test_face NA NA NA NA NA
3 test_face NA NA NA NA NA
4 test_face NA NA NA NA NA
5 test_face NA NA NA NA NA
6 test_face NA NA NA NA NA
# ℹ 30 more variables: rt <dbl>, item_width_deg <dbl>, px2deg <dbl>,
# win_width_deg <dbl>, win_height_deg <dbl>, trial_type <chr>,
# trial_index <dbl>, plugin_version <chr>, time_elapsed <dbl>, Subject <chr>,
# Exp_code <chr>, Exp_name <chr>, CFVersion <chr>, isPavlovia <lgl>,
# Browser <chr>, Prolific_id <chr>, Trial_num <dbl>, Cue <chr>,
# Congruency <chr>, Alignment <chr>, SameDifferent <chr>, StimGroup <chr>,
# StudyFace <chr>, TestFace <chr>, Correct_response <dbl>, MaskFace <chr>, …
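To support the first goal of Pilot B (verifying the corrected trial structure), a balance check on the filtered trials could look like the base R sketch below. The helper name is hypothetical, and it assumes `df` contains only experimental trials with the `Subject`, `Cue`, `Congruency`, `Alignment`, and `SameDifferent` columns shown above. A design is flagged as balanced only if every participant has the same, nonzero count in every one of the 16 cells.

```r
# Sketch: count trials per participant x condition cell and check balance.
check_cell_balance <- function(trials) {
  counts <- as.data.frame(
    table(trials$Subject, trials$Cue, trials$Congruency,
          trials$Alignment, trials$SameDifferent),
    responseName = "n"
  )
  names(counts)[1:5] <- c("Subject", "Cue", "Congruency",
                          "Alignment", "SameDifferent")
  # Within each participant: all cells present and equally filled
  per_subject <- tapply(counts$n, counts$Subject,
                        function(n) all(n > 0) && length(unique(n)) == 1)
  list(counts = counts, balanced = all(per_subject))
}

# Usage with the filtered Pilot B data: check_cell_balance(df)$balanced
```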
A logistic mixed-effects model predicts accuracy from Congruency, Alignment, and their interaction, with a random intercept for Subject. This model tests whether the expected Congruency × Alignment interaction (i.e., the strongest interference in the aligned–incongruent condition) is present.
dp_summary <- dp %>%
  group_by(Congruency, Alignment) %>%
  summarise(mean_dp = mean(dprime),
            se_dp = sd(dprime) / sqrt(n()),
            .groups = "drop")

ggplot(dp_summary, aes(x = Alignment, y = mean_dp, color = Congruency)) +
  geom_point(size = 3, position = position_dodge(width = 0.4)) +
  geom_errorbar(aes(ymin = mean_dp - se_dp, ymax = mean_dp + se_dp),
                width = 0.1,
                position = position_dodge(width = 0.4)) +
  labs(title = "Pilot B d′ by Congruency × Alignment",
       y = "d′ (Sensitivity)", x = "Alignment") +
  coord_cartesian(ylim = c(0, 2.5)) +
  theme_minimal(base_size = 14)
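The plot above relies on a per-participant data frame `dp` with `Congruency`, `Alignment`, and `dprime` columns, whose construction is not shown. One way to build such a table is sketched below in base R. The function names are hypothetical, and the sketch assumes that `SameDifferent` is coded `"same"`/`"different"`, that `Correct` is logical with exclusions already applied, that "same" trials are the signal (a hit is a correct "same" trial; a false alarm is an incorrect "different" trial), and that a log-linear correction (add 0.5 to each count and 1 to each total) is used to keep z-scores finite at 0% or 100%. The original paper's correction, if any, may differ.

```r
# Sketch: per-participant d' for each Congruency x Alignment cell.
dprime_loglinear <- function(n_hit, n_same, n_fa, n_diff) {
  hit_rate <- (n_hit + 0.5) / (n_same + 1)  # log-linear correction
  fa_rate  <- (n_fa + 0.5) / (n_diff + 1)
  qnorm(hit_rate) - qnorm(fa_rate)
}

compute_dprime <- function(trials) {
  cells <- split(trials,
                 list(trials$Subject, trials$Congruency, trials$Alignment),
                 drop = TRUE)
  rows <- lapply(cells, function(cell) {
    same <- cell$SameDifferent == "same"
    data.frame(
      Subject    = cell$Subject[1],
      Congruency = cell$Congruency[1],
      Alignment  = cell$Alignment[1],
      dprime     = dprime_loglinear(
        n_hit  = sum(cell$Correct[same]),    # "same" correctly reported
        n_same = sum(same),
        n_fa   = sum(!cell$Correct[!same]),  # "same" wrongly reported
        n_diff = sum(!same)
      )
    )
  })
  do.call(rbind, rows)
}
```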
Pilot B confirmed that:
- The reduced trial count produced cleaner, faster sessions.
- All condition labels (Congruency, Alignment, Cue, SameDifferent) were correctly logged.
- The expected Aligned–Incongruent cost appeared in both accuracy and RT.
- No structural or timing bugs remained after adjustments from Pilot A.
Data preparation
Data preparation following the analysis plan.
Confirmatory analysis
The analyses as specified in the analysis plan.
Side-by-side graph with original graph is ideal here
Exploratory analyses
Any follow-up analyses desired (not required).
Discussion
Summary of Replication Attempt
Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.
Commentary
Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.