Replication of ‘Two faces of holistic face processing: Facilitation and interference underlying part-whole and composite effects’ by Jin, Hayward, & Cheung (2024, Journal of Vision)
Author
Seojin Lee (seojinl@stanford.edu)
Published
December 14, 2025
Introduction
Justification
My research interests lie in human visual perception, particularly in how the brain integrates low-level information from complex visual input to achieve a higher-level understanding of objects. Faces are a good example: recognizing identity requires integrating smaller visual fragments such as the eyes, nose, and mouth. The complete composite task from Jin, Hayward, & Cheung (2024) offers a well-defined behavioral paradigm for studying this integration process. Specifically, it separates two components of holistic processing, facilitation and interference, rather than treating holistic processing as a single phenomenon. Replicating these effects will help me better understand the mechanisms of holistic face perception and will inform my own research on visual integration.
Stimuli and Procedure
This project replicates the complete composite task online, implemented in jsPsych and run on Prolific. Each trial presents a fixation cross (500 ms), a study composite (500 ms), a mask (500 ms), and then a test stimulus. Participants judge whether the cued half (top or bottom) of the test face matches the same half of the study face, while ignoring the other half. The design manipulates:
- Congruency (congruent vs. incongruent)
- Alignment (aligned vs. misaligned)
- Correct response (same vs. different)
- Cue location (top vs. bottom)
Isolated top and bottom halves provide a baseline for measuring facilitation (congruent - isolated) and interference (incongruent - isolated) effects. The target trial structure replicates the original:
- 400 composite trials (2 x 2 x 2 x 2)

For my replication, after piloting, I removed the isolated blocks, because the composite congruency x alignment effects I aim to replicate do not depend on isolated trials.
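A minimal R sketch of the composite design described above, assuming a fully balanced design with equal repetitions per cell. The `top`/`bot` and `congruent`/`aligned` labels follow the values logged by the task; the `same`/`different` labels for SameDifferent are an assumption. It shows that 400 composite trials spread over the 16 cells correspond to 25 repetitions per cell.

```r
# Enumerate the 2 x 2 x 2 x 2 composite design (16 cells)
design <- expand.grid(
  Cue           = c("top", "bot"),
  Congruency    = c("congruent", "incongruent"),
  Alignment     = c("aligned", "misaligned"),
  SameDifferent = c("same", "different"),  # assumed labels
  stringsAsFactors = FALSE
)
n_cells <- nrow(design)
reps_per_cell <- 400 / n_cells   # 400 composite trials in the original

# Build a shuffled trial list with equal repetitions per cell
set.seed(2024)
trials <- design[rep(seq_len(n_cells), each = reps_per_cell), ]
trials <- trials[sample(nrow(trials)), ]
rownames(trials) <- NULL

# Each Congruency x Alignment cell should hold 400 / 4 = 100 trials
table(trials$Congruency, trials$Alignment)
```

Replacing 400 with a smaller multiple of 16 keeps the same factorial structure with fewer repetitions per cell.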
Stimuli were grayscale composite faces from the Chicago Face Database. I generated aligned and misaligned versions and added cue brackets to indicate the relevant half.
Original paper: https://jov.arvojournals.org/article.aspx?articleid=2802147
Methods
Note on Pilot Studies
Two pilot studies were conducted prior to preregistered data collection.
Pilot A tested an early version of the task with a longer trial structure, while Pilot B implemented the finalized version used for the preregistered design.
All confirmatory analyses in the present report refer to Pilot B, whereas Pilot A is included for completeness and design validation.
Power Analysis
The original paper reported large congruency and alignment effects in the complete composite task (Δd’ = +0.45 for facilitation; Δd’ = -0.66 for interference). These effects were estimated with 455 participants because the authors examined reliability and correlations across three holistic tasks, not because the composite task itself required such a large sample. In my replication, I focus on the congruency x alignment effect in the complete composite task, which has been shown to be large and reliable. Using a conservative estimate of Cohen’s dz = 0.40-0.50 for within-subject congruency effects, 60-80 participants provide ~85-95% power.
Planned N = 72, which balances high power, online data quality, and feasibility.
Stopping rule: Stop data collection once 72 valid submissions have been collected, and accept 72-80 usable datasets after exclusions.
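The power figures above can be checked with base R's `power.t.test()`, treating the congruency effect as a paired (within-subject) comparison in which `delta` is Cohen's dz and `sd = 1`. This sketch assumes a two-sided test at alpha = .05 and ignores the trial-level GLMM structure, so it only approximates power for the planned analysis. The grid also includes N = 50, the sample size reported in the Methods Addendum.

```r
# Power for a within-subject effect expressed as Cohen's dz.
# With type = "paired", sd is the SD of the difference scores,
# so sd = 1 makes delta equal to dz.
grid <- expand.grid(dz = c(0.40, 0.50), n = c(50, 60, 72, 80))
grid$power <- mapply(function(dz, n) {
  power.t.test(n = n, delta = dz, sd = 1, sig.level = 0.05,
               type = "paired", alternative = "two.sided")$power
}, grid$dz, grid$n)
print(round(grid, 3))

# Smallest N reaching 80%, 90%, and 95% power at dz = 0.40
target_power <- c(0.80, 0.90, 0.95)
n_needed <- sapply(target_power, function(pw) {
  ceiling(power.t.test(delta = 0.40, sd = 1, power = pw, sig.level = 0.05,
                       type = "paired")$n)
})
setNames(n_needed, paste0(target_power * 100, "%"))
```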
Exclusions:
RT < 200 ms or > 5000 ms
Incomplete responses
Participants who fail browser/attention checks
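The trial-level part of these rules can be sketched as a small R filter. This sketch assumes trial-level columns `rt` (in ms) and a logical `Correct`, as in the jsPsych output. `exclude_trials` is a hypothetical helper, and `upper` defaults to the 5000 ms bound listed above; the Methods Addendum reports a 3000 ms bound, which can be passed instead. Participant-level browser/attention checks are not covered here.

```r
# Trial-level exclusions: drop incomplete trials and RTs outside [lower, upper]
exclude_trials <- function(trials, lower = 200, upper = 5000) {
  complete <- !is.na(trials$rt) & !is.na(trials$Correct)
  # Keep lower <= rt <= upper, i.e. exclude RT < lower or RT > upper
  keep <- complete & trials$rt >= lower & trials$rt <= upper
  trials[keep, , drop = FALSE]
}

# Toy example with made-up values
toy <- data.frame(rt      = c(150, 650, 1200, 4200, 5200, NA),
                  Correct = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE))
nrow(exclude_trials(toy))                # 3 trials kept
nrow(exclude_trials(toy, upper = 3000))  # 2 trials kept
```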
Materials
To match stimulus properties as closely as possible, I contacted the original author (Dr. Haiyang Jin, Oct 27, 2025). Because of copyright restrictions, the authors could not share their exact composite images, but they confirmed that the stimuli were constructed from the Chicago Face Database. Following their guidance, I downloaded CFD images, generated aligned and misaligned composites, initially created isolated halves, and added cue brackets.
Procedure
The experiment followed the procedure described in Jin, Hayward, & Cheung (2024): “Each trial began with a fixation cross (500 ms), followed by a composite study face (500 ms), a mask (500 ms), and then a composite test face that remained onscreen until response.” On each trial, participants judged whether the cued half of the test face matched the corresponding half of the study face. The factorial structure included:
- Cue (top/bottom)
- Congruency (congruent/incongruent)
- Alignment (aligned/misaligned)
- Same/different

Participants completed only composite trials in this replication (see Differences section below).
Analysis Plan
Outcomes: sensitivity d’ (primary) and RT on correct trials.
Primary tests:
- Congruency effect (congruent > incongruent) on aligned composites, and the Congruency x Alignment interaction.
- Facilitation: aligned-congruent vs. isolated; interference: aligned-incongruent vs. isolated.
Models: GLMMs following the original strategy (logistic for accuracy/d’ indexing; gamma or log-normal for RT)
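Sensitivity in a same/different task is commonly computed as d' = z(H) - z(FA). The R sketch below assumes, without confirmation from the original paper, that a hit is a "same" response on a same trial and a false alarm is a "same" response on a different trial, and it applies a log-linear correction (adding 0.5 to counts) so that rates of 0 or 1 stay finite. `compute_dprime` and `dprime_by_cell` are hypothetical helpers, and the column names follow the logged data, with `same`/`different` values for SameDifferent assumed.

```r
# Hypothetical helper: d' = z(H) - z(FA) with a log-linear correction
compute_dprime <- function(hits, n_same, false_alarms, n_diff, correction = 0.5) {
  h  <- (hits + correction) / (n_same + 2 * correction)
  fa <- (false_alarms + correction) / (n_diff + 2 * correction)
  qnorm(h) - qnorm(fa)
}

# d' per Subject x Congruency x Alignment cell from trial-level data.
# Assumes columns Subject, Congruency, Alignment, SameDifferent
# ("same"/"different") and a logical Correct column.
dprime_by_cell <- function(trials, correction = 0.5) {
  # A "same" response is a correct response on a same trial or an
  # incorrect response on a different trial
  is_same <- trials$SameDifferent == "same"
  trials$said_same <- ifelse(is_same, trials$Correct, !trials$Correct)
  cells <- split(trials, trials[c("Subject", "Congruency", "Alignment")], drop = TRUE)
  rows <- lapply(cells, function(d) {
    s <- d$SameDifferent == "same"
    data.frame(Subject = d$Subject[1], Congruency = d$Congruency[1],
               Alignment = d$Alignment[1],
               dprime = compute_dprime(sum(d$said_same[s]), sum(s),
                                       sum(d$said_same[!s]), sum(!s),
                                       correction))
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Toy cell (made-up): 20/25 hits, 8/25 false alarms
toy <- data.frame(Subject = "s1", Congruency = "congruent", Alignment = "aligned",
                  SameDifferent = rep(c("same", "different"), each = 25),
                  Correct = c(rep(TRUE, 20), rep(FALSE, 5),    # same trials
                              rep(FALSE, 8), rep(TRUE, 17)))   # different trials
dprime_by_cell(toy)
```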
Differences from Original Study
Several differences between my replication and Jin, Hayward, & Cheung (2024) exist due to the practical constraints of online testing.
1. Task scope
The original study administered three holistic processing tasks (part-whole, standard composite, complete composite) within the same session, using a large sample to estimate correlations and reliability across tasks. In contrast, my replication tests only the complete composite task, because my goal is to replicate the within-task effects (facilitation and interference), not between-task relationships.
Anticipated impact: The complete composite task is fully self-contained and does not rely on the other tasks, so removing the part-whole and standard composite tasks should not substantially affect replication of the key effects.
2. Sample size
The original study recruited N = 455 online participants, a number driven by its goal of estimating cross-task reliability and between-task correlations. My replication aims to recruit N = 72 Prolific participants, which is sufficient to detect the large within-subject d’ effects reported in the original paper (facilitation: +0.45; interference: -0.66).
Anticipated impact: With the conservative within-subject effect sizes used in the power analysis (dz = 0.40-0.50), N = 72 provides >80% power. Since I am not estimating cross-task correlations, the reduced sample size is appropriate and should be sufficient to replicate the core effects of the complete composite task.
3. Trial count and session duration
The original task contained 400 composite trials plus 80 isolated trials (~480 trials), which took ~40 minutes. My first implementation (Pilot A) unintentionally produced a much longer structure (~704 trials). Based on pilot feedback and Prolific feasibility considerations, I reduced the number of trials per cell by approximately half and removed the isolated-half baseline conditions, since my project does not analyze part-based performance. The final design preserves the full factorial composite structure (cue x alignment x congruency x same/different) but with fewer repetitions per cell.
Anticipated impact: The original session is long, which increases the risk of fatigue, dropout, and noisy responses in an online setting. Composite congruency and alignment effects are large and highly reliable, and prior studies indicate that these effects do not require very high trial counts to emerge. Therefore, even without isolated-half baselines and with fewer repetitions per condition, the reduced design should remain a valid and sensitive test of holistic face processing.
4. Analysis plan
I follow the analysis approach described in the original paper: logistic GLMMs for accuracy-derived sensitivity (d’ indexed by fixed effects), with congruency, alignment, cue, and their interactions entered as predictors. I apply the same trial-level exclusion rules (extreme RTs, invalid responses) and compute facilitation and interference as differences from isolated baselines.
Anticipated impact: My analysis plan closely matches the original, and differences in sample size or trial count do not alter the modeling structure.
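To illustrate what "d’ indexed by fixed effects" can mean, the R sketch below fits a GLM that predicts a "same" response from trial type, using made-up counts for a single cell. With a probit link, the slope equals z(H) - z(FA), the classic d’. With the logit link named in the plan, the slope expresses the same contrast on the log-odds scale. This is a base-R illustration, not the authors' model. In a full analysis, the trial-type predictor would be interacted with Congruency, Alignment, and Cue, and subject-level random effects would be added (e.g., with lme4).

```r
# Made-up single-cell data: 25 same and 25 different trials
is_same   <- rep(c(1, 0), each = 25)
said_same <- c(rep(1, 20), rep(0, 5),    # same trials: H = 20/25 = .80
               rep(1, 8),  rep(0, 17))   # different trials: FA = 8/25 = .32

probit_fit <- glm(said_same ~ is_same, family = binomial(link = "probit"))
logit_fit  <- glm(said_same ~ is_same, family = binomial(link = "logit"))

# Probit slope reproduces the uncorrected d' = z(H) - z(FA)
c(slope = coef(probit_fit)[["is_same"]], dprime = qnorm(0.80) - qnorm(0.32))

# Logit slope is the same contrast on the log-odds scale
c(slope = coef(logit_fit)[["is_same"]], log_odds = qlogis(0.80) - qlogis(0.32))
```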
Methods Addendum (Post Data Collection)
Actual Sample
Fifty participants were recruited via Prolific. All participants were adults and reported normal or corrected-to-normal vision. Data were screened using the preregistered exclusion criteria: trials with reaction times shorter than 200 ms or longer than 3000 ms were excluded, as were incomplete or invalid responses. After applying these criteria, all 50 participants were retained for analysis.
Differences from pre-data collection methods plan
The preregistered target sample size was N = 72. Data collection was stopped at N = 50 due to time constraints. No other deviations from the preregistered methods or analysis plan were made.
Results
Pilot A (Preliminary Implementation)
Pilot A was conducted on October 26, 2025, using an early version of the jsPsych composite task.
This version unintentionally included a larger number of trials (~704 total), which made the session longer and helped identify design adjustments for the final study.
Two participants completed Pilot A.
Data Loading and Preprocessing
library(tidyverse)
Rows: 705 Columns: 36
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (18): trial_frame, trial_type, plugin_version, Subject, Exp_code, Exp_na...
dbl (16): item_width_mm, item_height_mm, item_width_px, px2mm, view_dist_mm,...
lgl (2): isPavlovia, Correct
ggplot(summary_df, aes(x = is_congruent, y = mean_acc, fill = is_aligned)) +
  geom_col(position = position_dodge()) +
  facet_wrap(~ Participant) +
  labs(title = "Pilot A: Accuracy by Congruency and Alignment",
       x = "Congruency", y = "Mean Accuracy", fill = "Alignment") +
  theme_minimal()
Reaction Time by Condition
ggplot(summary_df, aes(x = is_congruent, y = mean_rt, fill = is_aligned)) +
  geom_col(position = position_dodge()) +
  facet_wrap(~ Participant) +
  labs(title = "Pilot A: Reaction Time by Congruency and Alignment",
       x = "Congruency", y = "Mean RT (ms)", fill = "Alignment") +
  theme_minimal()
Notes on Pilot A
Pilot A revealed several issues that informed updates for Pilot B:
- The session was too long (~704 trials) and risked fatigue, which motivated reducing the number of repetitions per cell.
- The overall alignment × congruency pattern was visible but noisy, given only two participants.
- Minor issues with file naming and duplicate RT columns were corrected before Pilot B.
Pilot B (Finalized Implementation)
Pilot B was collected on November 29, 2025, using the finalized version of the jsPsych composite task.
This version implemented the full 2 × 2 × 2 × 2 composite design while reducing repetitions per condition to keep the task feasible for online data collection.
Two participants completed Pilot B. The purpose of Pilot B was to (a) verify the corrected trial structure, (b) ensure that the timing and branching logic worked as intended, and (c) confirm that the expected Congruency × Alignment pattern emerged in a clean dataset.
Data Import and Initial Inspection
library(tidyverse)
library(lme4)
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (18): trial_frame, trial_type, plugin_version, Subject, Exp_code, Exp_na...
dbl (16): item_width_mm, item_height_mm, item_width_px, px2mm, view_dist_mm,...
lgl (2): isPavlovia, Correct
Rows: 417 Columns: 36
raw <- bind_rows(d1, d2)
dim(raw)
[1] 834 36
head(raw)
# A tibble: 6 × 36
trial_frame item_width_mm item_height_mm item_width_px px2mm view_dist_mm
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 virtual_chinrest 85.6 54.0 422 4.93 720.
2 test_face NA NA NA NA NA
3 test_face NA NA NA NA NA
4 test_face NA NA NA NA NA
5 test_face NA NA NA NA NA
6 test_face NA NA NA NA NA
# ℹ 30 more variables: rt <dbl>, item_width_deg <dbl>, px2deg <dbl>,
# win_width_deg <dbl>, win_height_deg <dbl>, trial_type <chr>,
# trial_index <dbl>, plugin_version <chr>, time_elapsed <dbl>, Subject <chr>,
# Exp_code <chr>, Exp_name <chr>, CFVersion <chr>, isPavlovia <lgl>,
# Browser <chr>, Prolific_id <chr>, Trial_num <dbl>, Cue <chr>,
# Congruency <chr>, Alignment <chr>, SameDifferent <chr>, StimGroup <chr>,
# StudyFace <chr>, TestFace <chr>, Correct_response <dbl>, MaskFace <chr>, …
Filtering to Experimental Trials
df <- raw %>%
  filter(trial_type == "image-keyboard-response",
         !is.na(SameDifferent),
         !is.na(Congruency),
         !is.na(Alignment),
         !is.na(Cue))
dim(df)
[1] 832 36
head(df)
# A tibble: 6 × 36
trial_frame item_width_mm item_height_mm item_width_px px2mm view_dist_mm
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 test_face NA NA NA NA NA
2 test_face NA NA NA NA NA
3 test_face NA NA NA NA NA
4 test_face NA NA NA NA NA
5 test_face NA NA NA NA NA
6 test_face NA NA NA NA NA
# ℹ 30 more variables: rt <dbl>, item_width_deg <dbl>, px2deg <dbl>,
# win_width_deg <dbl>, win_height_deg <dbl>, trial_type <chr>,
# trial_index <dbl>, plugin_version <chr>, time_elapsed <dbl>, Subject <chr>,
# Exp_code <chr>, Exp_name <chr>, CFVersion <chr>, isPavlovia <lgl>,
# Browser <chr>, Prolific_id <chr>, Trial_num <dbl>, Cue <chr>,
# Congruency <chr>, Alignment <chr>, SameDifferent <chr>, StimGroup <chr>,
# StudyFace <chr>, TestFace <chr>, Correct_response <dbl>, MaskFace <chr>, …
A logistic mixed-effects model predicted accuracy from Congruency, Alignment, and their interaction, with a random intercept for Subject. This model tests whether the expected Congruency × Alignment interaction (i.e., the strongest interference in the aligned–incongruent condition) is present.
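A minimal R sketch of the model described above, fit to simulated trial data rather than the pilot data. It uses `lme4::glmer` with a by-subject random intercept when lme4 is installed, and otherwise falls back to `glm` without the random intercept. The simulated accuracies and the sum (+1/-1) coding are illustrative assumptions, not necessarily the coding used in the original analysis. With only four simulated subjects and no simulated subject variability, glmer may report a singular fit.

```r
# Simulated (made-up) trial data: 4 subjects x 2 congruency x 2 alignment x 40 reps
set.seed(1)
sim <- expand.grid(Subject    = paste0("s", 1:4),
                   Congruency = c("congruent", "incongruent"),
                   Alignment  = c("aligned", "misaligned"),
                   rep        = 1:40)
# Illustrative accuracy: lower only for aligned-incongruent trials
p_correct <- ifelse(sim$Congruency == "incongruent" & sim$Alignment == "aligned",
                    0.65, 0.85)
sim$Correct <- rbinom(nrow(sim), size = 1, prob = p_correct)

# Sum coding (+1/-1): congruent and aligned are coded +1
contrasts(sim$Congruency) <- contr.sum(2)
contrasts(sim$Alignment)  <- contr.sum(2)

if (requireNamespace("lme4", quietly = TRUE)) {
  # Logistic GLMM with a by-subject random intercept
  fit <- lme4::glmer(Correct ~ Congruency * Alignment + (1 | Subject),
                     data = sim, family = binomial)
  est <- lme4::fixef(fit)
} else {
  # Fallback: ordinary logistic regression (no random intercept)
  fit <- glm(Correct ~ Congruency * Alignment, data = sim, family = binomial)
  est <- coef(fit)
}
# A positive Congruency1:Alignment1 estimate means the congruency effect
# is larger for aligned than for misaligned composites
round(est, 3)
```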
dp_summary <- dp %>%
  group_by(Congruency, Alignment) %>%
  summarise(mean_dp = mean(dprime),
            se_dp = sd(dprime) / sqrt(n()),
            .groups = "drop")

ggplot(dp_summary, aes(x = Alignment, y = mean_dp, color = Congruency)) +
  geom_point(size = 3, position = position_dodge(width = 0.4)) +
  geom_errorbar(aes(ymin = mean_dp - se_dp, ymax = mean_dp + se_dp),
                width = 0.1,
                position = position_dodge(width = 0.4)) +
  labs(title = "Pilot B d′ by Congruency × Alignment",
       y = "d′ (Sensitivity)", x = "Alignment") +
  coord_cartesian(ylim = c(0, 2.5)) +
  theme_minimal(base_size = 14)
Pilot B confirmed that:
- The reduced trial count produced cleaner, faster sessions.
- All condition labels (Congruency, Alignment, Cue, SameDifferent) were correctly logged.
- The expected aligned–incongruent cost appeared in both accuracy and RT.
- No structural or timing bugs remained after the adjustments from Pilot A.
Final Results
Data preparation
Data were prepared following the analysis plan. Analyses were conducted on data from N = 50 participants after applying the preregistered exclusion criteria (RT < 200 ms or > 3000 ms; incomplete trials). Only experimental composite trials were included. Accuracy, reaction time (RT), and sensitivity (d′) were analyzed using the same pipeline as in Pilot B.
Data exclusion / filtering
df <- raw %>%
  filter(trial_type == "image-keyboard-response",
         !is.na(SameDifferent),
         !is.na(Congruency),
         !is.na(Alignment),
         !is.na(Cue))
dim(df)
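# Accuracy per subject in each Cue x Congruency x Alignment cell
cue_subj <- df %>%
  group_by(Subject, Cue, Congruency, Alignment) %>%
  summarise(acc = mean(Correct), .groups = "drop")

# Mean and SE across subjects. The SE must be computed before `acc` is
# overwritten by its mean; in the reverse order, sd() sees a single value
# and returns NA.
cue_summary <- cue_subj %>%
  group_by(Cue, Congruency, Alignment) %>%
  summarise(se = sd(acc) / sqrt(n()),
            acc = mean(acc),
            .groups = "drop") %>%
  relocate(acc, .before = se)
cue_summary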
# A tibble: 8 × 5
Cue Congruency Alignment acc se
<fct> <fct> <fct> <dbl> <dbl>
1 top congruent aligned 0.866 NA
2 top congruent misaligned 0.805 NA
3 top incongruent aligned 0.693 NA
4 top incongruent misaligned 0.747 NA
5 bot congruent aligned 0.836 NA
6 bot congruent misaligned 0.784 NA
7 bot incongruent aligned 0.577 NA
8 bot incongruent misaligned 0.625 NA
Discussion
Summary of Replication Attempt
The present study successfully replicated the core behavioral findings of Jin, Hayward, and Cheung (2024) using an online implementation of the complete composite face task. Sensitivity (d’) was highest for aligned-congruent composites (M = 2.22, SE = 0.08) and lowest for aligned-incongruent composites (M = 0.73, SE = 0.08). Critically, the congruency effect – defined as the difference between congruent and incongruent trials – was substantially larger in the aligned condition (Δd’ = 1.49) than in the misaligned condition (Δd’ = 0.70), yielding a clear Congruency x Alignment interaction. This pattern closely mirrors the primary effect reported in the original study.
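The interaction described above is a difference of differences. A short R sketch using only the values reported in this paragraph:

```r
# Values reported above (final sample, N = 50)
aligned_congruent   <- 2.22
aligned_incongruent <- 0.73
misaligned_effect   <- 0.70   # congruent - incongruent, misaligned

aligned_effect <- aligned_congruent - aligned_incongruent  # 1.49
interaction    <- aligned_effect - misaligned_effect      # 0.79
round(c(aligned_effect = aligned_effect, interaction = interaction), 2)
```

This descriptive contrast summarizes the pattern. The inferential test remains the Congruency x Alignment term in the GLMM.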
Commentary
Reaction time differences across conditions were modest, consistent with Jin et al. (2024). Mean RTs ranged from approximately 1037 to 1124 ms, with aligned-incongruent trials slower than aligned-congruent trials by about 62 ms, and a smaller congruency-related difference under misalignment. This pattern suggests that the composite face effect was expressed primarily as a sensitivity cost rather than as a pronounced slowing of responses.
Despite several methodological differences from the original study (smaller sample size, reduced trial counts, exclusion of isolated-part baselines), the qualitative and quantitative patterns of results closely matched those previously reported. Exploratory analyses indicated that holistic interference effects were evident for both cue locations, with a stronger congruency effect for bottom-cued trials than for top-cued trials. However, this asymmetry was not a focus of the preregistered analyses and should be interpreted cautiously. Together, these findings indicate that the complete composite task provides a robust measure of holistic face processing that generalizes across implementations.