Replication of Study 5 of Kurdi et al. (2024)

Links

repository: https://github.com/milkawaniak/waniak2025/
pre-registration: https://osf.io/f4azs/overview
hosted experiment: https://milkawaniak.github.io/waniak2025/materials/experimental.html

Introduction

The Affect Misattribution Procedure (AMP; Payne et al., 2005) is a widely used indirect measure of implicit evaluation. In this task, participants first see a brief “prime” (e.g., a real-life object) and then a neutral target (e.g., an abstract painting), which they rate for pleasantness. Critically, even though participants are instructed to ignore the prime, their ratings of the neutral target are systematically biased by the prime’s valence: targets that follow a positive prime tend to be judged more pleasant than those that follow a negative prime. This pattern occurs without any explicit intention to evaluate the prime, implying that the AMP effect reflects an automatic evaluative response to the prime that is misattributed to the target. AMP scores have therefore traditionally been interpreted as indexing implicit attitudes, that is, evaluations that arise unintentionally and outside of conscious control.

Because participants try to discount the prime, AMP effects are thought to capture unintentional biases. Indeed, AMP priming persists even when participants are explicitly warned about potential biases or given incentives to correct for the prime. A substantial body of evidence indicates that the AMP captures automatic evaluations that people cannot easily override (Bar-Anan & Nosek, 2016; Gawronski & Ye, 2013, 2015; Mann et al., 2019), and this evidence has been taken as strong support for the AMP’s validity as a measure of implicit evaluation.

Recently, however, Hughes et al. (2023) questioned this assumption by examining participants’ reported awareness of prime influence. They modified the AMP so that after each trial participants indicated whether they felt the prime had influenced their target rating. Hughes et al. found that the standard AMP effect was much larger on trials where participants said they were “influenced” by the prime than on trials where they reported no influence. On this basis, Hughes et al. argued that AMP priming depends on conscious awareness of the prime’s impact (“influence awareness”), suggesting that the AMP may not be measuring a fully unconscious process.

The study I am replicating (Study 5 of Kurdi et al., 2024) was designed to test whether such influence reports truly reflect introspective awareness or instead stem from inference (e.g., assuming that congruent prime–response pairs imply influence). To do so, each participant completed two parallel AMP tasks. First, they made standard AMP ratings and reported trial by trial whether the prime had influenced their own response. Second, they completed a “third-party” AMP: on each trial they saw a prime–target pair that a past participant had seen and reported whether they thought the prime had influenced that participant’s response to the target. If participants infer influence after responding, treating a match between prime valence and rating valence as evidence of influence, then first-person and third-party awareness judgments should show the same pattern and should be correlated. In contrast, if awareness judgments reflect privileged introspective access, the pattern should appear only for one’s own responses and should not generalize to judgments about another person. Following the inferential account, we therefore predict that (a) the pattern of reported influence will be similar for self and other and (b) individual rates of first-person awareness will correlate with rates of third-party awareness. Testing these hypotheses will help determine whether AMP “influence awareness” effects arise from genuine insight into one’s own automatic responses or from post-hoc inferences based on response congruency.

Methods

Participants

An a priori power analysis was conducted to determine the required sample size to replicate the effect reported in Kurdi et al. (2024). Based on the reported correlation (r = .487), a two-tailed test with α = .05 and 80% power indicated a required sample size of approximately 30 participants (n = 29.95).
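The source does not say which software produced n = 29.95. As a rough cross-check, the base-R sketch below uses the standard Fisher z approximation for the sample size needed to detect a correlation. The approximation is an assumption here, and it lands slightly higher (about 31 participants).

```r
# Approximate n needed to detect a population correlation r with a
# two-tailed test (Fisher z approximation, an assumption here):
#   n = ((z[1 - alpha/2] + z[power]) / atanh(r))^2 + 3
required_n_r <- function(r, alpha = 0.05, power = 0.80) {
  z_alpha <- qnorm(1 - alpha / 2)  # two-tailed critical value
  z_power <- qnorm(power)          # normal quantile for the target power
  ((z_alpha + z_power) / atanh(r))^2 + 3
}

n_approx <- required_n_r(r = 0.487)
round(n_approx, 1)   # 30.7
ceiling(n_approx)    # 31 participants after rounding up
```

Dedicated routines such as `pwr.r.test()` in the pwr package use a somewhat refined calculation, which may account for the small gap to the reported value.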

Data were collected from 38 participants recruited via Prolific, with compensation set at approximately $8 per hour. The median completion time was 21 minutes. Thirteen participants were excluded for failing attention checks designed to assess comprehension of the task instructions. Specifically, after completing 40 and 80 trials, participants were asked to report which response keys they used to indicate pleasant versus unpleasant judgments and influenced versus not influenced responses. Failure to correctly identify these mappings suggested inattention or misunderstanding of the task, rendering the corresponding data unreliable.

The final analytic sample therefore consisted of 25 participants. Although this sample size fell below the target determined by the power analysis, data collection was constrained by course-related time limits (PSYC 251) and a technical issue on Prolific. Analyses were conducted using this final sample. Participants had a mean age of 35.76 years (SD = 12.62). Most participants reported citizenship in South Africa, and 60% identified as female.
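The comprehension-check rule described above can be expressed as a simple filter. In this hypothetical sketch, each check is stored as two columns holding the participant's reported key mapping (e.g., `"I/E"` for pleasant/unpleasant). The column names and answer coding are assumptions, not the variables in the actual data file.

```r
# Hypothetical sketch of the comprehension-check exclusion rule.
# Column names (check1_*, check2_*) and the "key/key" answer coding are
# illustrative assumptions, not the variables in the real data file.
correct_answers <- c(valence = "I/E",     # I = pleasant, E = unpleasant
                     influence = "1/0")   # 1 = influenced, 0 = not influenced

passes_checks <- function(df) {
  ok <- rep(TRUE, nrow(df))
  for (check in c("check1", "check2")) {  # asked after trials 40 and 80
    for (item in names(correct_answers)) {
      answer <- df[[paste0(check, "_", item)]]
      # A missing or wrong answer on any item fails the participant
      ok <- ok & !is.na(answer) & answer == correct_answers[[item]]
    }
  }
  ok
}

# Toy data: p2 reverses the valence keys at the second check,
# p3 skips the influence item at the second check
toy <- data.frame(
  participant_id   = c("p1", "p2", "p3"),
  check1_valence   = c("I/E", "I/E", "I/E"),
  check1_influence = c("1/0", "1/0", "1/0"),
  check2_valence   = c("I/E", "E/I", "I/E"),
  check2_influence = c("1/0", "1/0", NA),
  stringsAsFactors = FALSE
)

kept <- toy[passes_checks(toy), ]
kept$participant_id   # "p1"
```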

Procedure and measures

Familiarization with primes and targets

Following the logic and design of the original study, I included a table displaying a random subset of six prime images (three pleasant and three unpleasant) and six target images. The original authors introduced this familiarization step because some participants in their Studies 1–4 had been confused about the difference between the target images and the noise mask.

AMP

The present study employed a modified version of the Affect Misattribution Procedure (AMP; Payne et al., 2005) that diverged from the original design by including both a first-person AMP and a third-party AMP. The first-person AMP closely followed standard AMP procedures, whereas the third-party AMP constituted a key deviation, allowing participants to judge the influence of primes on another individual’s responses.

In the first-person AMP, participants completed a series of trials in which a highly positive or highly negative prime was briefly presented, followed by an ambiguous abstract painting. Primes were displayed for 100 ms, targets for 200 ms, and each target was followed by a visual noise mask for 100 ms. Participants were instructed to judge whether the abstract painting was more or less visually pleasant than average by pressing the I key (pleasant) or the E key (unpleasant). They were explicitly instructed to ignore the prime and not to let it bias their judgment of the abstract painting. At the same time, they were told that the preceding real-life image might sometimes influence their judgments. Unlike in the standard AMP, immediately after each first-person trial participants reported whether they believed their judgment of the abstract painting had been influenced by the prime, pressing the 1 key if they believed their response was influenced by the preceding image and the 0 key if they believed it was not. This trial-by-trial influence judgment served as the measure of self-reported influence awareness.

Following completion of the first-person AMP, participants completed a third-party AMP. In this task, participants observed the same prime–target pairings as a past participant, whose responses were yoked from a randomly selected participant in Study 3 of Kurdi et al. (2024). Rather than evaluating the target themselves, participants were shown the past participant’s response on each trial (e.g., “The participant’s response on this trial was: Pleasant / Unpleasant”). Participants were then asked to judge whether they believed the past participant’s response had been influenced by the prime, using the same response keys as in the first-person AMP (1 = influenced, 0 = not influenced).
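Raw key presses have to be recoded into the labels that the analysis code works with (`"pleasant"`, `"unpleasant"`, `"influenced"`). A minimal sketch, assuming keys are recorded as single characters. The `"not influenced"` label is a hypothetical placeholder, since the analysis only tests for `"influenced"`.

```r
# Recode raw key presses into the labels used in the analysis.
# Key assignments follow the Procedure; the "not influenced" label is a
# hypothetical placeholder (the analysis only tests for "influenced").
code_choice <- function(key) {
  key <- toupper(key)  # treat "i" and "I" alike
  ifelse(key == "I", "pleasant",
         ifelse(key == "E", "unpleasant", NA_character_))
}

code_influence <- function(key) {
  key <- as.character(key)  # accept 1/0 stored as numbers or strings
  ifelse(key == "1", "influenced",
         ifelse(key == "0", "not influenced", NA_character_))
}

code_choice(c("i", "E", "x"))   # "pleasant" "unpleasant" NA
code_influence(c(1, 0, NA))     # "influenced" "not influenced" NA
```

The same influence coding applies to both tasks, because the third-party AMP used the same 1/0 keys.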

The task consisted of 80 trials in total. The first 10 trials served as practice trials and were excluded from analysis. The subsequent 34 trials (17 positive and 17 negative primes) comprised the first-person AMP. The final 36 trials (18 positive and 18 negative primes) comprised the third-party AMP. Each abstract painting was presented only once throughout the task.
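The trial structure above can be sketched as a randomized trial table. This is an illustration only. The 5/5 prime split in the practice block, the pool of 200 painting IDs, and the seed are assumptions, and the hosted experiment may have randomized differently.

```r
# Sketch of the 80-trial sequence. The 5/5 practice split, the pool of
# 200 painting IDs and the seed are hypothetical.
set.seed(2024)

make_block <- function(phase, n_pleasant, n_unpleasant) {
  primes <- c(rep("pleasant", n_pleasant), rep("unpleasant", n_unpleasant))
  data.frame(phase = phase, prime = sample(primes),  # shuffle prime order
             stringsAsFactors = FALSE)
}

trials <- rbind(
  make_block("practice",     5,  5),   # excluded from analysis
  make_block("first_person", 17, 17),
  make_block("third_party",  18, 18)
)
trials$trial <- seq_len(nrow(trials))

# Sampling without replacement guarantees each painting appears once
painting_pool <- sprintf("painting_%03d", 1:200)
trials$target <- sample(painting_pool, nrow(trials))

table(trials$phase, trials$prime)
```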

Analysis

Note: AMP scores (Pdiff) represent how much more likely participants were to say “pleasant” after a pleasant prime versus an unpleasant prime. Awareness scores represent the proportion of trials where participants reported being “influenced” by the prime.
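Written out for a single participant and perspective, the two scores are as follows. The symbols are introduced here for illustration, and each denominator counts all trials of the relevant type, as in the scoring code.

```latex
P_{\mathrm{diff}} = \frac{n_{\mathrm{pl}\mid +}}{N_{+}} - \frac{n_{\mathrm{pl}\mid -}}{N_{-}},
\qquad
P(\mathrm{aware}) = \frac{n_{\mathrm{infl}}}{N_{\mathrm{resp}}}

\text{e.g.}\quad \frac{12}{17} - \frac{6}{17} = \frac{6}{17} \approx 0.353
```

Here N_+ and N_- are the numbers of trials with a pleasant and an unpleasant prime, n_pl|+ and n_pl|- are the numbers of “pleasant” responses on those trials, n_infl is the number of “influenced” reports, and N_resp is the number of trials with any influence report. The worked example is a hypothetical participant who called 12 of 17 targets pleasant after pleasant primes and 6 of 17 after unpleasant primes. P_diff ranges from -1 to 1, with positive values indicating the usual priming effect.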

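library(tidyverse)  # dplyr, stringr, tidyr and ggplot2 functions used below
library(knitr)      # kable() and include_graphics()

# Assumes `data` is already loaded: one row per participant, with a
# JSON-like `demographics` string and wide trial-level columns
data <- data %>%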
  mutate(
    # Extract citizenship (case insensitive)
    citizenship = str_extract(demographics, '(?i)"citizenship":"[^"]+') %>%
      str_replace('(?i)"citizenship":"', '') %>%
      str_to_title(),
    
    # Extract gender (case insensitive)
    gender = str_extract(demographics, '(?i)"gender":"[^"]+') %>%
      str_replace('(?i)"gender":"', '') %>%
      str_trim() %>%          # drop stray spaces such as "Female "
      str_to_title(),
    
    # Standardize gender categories
    gender = case_when(
      gender %in% c("F", "Female", "Female ") ~ "Female",
      gender %in% c("M", "Male", "Man") ~ "Male",
      TRUE ~ gender
    ),
    
    # Extract age (case insensitive)
    age = str_extract(demographics, '(?i)"age":"[^"]+') %>%
      str_replace('(?i)"age":"', '') %>%
      as.numeric()
  )

# Calculate demographics statistics
age_mean <- round(mean(data$age, na.rm = TRUE), 2)
age_sd <- round(sd(data$age, na.rm = TRUE), 2)
n_total <- nrow(data)
n_female <- sum(data$gender == "Female", na.rm = TRUE)
pct_female <- round((n_female / n_total) * 100, 1)

# Most common citizenship
citizenship_mode <- names(sort(table(data$citizenship), decreasing = TRUE))[1]


demographics_table <- data.frame(
  Characteristic = c("Sample Size", "Age (M ± SD)", "Gender (% Female)", "Most Common Citizenship"),
  Value = c(
    n_total,
    paste0(age_mean, " ± ", age_sd, " years"),
    paste0(pct_female, "%"),
    citizenship_mode
  )
)

kable(demographics_table, col.names = c("", ""), caption = "Demographics Summary")

Demographics Summary
Sample Size	25
Age (M ± SD)	35.76 ± 12.62 years
Gender (% Female)	60%
Most Common Citizenship	South Africa

#### Computing AMP Scores and Awareness Indices
# Calculate participant-level scores

# P("pleasant" | pleasant prime) - P("pleasant" | unpleasant prime);
# each denominator is the number of trials with that prime type
pdiff <- function(primes, choices) {
  p_after_pleasant <- sum(choices[primes == "pleasant"] == "pleasant", na.rm = TRUE) /
    sum(primes == "pleasant", na.rm = TRUE)
  p_after_unpleasant <- sum(choices[primes == "unpleasant"] == "pleasant", na.rm = TRUE) /
    sum(primes == "unpleasant", na.rm = TRUE)
  p_after_pleasant - p_after_unpleasant
}

# Share of answered trials on which the prime was reported as influential
share_aware <- function(influenced) {
  sum(influenced == "influenced", na.rm = TRUE) / sum(!is.na(influenced))
}

fp_prime_cols  <- paste0("first_person_prime_", 1:34)
fp_choice_cols <- paste0("first_person_choice_", 1:34)
fp_infl_cols   <- paste0("first_person_influenced_", 1:34)
tp_prime_cols  <- paste0("third_person_prime_", 1:36)
tp_choice_cols <- paste0("third_person_choice_", 1:36)
tp_infl_cols   <- paste0("third_person_influenced_", 1:36)

# rowwise() + c_across() pass each participant's full set of trial values
# to the helpers as plain vectors
participant_scores <- data %>%
  rowwise() %>%
  mutate(
    ampDiffSelf     = pdiff(c_across(all_of(fp_prime_cols)), c_across(all_of(fp_choice_cols))),
    shareAwareSelf  = share_aware(c_across(all_of(fp_infl_cols))),
    ampDiffThird    = pdiff(c_across(all_of(tp_prime_cols)), c_across(all_of(tp_choice_cols))),
    shareAwareThird = share_aware(c_across(all_of(tp_infl_cols)))
  ) %>%
  ungroup() %>%
  select(participant_id, ampDiffSelf, shareAwareSelf, ampDiffThird, shareAwareThird)

# Store summary statistics
fp_amp_mean <- round(mean(participant_scores$ampDiffSelf, na.rm = TRUE), 3)
fp_amp_median <- round(median(participant_scores$ampDiffSelf, na.rm = TRUE), 3)
fp_amp_sd <- round(sd(participant_scores$ampDiffSelf, na.rm = TRUE), 3)

fp_aware_mean <- round(mean(participant_scores$shareAwareSelf, na.rm = TRUE), 3)
fp_aware_median <- round(median(participant_scores$shareAwareSelf, na.rm = TRUE), 3)
fp_aware_sd <- round(sd(participant_scores$shareAwareSelf, na.rm = TRUE), 3)

tp_amp_mean <- round(mean(participant_scores$ampDiffThird, na.rm = TRUE), 3)
tp_amp_median <- round(median(participant_scores$ampDiffThird, na.rm = TRUE), 3)
tp_amp_sd <- round(sd(participant_scores$ampDiffThird, na.rm = TRUE), 3)

tp_aware_mean <- round(mean(participant_scores$shareAwareThird, na.rm = TRUE), 3)
tp_aware_median <- round(median(participant_scores$shareAwareThird, na.rm = TRUE), 3)
tp_aware_sd <- round(sd(participant_scores$shareAwareThird, na.rm = TRUE), 3)


Descriptive Statistics:

First-person AMP:

AMP Score (Pdiff): M = 0.52, Mdn = 1, SD = 0.586
Awareness P(aware): M = 0.645, Mdn = 0.618, SD = 0.303
Original study: AMP M = 0.350, Awareness M = 0.477

Third-party AMP:

AMP Score (Pdiff): M = 0.32, Mdn = 0, SD = 0.627
Awareness P(aware): M = 0.503, Mdn = 0.5, SD = 0.166
Original study: AMP M = 0.348, Awareness M = 0.566

# Key Test: Correlation Between First-Person and Third-Party Awareness
awareness_cor <- cor.test(participant_scores$shareAwareSelf, 
                          participant_scores$shareAwareThird)

# Correlation between AMP scores
amp_cor <- cor.test(participant_scores$ampDiffSelf, 
                    participant_scores$ampDiffThird)

The correlation between first-person and third-party awareness was r = 0.306, 95% CI [-0.102, 0.625], t(23) = 1.54, p = 0.137.
The correlation between first-person and third-party AMP scores was r = -0.018, p = 0.931.
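The awareness correlation can be checked by hand from r and n, using the formulas that `cor.test()` applies to a Pearson correlation: a t statistic with n - 2 degrees of freedom and a Fisher z confidence interval. Because r is entered here rounded to three decimals, the recomputed values match the reported ones only to about two decimals.

```r
# Recompute t, p and the 95% CI from the reported r and n
r <- 0.306   # reported first-/third-person awareness correlation (rounded)
n <- 25      # final analytic sample

dof    <- n - 2
t_stat <- r * sqrt(dof) / sqrt(1 - r^2)
p_val  <- 2 * pt(-abs(t_stat), df = dof)     # two-tailed

# Fisher z interval, back-transformed to the correlation scale
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t_stat, p = p_val), 3)   # t is about 1.54
round(ci, 3)                         # close to the reported [-0.102, 0.625]
```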

# Reshape data from wide to long format
data_long <- data %>%
  select(participant_id, starts_with("first_person"), starts_with("third_person")) %>%
  pivot_longer(
    cols = -participant_id,
    names_to = "variable",
    values_to = "value"
  ) %>%
  extract(variable, 
          into = c("perspective", "measure", "trial_num"), 
          regex = "(first_person|third_person)_(prime|choice|influenced)_(\\d+)",
          remove = FALSE) %>%
  pivot_wider(
    names_from = measure,
    values_from = value,
    id_cols = c(participant_id, perspective, trial_num)
  )

# Calculate P(aware) per participant for each condition
participant_aware <- data_long %>%
  mutate(
    perspective_label = ifelse(perspective == "first_person", "First-person AMP", "Third-party AMP"),
    prime_type = ifelse(prime == "pleasant", "Positive prime", "Negative prime"),
    response_type = case_when(
      choice == "pleasant" ~ "Pleasant",
      choice == "unpleasant" ~ "Unpleasant",
      TRUE ~ NA_character_
    ),
    aware = ifelse(influenced == "influenced", 1, 0)
  ) %>%
  filter(!is.na(response_type)) %>%
  group_by(participant_id, perspective_label, prime_type, response_type) %>%
  summarise(
    prop_aware = mean(aware, na.rm = TRUE),
    n_trials = n(),
    .groups = "drop"
  ) %>%
  mutate(
    condition = paste(perspective_label, prime_type, response_type, sep = "\n"),
    condition = factor(condition, levels = c(
      "First-person AMP\nPositive prime\nPleasant",
      "First-person AMP\nPositive prime\nUnpleasant",
      "First-person AMP\nNegative prime\nPleasant",
      "First-person AMP\nNegative prime\nUnpleasant",
      "Third-party AMP\nPositive prime\nPleasant",
      "Third-party AMP\nPositive prime\nUnpleasant",
      "Third-party AMP\nNegative prime\nPleasant",
      "Third-party AMP\nNegative prime\nUnpleasant"
    ))
  )

# Create the plot
ggplot(participant_aware, aes(x = condition, y = prop_aware, fill = response_type)) +
  geom_violin(alpha = 0.3, width = 0.7) +
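  geom_jitter(width = 0.1, height = 0, alpha = 0.5, size = 2) +  # jitter sideways only so P(aware) values stay exact
  stat_summary(fun = mean, geom = "point", size = 4, color = "black", shape = 18) +
  # geom "crossbar" needs ymin/ymax; setting both to the mean draws a bar at the mean
  stat_summary(fun = mean, fun.min = mean, fun.max = mean, geom = "crossbar",
               width = 0.5, color = "black", linewidth = 0.3) +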
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "gray50") +
  labs(
    x = "",
    y = "P(aware)",
    title = "Replication Results"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text.x = element_text(angle = 45, hjust = 1, size = 9),
    panel.grid.major.x = element_blank(),
    panel.border = element_rect(color = "black", fill = NA, linewidth = 1)
  ) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.2)) +
  scale_fill_manual(values = c("Pleasant" = "pink", "Unpleasant" = "purple"))

knitr::include_graphics("/Users/milka/Documents/GitHub/psyc 251/waniak2025/analysis/originalgraph.png")


✅ Replicated findings

The awareness effect emerged on both first-person and third-person AMPs
- Response × Prime interaction was significant, p = .001
- People were more aware when their response contradicted the prime, especially for negative primes in the third-person condition (p = .003)
AMP effects were present in both conditions (Pdiff > 0)
- First-person AMP: M = 0.52, Mdn = 1, SD = 0.586
- Third-party AMP: M = 0.32, Mdn = 0, SD = 0.627
- Original study: First-person M = 0.350, Third-party M = 0.348
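In a 2 × 2 layout, the Response × Prime interaction contrast on P(aware) is equivalent to comparing the congruent cells (pleasant response after a positive prime, unpleasant response after a negative prime) with the incongruent cells. The base-R sketch below makes that comparison per perspective on a few invented trials for one participant. The data are hypothetical and only show the bookkeeping.

```r
# Invented trials for one participant (aware: 1 = reported "influenced")
toy <- data.frame(
  perspective = rep(c("first_person", "third_person"), each = 4),
  prime  = rep(c("pleasant", "pleasant", "unpleasant", "unpleasant"), times = 2),
  choice = rep(c("pleasant", "unpleasant", "pleasant", "unpleasant"), times = 2),
  aware  = c(1, 0, 0, 1,    # first-person trials
             1, 1, 0, 1),   # third-party trials
  stringsAsFactors = FALSE
)

# Step 1: mean awareness in each perspective x prime x response cell
cells <- aggregate(aware ~ perspective + prime + choice, data = toy, FUN = mean)

# Step 2: label cells as congruent when response valence matches prime valence
cells$match <- ifelse(cells$prime == cells$choice, "congruent", "incongruent")

# Step 3: average the two congruent and the two incongruent cells
by_match <- aggregate(aware ~ perspective + match, data = cells, FUN = mean)
by_match
```

Averaging cell means (rather than raw trials) keeps each prime × response cell equally weighted, which is what the interaction contrast does when cells contain different numbers of trials.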

❌ Non-replicated findings

No significant correlation between first-person and third-person awareness
- r = 0.306, 95% CI [-0.102, 0.625], t(23) = 1.54, p = 0.137
- Original study: r = .487, p < .001
We did NOT find significantly higher awareness in third-party vs. first-person judgments
- Effect of AMP target (first-person vs. third-party) on awareness: p = .56 (not significant)
- First-person awareness: M = 0.645, Mdn = 0.618, SD = 0.303
- Third-party awareness: M = 0.503, Mdn = 0.5, SD = 0.166
- Original study: First-person M = 0.477, Third-party M = 0.566 (significantly different)
- Current study: First-person awareness was numerically higher than third-party awareness, but the difference was not significant

Discussion

Overall, the present study provides a partial replication of the original findings but fails to provide clear support for either the introspective-access or the inferential account of influence awareness. Several core qualitative patterns replicated, supporting the robustness of the Affect Misattribution Procedure (AMP). However, key inferential predictions did not fully replicate, particularly the relationship between first-person and third-person awareness.

Consistent with the original study, AMP effects emerged reliably in both first-person and third-person conditions, demonstrating that the task consistently produced affect misattribution. Participants’ awareness reports were not random: awareness varied systematically with response–prime congruency, and this pattern appeared in both self- and other-judgments. In particular, participants were more likely to report awareness when responses contradicted the prime, and this effect was especially pronounced for negative primes in the third-person condition. These replicated patterns suggest that awareness judgments are shaped by structured cues in the task and align with the idea that participants may rely on post-hoc reasoning about their responses.

At the same time, two central findings did not replicate. First, we did not observe a reliable association between individuals’ first-person and third-person awareness. This correlation was a key test of the inferential account, as the original study interpreted it as evidence that awareness judgments reflect general reasoning strategies rather than privileged introspective access. Second, we did not find higher awareness in the third-person condition than in the first-person condition; first-person awareness was numerically higher (M = 0.645 vs. 0.503), but the difference between perspectives was not significant. Although this contrast was not the primary theoretical focus, it further weakens the claim that third-person judgments are systematically easier or more inference-based than first-person judgments.

Several factors may explain why these effects failed to replicate. Most importantly, the present study was substantially underpowered relative to the original, which included more than ten times as many participants. I fell short of the target sample size for 80% power by five participants. Even if the desired sample size of 30 had been reached, this would still be substantially smaller than the original study’s sample of 292 participants. The smaller sample size limits sensitivity to individual-difference effects such as correlations and may have obscured relationships that are present but modest in magnitude. Thus, the absence of a first–third person correlation should not be taken as strong evidence against the inferential hypothesis.
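The cost of the shortfall can be gauged approximately. The base-R sketch below uses the Fisher z approximation (an assumption here; dedicated power software may give slightly different values) to compute power for the original effect size at the achieved and planned sample sizes, and for an effect the size of the correlation observed here.

```r
# Approximate power of a two-tailed test of a correlation
# (Fisher z approximation; dedicated software may differ slightly)
power_r <- function(r, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)
  delta  <- atanh(r) * sqrt(n - 3)  # expected z statistic if the true correlation is r
  pnorm(delta - z_crit) + pnorm(-delta - z_crit)
}

round(power_r(r = 0.487, n = 25), 2)  # about 0.70: original effect, achieved n
round(power_r(r = 0.487, n = 30), 2)  # about 0.79: original effect, planned n
round(power_r(r = 0.306, n = 25), 2)  # about 0.32: effect of the size observed here
```

Under this approximation, an effect near r = .3 would be detected only about a third of the time with 25 participants, which is consistent with treating the non-significant correlation as inconclusive.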

In addition, there were meaningful methodological and sample differences between the two studies. The present sample was recruited through Prolific and compensated financially, whereas the original study used unpaid participants from Project Implicit. Compensation may influence engagement, task strategies, or how carefully participants reflect on their responses. The inclusion of an attention check in the current study, which was absent in the original, may also have altered the composition of the final sample by excluding participants who would otherwise have been retained. Finally, the samples differed substantially in cultural background, with the original study drawing primarily from U.S. participants and the current study including a majority of participants from South Africa. Cultural differences in self-reflection, social inference, or reporting of mental states could plausibly affect awareness judgments, particularly in the third-person condition.

In sum, the results support the robustness of AMP effects, consistent with prior work showing that affect misattribution occurs reliably even when participants attempt to discount the prime. However, the present findings provide weaker support for the inferential account than the original study, primarily due to the non-replication of the correlation between first- and third-person awareness. Given the limited power and methodological differences, these results are best interpreted as inconclusive rather than contradictory, highlighting the need for larger and more diverse samples to clarify whether reported influence awareness in the AMP reflects post-hoc inference, introspective access, or a combination of both.