Replication of Young & Saxe (2011, Cognition)

Author

Cassie Wang (tiw037@ucsd.edu)

Published

December 6, 2024

Experimental paradigm:

https://ucsd-psych201a.github.io/young2011/

Introduction

Moral judgments are often shaped by the perceived intentions behind actions. Prior research, including a study by Young and Saxe (2011) (https://doi.org/10.1016/j.cognition.2011.04.005), suggests that the importance of intent varies across different moral domains. For instance, harmful actions tend to be judged more heavily based on the actor’s intentions, while purity violations—such as incest or consuming taboo substances—are often viewed as wrong regardless of intent. This difference aligns with theories of moral foundations, which associate purity violations with strong disgust reactions and harm violations with greater attention to the actor’s mental state.

The key question in the original study was whether intent plays a greater role in moral judgments of harm compared to purity. The results revealed a significant interaction between moral domain and intent: harm judgments were highly sensitive to intent, while purity violations (involving incest and ingestion) were less so. For example, accidental harm was judged as less wrong than accidental purity violations, whereas intentional harm was seen as more wrong than intentional purity violations.

In this replication study, we revisited the role of intent in moral judgments across three domains: harm, incest, and ingestion. We closely followed the original study’s design, using second-person scenarios and a 7-point moral wrongness scale. Our goal was to test the robustness of the original findings in a new sample and recruitment platform while exploring whether cultural or methodological differences might influence the results.

Design Overview

We used a 2 (intent: intentional vs. accidental) × 3 (domain: harm, incest, ingestion) between-subjects design. Participants were randomly assigned to read one scenario that varied by both intent and domain. After reading the scenario, they rated the moral wrongness of the action on a 7-point scale, ranging from “not at all morally wrong” to “very morally wrong.” Our study closely followed the original design, with the primary difference being the use of Prolific for participant recruitment instead of Amazon MTurk.

To preserve the integrity of the study, we maintained the original between-subjects design. A within-subjects design could have led participants to guess the study’s purpose, potentially influencing their responses. The experiment was conducted double-blind. Although factors such as cultural background, religious beliefs, education level, and gender could influence moral judgments, these variables were not explicitly manipulated in our study.

Methods

Power Analysis

Assuming a large effect size by Cohen’s conventions (d ≈ 0.8) and aiming for 80% power at α = .05, we deemed a sample of approximately 351 participants sufficient, and feasibility considerations led us to target that number. Our final analyzed sample included 343 participants, slightly below the target.

Planned Sample

We recruited English-speaking adults living in the United States through Prolific. We included participants aged 18-100, aiming for a diverse demographic. Given our power analysis, we planned to collect around 351 participants; the final sample included 343 usable responses. Participants who indicated that they had completed a similar task before were excluded to prevent familiarity biases.

Materials

We used the same moral judgment scenarios as in the original study, adapted to the second-person perspective. Scenarios fell into three domains:

- Harm (e.g., poisoning or unknowingly causing an allergic reaction)
- Incest (e.g., a sexual relationship between siblings)
- Ingestion (e.g., consuming taboo substances, like dog meat or urine)

Each scenario had an intentional and an accidental version. For instance, in an accidental harm scenario, the participant ("you") inadvertently causes harm due to ignorance of a critical detail (e.g., a peanut allergy).

Procedure

Participants were randomly assigned to one of the six conditions (2 intent levels × 3 domains) and presented with a single scenario. After reading it, they provided a moral wrongness rating on a scale from 1 (“not at all morally wrong”) to 7 (“very morally wrong”). The survey was administered online, and participants were compensated for their time.
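The assignment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_conditions`, the participant IDs, and the seed are hypothetical, and simple (unblocked) randomization is assumed, since the report does not say how the survey software implemented assignment.

```python
import random
from collections import Counter
from itertools import product

INTENTS = ("intentional", "accidental")
DOMAINS = ("harm", "incest", "ingestion")
# The six cells of the 2 (intent) x 3 (domain) between-subjects design.
CONDITIONS = list(product(INTENTS, DOMAINS))


def assign_conditions(participant_ids, seed=None):
    """Assign each participant to one cell by simple randomization.

    Hypothetical helper: every participant independently draws one of the
    six cells, so cell sizes are not guaranteed to be equal.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(CONDITIONS) for pid in participant_ids}


# Illustrative run with the planned sample size of 351 (IDs are made up).
assignment = assign_conditions(range(1, 352), seed=2024)
cell_sizes = Counter(assignment.values())
for condition in CONDITIONS:
    print(condition, cell_sizes[condition])
```

With simple randomization the cells end up with somewhat unequal sizes, which matters later because unequal cells make the ANOVA unbalanced.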

Analysis Plan

We planned to conduct a series of ANOVAs to examine the intent × domain interaction. First, we would test whether different story exemplars within the same domain differed significantly. If no differences emerged, we would collapse across exemplars. Then, we would conduct 2 (intent) × 2 (domain) comparisons to replicate the original analyses, focusing on the role of intent in harm vs. purity (incest, ingestion) judgments.

We expected to replicate the original findings: intent would have a stronger effect on harm judgments compared to purity judgments. Specifically, accidental harm should be judged less harshly than accidental purity violations, while intentional harm should be judged more harshly than intentional purity violations.
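To make the planned 2 (intent) × 2 (domain) analysis concrete, here is a minimal sketch of a two-way ANOVA computed by hand in Python. It assumes a balanced design (equal cell sizes), which the actual data did not have, and the ratings in `toy_cells` are invented purely for illustration. The function name `two_way_anova_balanced` is hypothetical, and this is not the code used for the reported analyses.

```python
from statistics import mean


def two_way_anova_balanced(cells):
    """F ratios for a balanced two-factor between-subjects design.

    `cells` maps (domain, intent) -> list of ratings, with equal n per cell.
    """
    domains = sorted({d for d, _ in cells})
    intents = sorted({i for _, i in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles equal cell sizes")

    grand = mean(x for v in cells.values() for x in v)
    dom_mean = {d: mean(x for i in intents for x in cells[(d, i)]) for d in domains}
    int_mean = {i: mean(x for d in domains for x in cells[(d, i)]) for i in intents}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for the two main effects, the interaction, and error.
    ss_domain = n * len(intents) * sum((dom_mean[d] - grand) ** 2 for d in domains)
    ss_intent = n * len(domains) * sum((int_mean[i] - grand) ** 2 for i in intents)
    ss_inter = n * sum(
        (cell_mean[(d, i)] - dom_mean[d] - int_mean[i] + grand) ** 2
        for d in domains for i in intents
    )
    ss_resid = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_domain, df_intent = len(domains) - 1, len(intents) - 1
    df_inter = df_domain * df_intent
    df_resid = len(domains) * len(intents) * (n - 1)
    ms_resid = ss_resid / df_resid
    return {
        "F_domain": (ss_domain / df_domain) / ms_resid,
        "F_intent": (ss_intent / df_intent) / ms_resid,
        "F_interaction": (ss_inter / df_inter) / ms_resid,
        "df_resid": df_resid,
    }


# Invented ratings: intent matters a lot for harm, little for incest.
toy_cells = {
    ("harm", "accidental"): [1, 2, 1, 2],
    ("harm", "intentional"): [6, 7, 6, 7],
    ("incest", "accidental"): [4, 5, 4, 5],
    ("incest", "intentional"): [5, 6, 5, 6],
}
anova = two_way_anova_balanced(toy_cells)
print(anova)
```

A large `F_interaction` relative to its degrees of freedom is what the prediction amounts to: the effect of intent differs between the two domains. For unbalanced data such as ours, a statistics package with an explicit choice of sums-of-squares type is the safer route.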

Differences from Original Study

The original study recruited participants via Amazon Mechanical Turk, while we used Prolific. Although both platforms host English-speaking U.S. participants, minor demographic differences might exist. We adhered to similar exclusion criteria and closely matched the original methods otherwise. We believe these minor methodological shifts are unlikely to significantly alter the core patterns of results.

Methods Addendum (Post Data Collection)

Actual Sample

We collected 343 participants, all English-speaking adults residing in the United States. The sample size was slightly below the targeted 351 but still provided sufficient power.

Differences from pre-data collection methods plan

No major deviations from our preregistered plan occurred.

Results

Data preparation

Data were prepared following the analysis plan.

AOV comparing harm vs. incest

                  Df Sum Sq Mean Sq F value   Pr(>F)    
domain             1   10.2    10.2   2.371 0.125055    
intention          1  394.1   394.1  91.238  < 2e-16 ***
domain:intention   1   54.6    54.6  12.648 0.000462 ***
Residuals        216  933.1     4.3                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


AOV comparing harm vs. ingestion

                  Df Sum Sq Mean Sq F value   Pr(>F)    
domain             1    4.5    4.50   1.105    0.294    
intention          1  230.5  230.48  56.638 8.92e-13 ***
domain:intention   1  169.0  169.04  41.539 5.70e-10 ***
Residuals        256 1041.7    4.07                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


AOV comparing incest vs. ingestion

                  Df Sum Sq Mean Sq F value   Pr(>F)    
domain             1   26.8   26.81   5.983 0.015225 *  
intention          1   59.5   59.46  13.269 0.000336 ***
domain:intention   1   17.7   17.71   3.953 0.048022 *  
Residuals        222  994.8    4.48                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Mean Ratings and Standard Errors:

# A tibble: 6 × 4
  domain    intention   mean_rating se_rating
  <fct>     <fct>             <dbl>     <dbl>
1 harm      accidental        0.481     0.124
2 harm      intentional       4.74      0.220
3 incest    accidental        2         0.413
4 incest    intentional       3.88      0.273
5 ingestion accidental        3.78      0.265
6 ingestion intentional       4.42      0.237

Confirmatory analysis

The original study by Young and Saxe (2011) found that:

There were no significant differences between stories within each domain. Each of the three 2 (story) × 2 (intent) ANOVAs (for harm, incest, and ingestion) revealed main effects of intent but not story, and no story × intent interaction. This allowed the original authors to collapse across stories within each domain in subsequent analyses.
When comparing harm to incest and harm to ingestion using 2 (intent) × 2 (domain) ANOVAs, the original results showed a significant intent × domain interaction. This indicated that intent mattered more for harm judgments than for purity (incest, ingestion). Specifically, accidental purity violations were judged more morally wrong than accidental harm, and intentional harm was judged more morally wrong than intentional purity violations, aligning with the predicted difference in how intent influences moral judgments of harm versus purity.
However, when comparing the two purity domains (incest vs. ingestion), the original study found no intent × domain interaction. Both forms of purity violations were judged similarly in terms of intent-sensitivity, suggesting that purity as a category was less intent-dependent than harm.

In contrast, our replication found:

Similar to the original, we found main effects of intent in all comparisons, indicating that intentional violations are judged more harshly than accidental ones in general.
In comparisons of harm versus incest and harm versus ingestion, we also replicated the pattern of significant domain × intent interactions. This suggests that, as originally reported, harm judgments remain more intent-sensitive than purity judgments.
Critically, our replication diverged from the original findings when comparing incest versus ingestion. While the original study reported no domain × intent interaction for the two purity domains, our results indicated a significant interaction (F(1, 222) = 3.95, p = 0.048). In other words, unlike the original study, we found that the two purity domains (incest and ingestion) were not equally insensitive to intent; instead, we observed subtle differences in how intent influenced judgments within these purity scenarios.
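As a rough check on the size of these interactions, the F ratios and partial eta squared can be recomputed from the sums of squares in the three ANOVA tables. The sketch below copies the tabled values; the helper `interaction_stats` is a hypothetical name, and partial eta squared (SS_effect / (SS_effect + SS_residual)) is an effect-size measure we add here for illustration, not one reported in the original analysis.

```python
# Interaction and residual sums of squares copied from the ANOVA tables
# in the Results section (rounded as printed there).
ANOVA_TABLES = {
    "harm vs. incest": {"ss_interaction": 54.6, "ss_resid": 933.1, "df_resid": 216},
    "harm vs. ingestion": {"ss_interaction": 169.0, "ss_resid": 1041.7, "df_resid": 256},
    "incest vs. ingestion": {"ss_interaction": 17.7, "ss_resid": 994.8, "df_resid": 222},
}


def interaction_stats(ss_interaction, ss_resid, df_resid, df_effect=1):
    """Return (F, partial eta squared) for one domain x intent interaction."""
    f_value = (ss_interaction / df_effect) / (ss_resid / df_resid)
    partial_eta_sq = ss_interaction / (ss_interaction + ss_resid)
    return f_value, partial_eta_sq


results = {name: interaction_stats(**row) for name, row in ANOVA_TABLES.items()}
for name, (f_value, eta_sq) in results.items():
    print(f"{name}: F = {f_value:.2f}, partial eta^2 = {eta_sq:.3f}")
```

Because the inputs are the rounded sums of squares, the recomputed F values agree with the tabled ones only up to rounding. On this measure the incest vs. ingestion interaction is much smaller than either harm vs. purity interaction.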

In terms of mean ratings, our results also showed a somewhat different pattern than the original. The original study highlighted that accidental purity violations (incest and ingestion) were judged more harshly than accidental harm, while intentional harm was judged more harshly than purity violations. Our mean ratings suggest a similar pattern for harm vs. incest and harm vs. ingestion (with accidental purity > accidental harm and intentional harm > intentional purity). However, the difference between the two purity domains themselves was more pronounced in our data: accidental incest (M = 2.00) and accidental ingestion (M = 3.78) differed notably, indicating that not all purity violations are treated identically with respect to accidental intent.
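The pattern in the means can be summarized as one intent effect per domain (intentional minus accidental). The sketch below uses the means and standard errors from the table of mean ratings. The standard error of each difference assumes independent groups, which fits the between-subjects design; the helper name `intent_effect` is hypothetical.

```python
from math import sqrt

# (mean rating, standard error) per cell, copied from the table of means.
MEANS = {
    ("harm", "accidental"): (0.481, 0.124),
    ("harm", "intentional"): (4.74, 0.220),
    ("incest", "accidental"): (2.0, 0.413),
    ("incest", "intentional"): (3.88, 0.273),
    ("ingestion", "accidental"): (3.78, 0.265),
    ("ingestion", "intentional"): (4.42, 0.237),
}


def intent_effect(domain):
    """Intentional minus accidental mean, with the SE of that difference.

    Assumes the two cells are independent samples (between-subjects).
    """
    m_int, se_int = MEANS[(domain, "intentional")]
    m_acc, se_acc = MEANS[(domain, "accidental")]
    return m_int - m_acc, sqrt(se_int ** 2 + se_acc ** 2)


effects = {d: intent_effect(d) for d in ("harm", "incest", "ingestion")}
for domain, (diff, se_diff) in effects.items():
    print(f"{domain}: intent effect = {diff:.2f} (SE = {se_diff:.2f})")
```

The intent effect is largest for harm, intermediate for incest, and smallest for ingestion. That ordering is what the significant harm vs. purity interactions, and the smaller incest vs. ingestion interaction, reflect.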

Discussion

Summary of Replication Attempt

Our replication partially diverged from the original findings. We replicated the central result that intent matters more for harm than for purity judgments: the domain × intent interaction was significant in both the harm vs. incest and harm vs. ingestion comparisons. However, we did not confirm the original pattern in which the two purity domains were equally insensitive to intent. Instead, our results suggest that differences in how intent affects judgments extended to the comparison within the purity domain (incest vs. ingestion) as well.

Commentary

Several factors could explain this discrepancy. Slight differences in participant demographics, cultural shifts since the original study, or the use of a different recruitment platform (Prolific vs. MTurk) may have influenced how participants evaluated purity-related scenarios. Also, given the passing of time and changes in cultural norms, what was once seen as uniformly taboo (and thus less sensitive to intent) may now be interpreted with more nuance.

The expanded effect of intent in purity domains could indicate that the originally reported pattern was not as stable or universal as presumed. Alternatively, our larger sample (343 participants) may have provided more statistical power, detecting subtle differences that the original study’s sample missed. Another possibility is that exposure to more varied moral content online has sensitized participants to consider mental states even in purity violations.

Future research that includes measures of disgust sensitivity, moral foundations, or other individual difference variables may help explain why purity domains now appear to show increased intent sensitivity. Further replications could clarify whether these shifts are due to methodological differences, cultural change over time, or random sampling variation.

In conclusion, although we followed the original methodology closely, our replication did not yield identical results. This outcome highlights the importance of replication in moral psychology research and the necessity of considering cultural and temporal contexts when interpreting moral judgments.

Contribution

Conceptualization: Coxi Jiang, Belynda Herrera, Seyi Lawal, and Cassie Wang. Data curation: Coxi Jiang, Belynda Herrera, Seyi Lawal, and Cassie Wang. Formal analysis: Coxi Jiang and Seyi Lawal. Funding acquisition: Coxi Jiang, Belynda Herrera, Seyi Lawal, and Cassie Wang. Investigation: Coxi Jiang, Belynda Herrera, Seyi Lawal, and Cassie Wang. Methodology: Coxi Jiang, Belynda Herrera, Seyi Lawal, and Cassie Wang. Software: Coxi Jiang and Seyi Lawal. Visualization: Coxi Jiang and Seyi Lawal. Writing - original draft: Coxi Jiang, Belynda Herrera, Seyi Lawal, and Cassie Wang. Writing - review & editing: Coxi Jiang, Belynda Herrera, Seyi Lawal, and Cassie Wang.