Replication of Study by Young & Saxe (2011, Cognition)

Author

Cassie Wang (tiw037@ucsd.edu)

Published

November 25, 2024

Experimental paradigm:

Link: https://ucsd-psych201a.github.io/young2011/

Introduction

I chose to replicate the study by Young and Saxe (https://doi.org/10.1016/j.cognition.2011.04.005) because it provides a fundamental exploration of how intent influences moral judgments across distinct moral domains, specifically harm (e.g., assault) and purity violations (e.g., incest). The experiment shows that while intent is a crucial factor in moral judgments of harm, it is less influential in judgments of purity violations. In other words, the weight people give to intent when evaluating an action depends on the moral domain, which offers insight into human decision-making in ethically complicated situations. This aligns with my interest in understanding decision-making processes. The study also offers a way to explore whether moral judgments are universal or context-specific, making it an excellent candidate for replication.

In my replication study, I will conduct an online survey in which participants read scenarios involving harm and purity violations, presented in the second-person perspective as in Experiment 1A of the original study. Each scenario will depict the violation as either intentional or accidental. Participants will rate the moral wrongness of each action on a 7-point scale, from “not at all morally wrong” to “very morally wrong.” One challenge in replicating this study may be recruiting an online sample that reflects the demographics of the original sample.

Design Overview

Two factors are manipulated in the experiment: intent (intentional vs. accidental) and domain (harm vs. purity). One measure, a moral wrongness rating, is taken after each scenario is presented. The study uses a between-participants design. In a within-participants design, participants might infer the purpose of the research, which could influence their responses. The experiment will be double-blind to reduce demand characteristics. Possible confounding variables include educational level and gender.

Methods

Power Analysis

Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.
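As a rough sketch of how such a power analysis might be run in R, the base `power.t.test()` function can solve for the per-group sample size of a two-sample, two-tailed t-test. The effect size below (Cohen's d = 0.5) is a placeholder assumption, not the original study's estimate, and should be replaced once the original effect size has been computed.

```r
# Placeholder standardized effect size (assumption, NOT taken from Young & Saxe, 2011)
assumed_d <- 0.5
target_power <- c(0.80, 0.90, 0.95)

# Solve for n per group in a two-sample, two-sided t-test at alpha = .05
n_per_group <- sapply(target_power, function(pw) {
  power.t.test(delta = assumed_d, sd = 1, sig.level = 0.05, power = pw,
               type = "two.sample", alternative = "two.sided")$n
})

# Round up, since a fraction of a participant cannot be recruited
power_table <- data.frame(power = target_power,
                          n_per_group = ceiling(n_per_group))
print(power_table)
```

Comparing the resulting per-group sizes with the recruitment budget is one way to justify the planned sample size.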

Planned Sample

We plan to recruit participants aged 18 to 68 through Prolific. Participants must be fluent in English and live in the United States. We aim to collect about 200 usable responses. We will verify that participants have not completed a similar task before to avoid familiarity bias.

Materials

We will use the exact second-person moral judgment scenarios from the original study, which describe harm and purity violations committed either intentionally or accidentally.

Procedure

Participants will be presented with a single randomly assigned moral judgment scenario and asked to rate its moral wrongness on a 7-point scale. This scale ranges from “not at all morally wrong” (1) to “very morally wrong” (7).
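The random assignment step can be illustrated with a short R sketch. The live experiment runs in the browser, so this is only an illustration of balanced assignment to the four design cells; the participant IDs, the seed, and the assumption that cells are balanced are all hypothetical.

```r
set.seed(201)  # arbitrary seed so the sketch is reproducible

# The four cells of the 2 (intent) x 2 (domain) design
conditions <- expand.grid(intent = c("intentional", "accidental"),
                          domain = c("harm", "purity"),
                          stringsAsFactors = FALSE)

# Hypothetical participant IDs; 200 matches the planned number of usable responses
participants <- data.frame(id = seq_len(200))

# Balanced random assignment: repeat the four cells equally, then shuffle
cell <- sample(rep(seq_len(nrow(conditions)), length.out = nrow(participants)))
assignment <- cbind(participants, conditions[cell, ])
rownames(assignment) <- NULL

# Cross-tabulate to confirm each cell received the same number of participants
print(table(assignment$intent, assignment$domain))
```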

Analysis Plan

A two-tailed t-test will be performed to test how domain (harm vs. purity) influences moral wrongness judgments when actions are accidental.
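A minimal sketch of this planned test, run on simulated ratings rather than real data, might look as follows. The column names `intent`, `domain`, and `rating` are assumptions about how the cleaned data could be laid out.

```r
set.seed(1)  # arbitrary seed; the ratings below are simulated, not real data

# Simulated long-format data, one row per participant (hypothetical columns)
sim <- data.frame(
  intent = rep(c("intentional", "accidental"), each = 100),
  domain = rep(c("harm", "purity"), times = 100),
  rating = sample(1:7, 200, replace = TRUE)
)

# Keep only accidental scenarios, then compare the two domains
accidental <- subset(sim, intent == "accidental")
planned_test <- t.test(rating ~ domain, data = accidental,
                       alternative = "two.sided")  # Welch test by default
print(planned_test)
```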

Clarify key analysis of interest here. You can also pre-specify additional analyses you plan to do.

Differences from Original Study

The original study recruited participants from Amazon Mechanical Turk (MTurk). We will use Prolific instead to reach a broader demographic. We will maintain the same recruitment criteria to minimize the influence of demographic differences, and we believe that this minor difference will not significantly affect our ability to replicate the original findings.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

We collected data from 5 participants in our pilot study, all English-speaking adults living in the United States.

Differences from pre-data collection methods plan

None

Results

Data preparation

Data preparation following the analysis plan.

# Load necessary libraries
library(ggplot2)
library(jsonlite)

# Define the directory path
file_path <- "/Users/tw/Documents/GitHub/young/data/PilotB"

# List all CSV files in the directory
files <- list.files(file_path, pattern = "\\.csv$", full.names = TRUE)

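# Step 1: Read and Combine Data, Excluding Participants Who Took the Survey Before
# Assumes each CSV file holds the data of one participant
data_list <- lapply(files, function(file) {
  data <- read.csv(file)
  # Drop the whole file if any response contains {"TakenSurveyBefore":"Yes"};
  # removing only that row would keep the participant's ratings
  if (any(grepl('"TakenSurveyBefore":"Yes"', data$response, fixed = TRUE))) {
    return(NULL)
  }
  return(data)
})
data <- do.call(rbind, data_list)  # NULL entries are skipped when binding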

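# Step 2: Filter out the introductory response
# Keep rows with a missing stimulus explicitly; comparing NA with != yields NA,
# which would turn those rows into all-NA rows when used as an index
data <- data[is.na(data$stimulus) | data$stimulus != "Welcome to the experiment. Press any key to begin.", ]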

# Step 3: Extract rows with scenarios (non-null 'stimulus')
scenarios <- data[!is.na(data$stimulus), c("trial_index", "stimulus")]

# Step 4: Categorization function based on provided keywords
categorize_scenario <- function(text) {
  text <- tolower(text)
  if (grepl("biological parent", text, ignore.case = TRUE)) {
    return("purity")
  } else if (grepl("peanuts", text, ignore.case = TRUE)) {
    return("harm")
  } else if (grepl("urine", text, ignore.case = TRUE)) {
    return("purity")
  } else if (grepl("dog meat", text, ignore.case = TRUE)) {
    return("purity")
  } else if (grepl("siblings", text, ignore.case = TRUE)) {
    return("purity")
  } else if (grepl("poison", text, ignore.case = TRUE)) {
    return("harm")
  }
  return("other")
}

# Apply categorization
scenarios$category <- sapply(scenarios$stimulus, categorize_scenario)

# Step 5: Extract rows with ratings (trial_type = 'survey-likert')
ratings <- data[data$trial_type == "survey-likert" & !is.na(data$response), c("trial_index", "response", "time_elapsed")]

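# Pair each scenario with the first rating row that follows it in the combined data.
# Assumes rows keep their recorded order and each scenario is followed by its rating.
# (merge(..., by = NULL) would cross-join every scenario with every rating,
# giving both categories identical rating distributions.)
scenario_rows <- which(!is.na(data$stimulus))  # same rows as selected in Step 3
rating_rows <- which(data$trial_type == "survey-likert" & !is.na(data$response))
next_rating <- sapply(scenario_rows, function(i) {
  later <- rating_rows[rating_rows > i]
  if (length(later) > 0) later[1] else NA_integer_
})
scenarios$response <- data$response[next_rating]
scenarios$time_elapsed <- data$time_elapsed[next_rating]

# Extract numeric ratings (NA when no rating was matched)
extract_rating <- function(response) {
  if (is.na(response)) return(NA_real_)
  parsed_response <- fromJSON(response)
  return(as.numeric(parsed_response$Q0))
}

scenarios$rating <- sapply(scenarios$response, extract_rating)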

# Step 6: Filter out irrelevant categories
analyzable_data <- scenarios[scenarios$category %in% c("harm", "purity"), ]

# Step 7: Perform t-test for purity vs harm

# Ensure no missing values in ratings for each category
purity_ratings <- na.omit(analyzable_data$rating[analyzable_data$category == "purity"])
harm_ratings <- na.omit(analyzable_data$rating[analyzable_data$category == "harm"])

# Verify the number of observations in each group
cat("Number of purity ratings:", length(purity_ratings), "\n")

Number of purity ratings: 600

cat("Number of harm ratings:", length(harm_ratings), "\n")

Number of harm ratings: 300

# Perform t-test only if there are enough observations
if (length(purity_ratings) > 1 && length(harm_ratings) > 1) {
  t_test_result <- t.test(purity_ratings, harm_ratings)
  print(t_test_result)
} else {
  cat("Not enough observations in one or both categories to perform a t-test.\n")
}


    Welch Two Sample t-test

data:  purity_ratings and harm_ratings
t = -8.7999e-14, df = 597.67, p-value = 1
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2775111  0.2775111
sample estimates:
mean of x mean of y 
 3.533333  3.533333

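# Ensure ggplot2 is loaded
library(ggplot2)

# Summarise mean rating and standard error for each category
mean_ratings <- data.frame(
  category = c("harm", "purity"),
  mean_rating = c(mean(harm_ratings), mean(purity_ratings)),
  se_rating = c(sd(harm_ratings) / sqrt(length(harm_ratings)),
                sd(purity_ratings) / sqrt(length(purity_ratings)))
)

# Example plot: mean ratings with standard-error bars
p <- ggplot(mean_ratings, aes(x = category, y = mean_rating, fill = category)) +
  geom_bar(stat = "identity", color = "black", width = 0.6) +
  geom_errorbar(aes(ymin = mean_rating - se_rating, ymax = mean_rating + se_rating), width = 0.2) +
  labs(
    title = "Mean Moral Wrongness Ratings for Purity vs Harm",
    x = "Category",
    y = "Mean Rating (with SE)"
  ) +
  theme_minimal() +
  scale_fill_manual(values = c("harm" = "red", "purity" = "blue"))

# Print the plot to ensure it renders
print(p)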

Confirmatory analysis

We will conduct a two-tailed t-test comparing moral wrongness ratings across domains (harm vs. purity). We expect the results to match those of the original study, which found an interaction between intent and domain: intent plays a larger role in harm judgments than in purity judgments.
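Because an interaction involves both factors, one way it might be examined is a 2 x 2 between-participants ANOVA. The sketch below uses simulated ratings and hypothetical column names (`intent`, `domain`, `rating`); it illustrates the model form only and is not necessarily the analysis used in the original study.

```r
set.seed(2)  # arbitrary seed; simulated ratings only

# 50 simulated participants in each of the four design cells
sim <- expand.grid(intent = c("intentional", "accidental"),
                   domain = c("harm", "purity"),
                   participant = 1:50)
sim$rating <- sample(1:7, nrow(sim), replace = TRUE)

# 2 x 2 ANOVA; the intent:domain row of the summary tests the interaction
fit <- aov(rating ~ intent * domain, data = sim)
print(summary(fit))
```

A significant `intent:domain` term would indicate that the effect of intent differs between harm and purity scenarios.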

Side-by-side graph with original graph is ideal here

Exploratory analyses

Any follow-up analyses desired (not required).

Discussion

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.