Replication of How Quick Decisions Illuminate Moral Character by Critcher et al. (2013, Social Psychological and Personality Science)

Original Authors: Clayton R. Critcher, Yoel Inbar, & David A. Pizarro

Author

Harley Clifton, Sara Hamidi, Prosperity Land, & Isabella Mullen

Published

December 11, 2024

Introduction

The original study examined how the speed of decision-making influences the perception of moral character. It found that observers treat decision speed as a signal of how certain the decision-maker was, which amplifies what a moral or immoral choice says about that person. Quick decisions perceived as moral led to more positive character evaluations, while quick immoral decisions resulted in harsher judgments. In contrast, slower decisions were viewed as less certain, leading to more moderate evaluations.

Replicating Experiment 1 from “How Quick Decisions Illuminate Moral Character” offers an opportunity to explore how decision speed affects character judgments. In this experiment, participants read a scenario about two characters who find separate cash-filled wallets in a grocery store parking lot. Participants were randomly assigned to either the moral or immoral condition. Those in the moral condition learned that both characters returned the wallets, while those in the immoral condition read that both characters kept the money. One character decided quickly, while the other took longer. Participants then rated each character on a 1-7 scale, based on perceived decision speed, moral principles, decision certainty, and impulsiveness. A two-way ANOVA was used to analyze whether there is an interaction between decision speed and morality conditions on moral character judgments.

\(~\)

Design Overview for Reproducibility

Manipulated Variables

Decision speed (quick or slow): Each participant was informed that one character, Justin, was able to make his decision quickly, and that the other character, Nate, was only able to make his decision after “long and careful deliberation,” regardless of moral condition.
Moral conditions (immoral or moral): Participants were given the same scenario where Justin and Nate come across a cash-filled wallet. Then they were randomly assigned to one of two conditions: moral or immoral. Based on their condition, they were given a prompt with the respective outcome. In the moral condition, both characters return the wallet, while in the immoral condition, they both keep it.

Measurements

Four categories of measurements were recorded on a 1-7 scale:

Moral character evaluation (3 questions per character, 6 total)
Quickness (1 question per character, 2 total)
Certainty (4 questions per character, 8 total)
Emotional impulsivity (2 questions per character, 4 total)

(See Procedure section for more details on the types of questions in these categories)

Study Design

This study uses a mixed design. The moral condition was assigned between participants: each participant was exposed to only one type of moral outcome, in which the characters acted either morally or immorally. Decision speed varied within participants: every participant rated both Justin (who decided quickly) and Nate (who decided slowly) on all measures.
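The layout of this mixed design can be sketched as a small table. This is a minimal illustration in base R, not part of the original analysis; the four participant IDs and the alternating assignment to conditions are hypothetical.

```r
# Toy layout of the 2 (condition: between) x 2 (speed: within) mixed design.
# Participant IDs and their condition assignments are hypothetical.
participants <- data.frame(
  ID = 1:4,
  condition = rep(c("moral", "immoral"), times = 2)
)
speeds <- data.frame(
  character = c("Justin", "Nate"),
  FastorSlow = c("f", "s")
)

# Cross every participant with both characters (the within-participant factor)
design <- merge(participants, speeds, by = NULL)
design <- design[order(design$ID, design$character), ]
print(design)

# Each participant sees one morality condition but both decision speeds
conditions_per_id <- tapply(design$condition, design$ID,
                            function(x) length(unique(x)))
speeds_per_id <- tapply(design$FastorSlow, design$ID,
                        function(x) length(unique(x)))
```

Every ID appears on two rows (one per character) with a single value of condition, which is the structure the later mixed model relies on.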

Repetition in Measures

The questions regarding moral character evaluations, perceived decision speed, certainty, and emotional impulsivity were presented for each character, once for Justin (who decided quickly) and again for Nate (who decided slowly).

Consequences of Design Alterations

We decided not to alter the study design. Switching to a fully within-participant design, in which participants experience both the moral and immoral conditions, may elicit response bias because it becomes harder to evaluate the behaviors in each condition independently. It may also create demand characteristics, where participants guess the study’s purpose and modify their responses as a result. For these reasons, we chose to move forward with the original mixed design.

Reducing Demand Characteristics

Preserving the mixed design and randomly assigning the morality condition minimizes demand characteristics. Because each participant sees only one moral outcome, participants cannot draw on knowledge of the other condition or directly compare moral and immoral outcomes when making their evaluations. This allows us to isolate the effect of decision speed on the measurements within each morality condition. Random assignment also minimizes further biases and prevents participants from recognizing patterns that could reveal the study’s intent.

\(~\)

Methods

Power Analysis

The original study did not provide sufficient statistical information to conduct a power analysis. Specifically, it did not report the mean difference between decision speeds or the corresponding standard error, nor did it provide the means and standard errors for the interaction between morality and speed. Additionally, the figure for Experiment 1 lacked error bars, making it impossible to visually estimate variability or calculate effect sizes. Without these metrics we were unable to perform a reliable power analysis to determine the required sample size for replication. Therefore, we used an alternative approach by applying the standard procedure of multiplying the original sample size (n = 119) by 2.5, resulting in a desired sample size of N = 289 participants.

Planned Sample

Our planned sample size is 289 participants. (See Methods Addendum section for more information on the official sample size.)

Materials

To replicate this experiment, we used jsPsych to create an online survey. Participants read scenarios involving Justin and Nate, and then rated them on quickness, moral character, decision certainty, and impulsivity on a scale of 1-7. We reached out to Dr. Clayton Critcher, an author of the original study, who kindly provided us with a detailed document outlining the script and protocol used for Experiment 1. Our study used identical language to the original study in the spirit of replication.

Link to experimental paradigm.

To access all materials used in this replication project, please visit our GitHub repository.

Procedure

We mimicked the procedure from the original study. Here are descriptions from the original paper:

“Participants read about both Justin and Nate, two men who each independently came upon two separate cash-filled wallets in the parking lot of a local grocery store. Justin ‘was able to decide quickly’ what to do. Nate ‘was only able to decide after long and careful deliberation.’”
“Participants assigned to the moral condition learned both men ‘did not steal the money but instead left the wallet with customer service.’ Those in the immoral condition learned instead that both men ‘pocketed the money and drove off.’”
“Participants were asked to rate the quickness, moral character evaluation, decision certainty, and emotional impulsivity of the agents on 1-7 scales.”
Quickness: “participants indicated how quickly (vs. slowly) the decision was made”
Moral Character Evaluation: “participants assess the agents’ underlying moral principles and standards…by asking whether the agent: ‘has entirely good (vs. entirely bad) moral principles,’ ‘has good (vs. bad) moral standards,’ and ‘deep down has the moral principles and knowledge to do the right thing.’”
Certainty: “Participants indicated ‘how conflicted [each] felt when making his decision’, ‘how many reservations [each] had’, whether the target ‘was quite certain in his decision’, and ‘how far [each] was from choosing the alternate course of action.’”
Emotional Impulsivity: “Participants indicated to what extent the person remained ‘calm and emotionally contained’ (reverse-scored) and ‘became upset and acted without thinking.’”

Analysis Plan

The primary focus of this study was to evaluate moral character based on decision morality and decision speed; specifically, to determine whether there is an interaction between morality condition and decision speed on moral character judgement.

Once our data was collected, we planned to exclude responses from participants with any non-answers or those who failed our attention check. We also included a sanity check because participants should have rated Justin (who is able to decide quickly) as faster (higher numerical value) than Nate (who only makes his decision after careful consideration). If participants failed to do so, even if they rated them at the same speed, we had reason to question whether they were adequately reading the prompts and thus their data was excluded from the analysis. Any participants who responded the same way to all numerical response questions also gave us reason to question the validity of their responses, and their data was excluded from the analysis as well. With these exclusion criteria, we hoped to ensure that the data was good quality and ready for analysis.
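The planned exclusion rules described above can be sketched on toy data. This is only an illustration in base R: the data frame `toy`, its column names, and all values are hypothetical, and `character_avg` stands in for the remaining rating questions.

```r
# Toy wide-format data: one row per participant (all values hypothetical).
toy <- data.frame(
  ID            = 1:5,
  attention_ok  = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  quick_Justin  = c(6, 4, 7, 5, 4),   # quickness rating for Justin (fast)
  quick_Nate    = c(2, 4, 1, 3, 4),   # quickness rating for Nate (slow)
  character_avg = c(5, 3, 6, NA, 4)   # stand-in for the remaining ratings
)

rating_cols <- c("quick_Justin", "quick_Nate", "character_avg")

# Rule 1: any non-answer
has_missing <- !complete.cases(toy[, rating_cols])
# Rule 2: failed attention check
failed_check <- !toy$attention_ok
# Rule 3 (as planned): Justin must be rated strictly faster than Nate; ties fail
failed_sanity <- !(toy$quick_Justin > toy$quick_Nate)
# Rule 4: same answer to every rating question
straightlined <- apply(toy[, rating_cols], 1,
                       function(x) length(unique(x[!is.na(x)])) <= 1)

keep <- !(has_missing | failed_check | failed_sanity | straightlined)
kept_ids <- toy$ID[keep]
print(kept_ids)
```

In this toy set only participant 1 survives: participant 2 gives a tie on quickness, participant 3 fails the attention check, participant 4 has a missing rating, and participant 5 gives the same answer throughout.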

For statistical analysis, our main question of interest involved testing the effect of two factors (speed and decision morality) and their interaction on the response variable (moral character judgement); thus a type III, two-way analysis of variance (ANOVA) test was appropriate. A type III ANOVA tests each main effect and interaction, adjusting for all other factors and interactions in the model.

The results of the type III two-way ANOVA revealed whether a model with the interaction term is better than just a model with all the predictors additively. After the best model was found, the ANOVA assumptions and model diagnostics were assessed and the results were interpreted for ease of understanding.

Differences from Original Study

In our replication, we introduced several features that differed from the original experiment. One difference was the addition of an attention check at the end of the experiment where we asked participants to name one of the characters from the scenarios. This attention check was designed to ensure participants were engaged throughout the study. Another difference was that our participants were not limited to undergraduate students since we were conducting an online study through Prolific. Additionally, our sample size differed from the original study because we could not conduct a power analysis based on the information provided in the article. Instead, we opted for a rule-of-thumb multiplier to determine our required sample size. These differences may have affected the replication by introducing variability not present in the original study.

Methods Addendum (Post Data Collection)

In our class, we were one of three separate groups running replications of this study. Due to budget constraints, there were not enough funds for all three groups to run 289 participants. Therefore, each group was allowed to run 100 participants for their replication study. We combined the data from the three studies, keeping only participants who passed their own group’s attention check, and then applied our exclusion criteria to the combined data for analysis. We also analyzed our individual group’s results separately (see Appendix).

Differences From Pre-Data Collection Methods Plan

After collecting our group’s portion of the data, we quickly realized that only 16 participants (out of our sample of 100) had responded to every rating question, so the planned exclusion criteria had to be changed in the hope of retaining more power. Instead, we only used data from participants who had at least one non-NA response in each unique combination of Quickness, Character Judgments, and Decision Speed conditions. This made it reasonable to still conduct our main analyses of interest on these data, without risking muddled results due to abundant missing comparisons.

Actual Sample

Data Exclusion Criteria: Our exclusion criteria are as follows:

Submissions with all non-answers for at least one unique combination of Quickness, Character Judgments, and Decision Speed conditions.
Participants who failed the attention check.
Participants who incorrectly rated Justin as slower (lower numerical value) than Nate.
Participants who responded the same to all numerical response questions.

Official Sample Size: After cleaning the final combined data from all groups, the resulting sample size was 220.

\(~\)

Results

The following is our analysis and results from the three groups’ combined data. See the Appendix for more detailed analysis of only our group’s data.

Data Preparation

We began the data cleaning process by pivoting our dataset to a long format to expedite data visualization. Next, we created a standardized trial index column and added a participant ID to keep unique submissions identifiable. Then, to aid with analysis, we added a FastorSlow column where Justin questions were coded as fast and Nate questions were coded as slow. Furthermore, we created a measure variable to categorize which question corresponded to which category of measure. Finally, six questions required reverse coding to ensure consistent scale for the certainty and impulsivity measures. The original author had recommended including appropriate reverse coding but the procedure was our own.

# Load packages used throughout this report
library(tidyverse)  # dplyr, tidyr, readr, ggplot2
library(patchwork)  # combining plots
library(lme4)       # lmer()
library(car)        # Anova(), vif()

# File path
file_path <- "../data/quick_decisions_all_data.csv"

# Load the data
all_data <- read.csv(file_path)

# Add trial_index column
long_data <- all_data %>%
  pivot_longer(
    cols = contains("_Q"),  # Select all columns containing "_Q"
    names_to = "question",  # Keep the full name in 'question'
    values_to = "value"     # Column to hold the values
  ) %>%
  mutate(
    trial_index = case_when(
      question == "Justin_Q1" ~ 1,
      question == "Justin_Q2" ~ 2,
      question == "Justin_Q3" ~ 3,
      question == "Justin_Q4" ~ 4,
      question == "Justin_Q5" ~ 5,
      question == "Justin_Q6" ~ 6,
      question == "Justin_Q7" ~ 7,
      question == "Justin_Q8" ~ 8,
      question == "Justin_Q9" ~ 9,
      question == "Justin_Q10" ~ 10,
      question == "Nate_Q1" ~ 11,
      question == "Nate_Q2" ~ 12,
      question == "Nate_Q3" ~ 13,
      question == "Nate_Q4" ~ 14,
      question == "Nate_Q5" ~ 15,
      question == "Nate_Q6" ~ 16,
      question == "Nate_Q7" ~ 17,
      question == "Nate_Q8" ~ 18,
      question == "Nate_Q9" ~ 19,
      question == "Nate_Q10" ~ 20,
      TRUE ~ NA_real_  # Handle unexpected cases
    )
  )

# Add a unique ID for each person
long_data <- long_data %>%
  mutate(ID = paste0(ceiling(row_number() / 20)))  # Assign IDs by group of 20 rows

# Add FastorSlow column
long_data <- long_data %>%
  mutate(FastorSlow = ifelse(trial_index <= 10, "f", "s"))  # Assign 'f' for 1-10 and 's' for 11-20

# Add a new column 'measure' based on the 'trial_index' values
long_data <- long_data %>%
  mutate(measure = case_when(
    trial_index %in% c(5,6,7,8,15,16,17,18) ~ "certainty",
    trial_index %in% c(2, 3, 4, 12, 13, 14) ~ "character",
    trial_index %in% c(9,10,19,20) ~ "impulsivity",
    trial_index %in% c(1, 11) ~ "quickness",
    TRUE ~ NA_character_
  ))

# Reverse-code the six reverse-keyed certainty and impulsivity items (8 - value on the 1-7 scale)
long_data <- long_data %>%
  mutate(
    response_reverse = case_when(
      is.na(value) ~ NA_character_,  # Retain NA if value is NA
      trial_index %in% c(6, 7, 10, 16, 17, 20) & !is.na(as.numeric(value)) ~ as.character(8 - as.numeric(value)),  # Reverse coding for numeric responses
      TRUE ~ as.character(value)  # Ensure the original value is cast to character
    )
  )

# Save the combined and cleaned dataset
write.csv(long_data, "../data/quick_decisions_all_clean_data.csv", row.names = FALSE)
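The reverse-coding rule used above (8 minus the rating on a 1-7 scale) can be sanity-checked on a toy vector. The helper `reverse_code()` is a hypothetical name introduced only for this sketch; it is not part of the original script.

```r
# Reverse-code a rating on a bounded scale: on 1-7, 1 <-> 7, 2 <-> 6, 4 stays 4.
# reverse_code() is a hypothetical helper used only for illustration.
reverse_code <- function(x, scale_min = 1, scale_max = 7) {
  (scale_min + scale_max) - x
}

ratings  <- c(1, 4, 7, NA)        # NA stands for a skipped question
reversed <- reverse_code(ratings)
print(reversed)

# Reverse-coding twice returns the original ratings
round_trip <- reverse_code(reversed)
```

Missing responses stay missing, and the scale midpoint (4) is unchanged, which matches the behavior of the `case_when()` step above.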

After cleaning, we wrangled the data to prepare it for analysis. We then removed observations following our exclusion criteria.

#### Import data
cdf <- read_csv("../data/quick_decisions_all_clean_data.csv")

Rows: 5820 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): condition, question, FastorSlow, measure
dbl (5): group, value, trial_index, ID, response_reverse

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

colfactor <- c("condition", "ID", "FastorSlow", "measure")
cdf[colfactor] <- lapply(cdf[colfactor], as.factor)

As previously mentioned, we had to adjust our initial data exclusion plan. As a result, we only used submissions from participants who had at least one non-NA response for each unique combination of character judgement and quickness conditions for both the fast and slow categories.

uniqueIDs <- cdf %>%
  # Focus only on the relevant combinations of measure and FastorSlow
  filter(measure %in% c("quickness", "character") & FastorSlow %in% c("f", "s")) %>%
  group_by(ID) %>%
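  # Keep IDs with no missing responses in any measure-by-speed cell.
  # Note: all() requires every item in a cell to be answered; any() would
  # implement "at least one non-NA response" instead.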
  filter(all(!is.na(value[measure == "quickness" & FastorSlow == "f"])) &
         all(!is.na(value[measure == "quickness" & FastorSlow == "s"])) &
         all(!is.na(value[measure == "character" & FastorSlow == "f"])) &
         all(!is.na(value[measure == "character" & FastorSlow == "s"]))) %>%
  ungroup() %>%
  distinct(ID)

# Filter the original dataset to keep only rows with those IDs
filteredData <- cdf %>% filter(ID %in% uniqueIDs$ID)

distinctIDcount <- filteredData %>% summarise(count = n_distinct(ID))
uninum <- distinctIDcount[[1]]

From the combined data, there are 229 unique participants that have at least one non-NA response in each unique combination of Quickness, Character Judgments, and Decision Speed conditions.
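As a small sketch of what "at least one non-NA response" means for a single cell (one measure for one character), consider three made-up character ratings with one skipped item. In base R, `any(!is.na(x))` is true when at least one response is present, whereas `all(!is.na(x))` is true only when none are missing.

```r
# Three character items for one participant and one target; values are made up.
character_items <- c(5, NA, 6)

at_least_one_answered <- any(!is.na(character_items))
every_item_answered   <- all(!is.na(character_items))

# An average can still be formed from the answered items
item_mean <- mean(character_items, na.rm = TRUE)

print(c(at_least_one = at_least_one_answered,
        every_item   = every_item_answered))
print(item_mean)
```

Here the participant would pass an "at least one" rule but not an "every item" rule, and their cell average would be based on the two answered items.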

#### Data exclusion / filtering

excluData <- filteredData %>%
  group_by(ID) %>%   # Group by ID
  
  # all of these responses are ones from the groups that 
  # passed their individual attention checks
  
  # removed any ids that rate the fast person as slower than the slow person
  filter(!any(measure == "quickness" &
              FastorSlow == "f" &
              value < value[measure == "quickness" & FastorSlow == "s"])) %>%
  
  # filter out ids that have all the same non-na response values
  filter(n_distinct(value, na.rm = TRUE) > 1) %>%
  
  ungroup() # Ungroup after filtering

After filtering out responses from participants who incorrectly rated the quick character (Justin) as slower than the slow character (Nate), we were left with a sample of 223 unique participant IDs. We then excluded responses from any participants whose non-NA responses were all the same, resulting in a final combined sample size of 220. Finally, we calculated each participant’s average moral character evaluation, split by the fast and slow categories, for visualization and statistical analysis.

#### Prepare data for analysis - create columns etc.

filtereDData <- excluData %>%
  filter(measure %in% c("quickness", "character"))


# Calculate averages for "character" measure grouped by ID and FastorSlow
characterAverages <- filtereDData %>%
  filter(measure == "character") %>%
  group_by(ID, FastorSlow, condition) %>%
  summarize(
    response_reverse = mean(as.numeric(response_reverse), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(measure = "character") # Ensure the measure column remains "character"


# Keep only the "quickness" rows
quicknessData <- filtereDData %>%
  filter(measure == "quickness") %>%
  select(ID, FastorSlow, condition, response_reverse, measure)


# Combine the averaged "character" data with the "quickness" data
tidyDF <- rbind(quicknessData, characterAverages)

## Descriptive Statistics

# Determine whether we had a balanced design

## sample size of each randomized condition group
tidyDF %>%
  group_by(condition) %>%
  summarize(subject_count = n_distinct(ID), .groups = "drop")

# A tibble: 2 × 2
  condition subject_count
  <fct>             <int>
1 immoral             110
2 moral               110

After all of the cleaning and exclusions, we ended up with a perfectly balanced design where 110 participants were randomly assigned to the immoral condition, and 110 were randomly assigned to the moral condition.

Confirmatory Analysis

The combined data were used to create a bar graph and line plot to explore the potential for an interaction between decision speed and morality condition on moral character judgement.

### Data Visualization (Combined Data)

# Filter data to include only rows where measure is "character"
characterData <- tidyDF %>%
  filter(measure == "character") %>%
  mutate(response_reverse = as.numeric(response_reverse), # response is numeric
         FastorSlow = as.factor(FastorSlow), # Convert FastorSlow to factor
         condition = as.factor(condition), # Convert condition to factor
         ID = as.factor(ID)) # Convert ID to factor


# Create the bar plot
barPlot <- ggplot(characterData, 
       aes(x = condition, y = response_reverse, fill = FastorSlow)) +
  stat_summary(fun = mean, # Compute mean of response
               geom = "bar", # Use bars to represent the summary
               position = position_dodge(width = 0.8), width = 0.8) +
  stat_summary(fun.data = mean_se, # Add error bars (mean ± standard error)
               geom = "errorbar",
               position = position_dodge(width = 0.8), width = 0.2) +
  labs(x = "Morality Condition",
       y = "(Positive) Moral Character Evaluation",
       fill = "Decision Speed") +
  scale_fill_manual(values = c("f" = "orange1", "s" = "magenta3"),
                    labels = c("f" = "Quick", "s" = "Slow")) +
  scale_y_continuous(breaks = 1:7) +
  coord_cartesian(ylim = c(1, 7)) +
  theme_bw() +
  theme(axis.title.x = element_text(size = 11),
        axis.title.y = element_text(size = 11),
        legend.position = "none")


# Create a line plot
linePlot <- ggplot(characterData, 
       aes(x = condition, y = response_reverse, 
           color = FastorSlow, group = FastorSlow)) +
  stat_summary(fun = mean, # Compute mean of response
               geom = "line", # Use lines to connect the means
               linewidth = 1) +
  stat_summary(fun = mean, # Add points for the means
               geom = "point",  size = 3) +
  stat_summary(fun.data = mean_se, # Add error bars (mean ± standard error)
               geom = "errorbar", width = 0.4) +
  labs(x = "Morality Condition",
       y = "(Positive) Moral Character Evaluation",
       color = "Decision Speed") +
  scale_color_manual(values = c("f" = "orange1", "s" = "magenta3"), 
                     labels = c("f" = "Quick", "s" = "Slow")) +
  scale_y_continuous(breaks = 1:7) +
  coord_cartesian(ylim = c(1, 7)) +
  theme_bw() +
  theme(axis.title.x = element_text(size = 11),
        axis.title.y = element_text(size = 11),
        legend.position = "right",
        legend.title = element_text(size = 12),
        legend.text = element_text(size = 10))


# Combine plots using patchwork
combinedPlot <- (barPlot + linePlot) +
  plot_layout(widths = c(2, 1)) + # Bar plot takes up 2/3, line plot 1/3
  plot_annotation(
    title = "Character Judgements by Morality Condition and Decision Speed  \n (Combined Data)",
    theme = theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold")))


# Display the combined plot
print(combinedPlot)

The crossing lines in the line plot suggest an interaction between decision speed and morality condition, even before any statistical tests are run.

Following this evidence, we fit our model. We chose a mixed effects model with a random intercept for each participant to account for the repeated measures. Then, a type III, two-way ANOVA was run on the model to assess whether the interaction term was needed to explain the patterns in the data (i.e., whether it was significant). A type III ANOVA is appropriate for testing models with interaction terms, and a two-way ANOVA was needed because two 2-level factors were tested: (1) Decision Speed (fast or slow), and (2) Morality Condition (moral or immoral).
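In equation form, the random-intercept model described above can be written as follows. The notation is ours rather than the original report's: participant i rates character j, the indicator variables equal 1 for the moral condition and for the slow character respectively, and the reference cell is the immoral condition with the quick character.

```latex
y_{ij} = \beta_0
       + \beta_1\,\mathrm{moral}_i
       + \beta_2\,\mathrm{slow}_j
       + \beta_3\,(\mathrm{moral}_i \times \mathrm{slow}_j)
       + u_i + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

The interaction of interest is the coefficient on the product term, and the participant-level term u_i is the random intercept that accounts for each person contributing two ratings.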

# Fit a mixed model (accounts for ID) with the interaction of interest
intmodc <- lmer(response_reverse ~ condition * FastorSlow + (1 | ID), 
               data = characterData)

# Use a Type III, 2x2 Two-way ANOVA test
# the results will tell us if the interaction term is needed in the model
Anova(intmodc, type = "III")

Analysis of Deviance Table (Type III Wald chisquare tests)

Response: response_reverse
                      Chisq Df Pr(>Chisq)    
(Intercept)          936.43  1  < 2.2e-16 ***
condition            577.08  1  < 2.2e-16 ***
FastorSlow           179.27  1  < 2.2e-16 ***
condition:FastorSlow 374.43  1  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results of the ANOVA were consistent with what we saw in our figure; the interaction between decision speed and morality condition had a significant effect on moral character judgement. In short, the experiment replicated!

\(~\)

Exploratory Analyses

We used diagnostics plots to assess model assumptions and determine how well the model fits and explains the data.

Model Diagnostics

Independence Assumption: Assessing Independence of Observations is a thought exercise. No one participant’s responses should impact another participant’s responses, therefore there is no reason to suspect violations of this assumption.

To assess Linearity and Homogeneity of Variance, we examined the Residuals vs Fitted plot.

fitted_vals <- fitted(intmodc)
residuals <- resid(intmodc)

plot(fitted_vals, residuals,
    pch = 19, col = "black",
    main = "Residuals vs Fitted",
    xlab = "Fitted Values",
    ylab = "Residuals")
abline(h = 0, col = "black")  # Horizontal line at 0
lines(lowess(fitted_vals, residuals), col = "red", lwd = 2)

Linearity Assumption: The red line did a fairly good job of following the horizontal zero line, indicating little evidence of any missed curvature that our model would fail to account for. Therefore, there was not enough deviation to suspect problematic violation of this assumption.

Homoscedasticity: To assess homogeneity of variance, we looked for drastic changes in the vertical spread of points on the Residuals vs Fitted plot. For fitted values between 2 and 4.5, the spread fanned outward, and for fitted values between 4.5 and 7, it narrowed again. This double fanning, or diamond shape, indicated a clear violation of the homoscedasticity assumption.
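A rough numerical companion to this visual check is to compare the standard deviation of the residuals across bins of fitted values. The sketch below uses simulated stand-ins for the fitted values and residuals, not the study data; the bin boundaries and the simulated diamond-shaped spread are assumptions chosen only for illustration.

```r
set.seed(1)

# Simulated stand-ins for fitted(model) and resid(model); NOT the study data.
fitted_toy <- runif(200, min = 2, max = 7)
# Residual spread is widest near 4.5 and narrower toward 2 and 7 (diamond shape)
resid_sd   <- 0.5 + 0.3 * (3.5 - abs(fitted_toy - 4.5))
resid_toy  <- rnorm(200, mean = 0, sd = resid_sd)

# Bin the fitted values and compare the residual spread within each bin
bins <- cut(fitted_toy, breaks = c(2, 3.5, 5.5, 7), include.lowest = TRUE)
spread_by_bin <- tapply(resid_toy, bins, sd)
print(round(spread_by_bin, 2))
```

With real model output, `fitted_toy` and `resid_toy` would be replaced by the fitted values and residuals of the model; markedly different standard deviations across bins would point to the same heteroscedasticity seen in the plot.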

Normality of Residuals: To assess Normality of Residuals, the Normal Q-Q Plot and a histogram of residuals were referenced.

# Normal Q-Q Plot
qqplot <- ggplot(data.frame(resid = resid(intmodc)), aes(sample = resid)) +
  stat_qq(size = 1.5, shape = 1) +  # Adjust point size and shape
  stat_qq_line() +  # Add the Q-Q line
  labs(title = "Normal Q-Q Plot",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  theme_minimal() + 
  theme(plot.title = element_text(hjust = 0.5), 
        axis.title = element_text(size = 11), 
        axis.text = element_text(size = 11), 
        aspect.ratio = 1 / 2)  # Makes the plot rectangular (width > height)


# Histogram of Residuals
eij <- resid(intmodc)  # model residuals for the histogram
normhist <- ggplot(data.frame(resid = eij), aes(x = resid)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "lightgray") +
  labs(title = "Histogram of Residuals", 
       x = "Residuals", 
       y = "Frequency") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title = element_text(size = 11),
        axis.text = element_text(size = 11),
        aspect.ratio = 2 / 3)  # Makes the plot rectangular (width > height)


# Combine plots using patchwork
normresid_plot <- qqplot + normhist +
  plot_layout(ncol = 2)


# Display the combined plot
print(normresid_plot)

The Normal Q-Q plot and Histogram of Residuals both showed evidence of a heavy-tailed pattern, which indicated potential for problematic violation of the normality assumption.

Multicollinearity: We investigated the variance inflation factors (VIFs) to explore the potential for multicollinearity.

car::vif(intmodc)

           condition           FastorSlow condition:FastorSlow 
            1.478771             2.000000             2.478771

sqrt(2.478771)

[1] 1.574411

The square root of the interaction’s variance inflation factor told us that the standard error for the interaction term was 1.574 times larger than it would have been without multicollinearity (the sharing of information) with the other terms in the model.

Interpretation of Results

As previously mentioned, the study replicated! Our model and statistical analysis also indicated that the interaction term was necessary in the model.

Next, to give more insight, we calculated 95% confidence intervals for the model coefficient estimates.

summary(intmodc)

Linear mixed model fit by REML ['lmerMod']
Formula: response_reverse ~ condition * FastorSlow + (1 | ID)
   Data: characterData

REML criterion at convergence: 1226.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.6192 -0.5361  0.0594  0.5416  3.0425 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 0.3519   0.5932  
 Residual             0.6464   0.8040  
Number of obs: 440, groups:  ID, 220

Fixed effects:
                           Estimate Std. Error t value
(Intercept)                 2.91515    0.09526   30.60
conditionmoral              3.23636    0.13472   24.02
FastorSlows                 1.45152    0.10841   13.39
conditionmoral:FastorSlows -2.96667    0.15331  -19.35

Correlation of Fixed Effects:
            (Intr) cndtnm FstrSl
conditinmrl -0.707              
FastorSlows -0.569  0.402       
cndtnmrl:FS  0.402 -0.569 -0.707

confint(intmodc)

Computing profile confidence intervals ...

                                2.5 %     97.5 %
.sig01                      0.4679277  0.7077774
.sigma                      0.7309898  0.8813511
(Intercept)                 2.7288329  3.1014701
conditionmoral              2.9728693  3.4998579
FastorSlows                 1.2390780  1.6639523
conditionmoral:FastorSlows -3.2670981 -2.6662352
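As a rough cross-check on the profile intervals above, approximate Wald intervals (estimate plus or minus about 1.96 standard errors) can be computed by hand from the fixed-effects table. This is a sketch of a different, simpler interval method than the profile likelihood used by `confint()`; the estimates and standard errors are copied from the `summary()` output.

```r
# Fixed-effect estimates and standard errors copied from the summary() output
est <- c("(Intercept)"                = 2.91515,
         "conditionmoral"             = 3.23636,
         "FastorSlows"                = 1.45152,
         "conditionmoral:FastorSlows" = -2.96667)
se  <- c(0.09526, 0.13472, 0.10841, 0.15331)

# Normal-approximation (Wald) 95% intervals
z <- qnorm(0.975)
wald_ci <- cbind(lower = est - z * se, upper = est + z * se)
print(round(wald_ci, 3))
```

For these data the Wald limits agree with the profile limits to within about 0.001, and the interval for the interaction term lies well below zero in both cases.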

Interpreting the Interaction: Making a moral decision slowly resulted in lower moral character judgement, compared to moral decisions made quickly. The opposite was true for immoral decisions; making an immoral decision slowly resulted in higher moral character judgement compared to immoral decisions made quickly. In more general terms, the effect of decision speed on moral character judgement was very dependent on the morality of the decision made.
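The interpretation above can be made concrete by converting the fixed-effect estimates into model-implied cell means. This is a minimal sketch using the coefficients copied from the `summary()` output, with the immoral condition and quick decision as the reference cell; the variable names are ours.

```r
# Fixed-effect estimates copied from the summary() output
b0      <- 2.91515   # intercept: immoral condition, quick decision
b_moral <- 3.23636   # moral vs. immoral, for quick decisions
b_slow  <- 1.45152   # slow vs. quick, in the immoral condition
b_int   <- -2.96667  # interaction: moral condition and slow decision

# Model-implied mean moral character rating in each cell of the 2 x 2 design
cell_means <- c(
  immoral_quick = b0,
  immoral_slow  = b0 + b_slow,
  moral_quick   = b0 + b_moral,
  moral_slow    = b0 + b_moral + b_slow + b_int
)
print(round(cell_means, 2))

# Effect of deciding slowly (vs. quickly) within each morality condition
speed_effect_immoral <- cell_means[["immoral_slow"]] - cell_means[["immoral_quick"]]
speed_effect_moral   <- cell_means[["moral_slow"]] - cell_means[["moral_quick"]]
print(round(c(immoral = speed_effect_immoral, moral = speed_effect_moral), 2))
```

The implied means are roughly 2.92 (immoral, quick), 4.37 (immoral, slow), 6.15 (moral, quick), and 4.64 (moral, slow). Deciding slowly therefore raises ratings by about 1.45 points after an immoral choice but lowers them by about 1.52 points after a moral choice, which is the crossover pattern described above.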

\(~\)

Discussion

Summary of Replication Attempt

The results from our confirmatory analysis indicated that the interaction between decision speed and decision morality had a significant impact on character evaluations. As in the original study, quick decisions led to more extreme character judgements, while slow decisions were associated with more moderate judgements. Our ANOVA corroborated the statistical significance of the interaction between these variables, confirming that we successfully replicated the key outcome of the original study.

Commentary

Although we found a violation of the normality assumption, we do not believe this to be particularly problematic for our purposes. The random effects in mixed models assume multivariate normality, but moderate deviations often have minimal impact on the validity of the model estimates, particularly if the sample size is large. Since our sample size for the combined data is considered large under the lens of the Central Limit Theorem, we consider this violation unlikely to affect our conclusions.

Despite a few minor differences between our study and the original, such as the data collection modality and exclusion criteria, the replication of the original findings suggests that these modifications did not affect the outcome.

There were no objections raised by the original authors regarding our replication attempt. We made every effort to replicate their procedure as closely as possible by using the exact written scenarios composed for Experiment 1.

\(~\)

Credit Taxonomy

Harley Clifton: Conceptualization, Formal Analysis, Investigation, Data Curation, Writing, Visualization, Project Administration. Sara Hamidi: Conceptualization, Software, Investigation, Resources, Data Curation. Prosperity Land: Conceptualization, Investigation, Writing, Visualization. Isabella Mullen: Conceptualization, Software, Investigation, Resources, Data Curation, Project Administration.

\(~\)

Appendix

Our Group’s Findings, Analysis, & Results

Data Preparation

For our preparation plan, we removed observations that met our exclusion criteria. We also pivoted our dataset to a long format to expedite data visualization.

Data preparation following the analysis plan:

#### Import data
df <- read_csv("../data/quick_decisions_1_clean_data.csv")

Rows: 2200 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): ID, response, condition, FastorSlow, measure, response_reverse
dbl (2): trial_index, time_elapsed

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

colfactor <- c("condition", "ID", "FastorSlow", "measure")
df[colfactor] <- lapply(df[colfactor], as.factor)

## some participants did not respond to every question
## Making columns for those with NAs as placeholders
index_mapping <- df %>%
  select(trial_index, FastorSlow, measure) %>%
  distinct() %>%
  drop_na()

 # Create a complete data frame ensuring all trial indexes (4-25) exist for each ID
complete_data <- df %>%
  complete(ID, trial_index = 4:25, fill = list(response = NA)) %>%
  left_join(index_mapping, by = "trial_index") %>%
  group_by(ID) %>%
  mutate(
    condition = first(condition, order_by = trial_index)
  ) %>%
  ungroup()

 # Overwrite the original `FastorSlow` and `measure` columns with the mapped values
complete_data <- complete_data %>%
  mutate(
    FastorSlow = FastorSlow.y,
    measure = measure.y
  ) %>%
  select(-FastorSlow.y, -measure.y, -FastorSlow.x, -measure.x)  
  # Remove the temporary columns
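To make the placeholder step concrete, here is a minimal sketch on a toy data set. It uses base R (`expand.grid()` plus `merge()`) as a stand-in for `tidyr::complete()`, so it is an analogue of the step above rather than the code we ran; the IDs, trial indexes, and responses are invented for illustration.

```r
# Toy data: participant "b" never answered trial 5 (all values are invented)
toy <- data.frame(
  ID = c("a", "a", "b"),
  trial_index = c(4L, 5L, 4L),
  response = c("3", "5", "2"),
  stringsAsFactors = FALSE
)

# Every ID x trial_index combination that should exist
grid <- expand.grid(
  ID = unique(toy$ID),
  trial_index = 4:5,
  stringsAsFactors = FALSE
)

# Left join: combinations missing from `toy` get NA as a placeholder response
completed <- merge(grid, toy, by = c("ID", "trial_index"), all.x = TRUE)
completed <- completed[order(completed$ID, completed$trial_index), ]
print(completed)
```

The row for participant "b" at trial 5 now exists with an `NA` response, which is what allows later steps to detect skipped questions explicitly.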

The first step in our data cleaning was to obtain the subset of data from participants who had at least one non-NA response in each unique combination of Quickness, Character Judgments, and Decision Speed conditions.

unique_ids <- complete_data %>%
  # Focus only on the relevant combinations of measure and FastorSlow
  filter(measure %in% c("quickness", "character") & FastorSlow %in% c("f", "s")) %>%
  group_by(ID) %>%
  # Check if each ID has at least one non-NA response for all combinations
  filter(all(!is.na(response[measure == "quickness" & FastorSlow == "f"])) &
         all(!is.na(response[measure == "quickness" & FastorSlow == "s"])) &
         all(!is.na(response[measure == "character" & FastorSlow == "f"])) &
         all(!is.na(response[measure == "character" & FastorSlow == "s"]))) %>%
  ungroup() %>%
  distinct(ID)

# Filter the original dataset to keep only rows with those IDs
filtered_data <- complete_data %>%
  filter(ID %in% unique_ids$ID)

distinct_id_count <- filtered_data %>% summarise(count = n_distinct(ID))
uni_num <- distinct_id_count[[1]]

For our group’s data, there were 38 unique participants (out of the original 100) that had at least one non-NA response in each unique combination of Quickness, Character Judgments, and Decision Speed conditions. Next, we excluded data from any participants who (a) failed our attention check, (b) incorrectly rated the slow condition as faster or equal to the quick condition, and/or (c) responded the same way for all non-NA responses.

#### Data exclusion / filtering

exclu_data <- filtered_data %>%
  group_by(ID) %>%  # Group by ID
  
# Response should contain "nate" or "justin" in rows where measure = "attention"
  filter(any(measure == "attention" & grepl("nate|justin", response, ignore.case = TRUE))) %>%
  
# removed any ids that rate the fast person as slower than the slow person
  filter(!any(measure == "quickness" &
              FastorSlow == "f" &
              response_reverse < response_reverse[measure == "quickness" & FastorSlow == "s"])) %>%
  
# filter out ids that have all the same non-na response values
  filter(n_distinct(response, na.rm = TRUE) > 1) %>%
  
  ungroup() # Ungroup after filtering


distinct_id_count <- exclu_data %>% summarise(count = n_distinct(ID))
finss <- distinct_id_count[[1]]

For our group’s data, after filtering out responses based on the previously specified exclusion criteria, we were left with a final sample of 32 participants.
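The logic of exclusion criteria (b) and (c) can be illustrated on a toy data set. This is a minimal base R sketch, not the pipeline used above: the participant IDs and ratings are invented, the `rating` column is assumed to be coded so that higher values mean "quicker", and the speed check mirrors the strict comparison in the filter code (a participant is dropped when the fast character is rated strictly slower than the slow character).

```r
# Invented ratings for three hypothetical participants
toy <- data.frame(
  ID = rep(c("p1", "p2", "p3"), each = 4),
  measure = rep(c("quickness", "quickness", "character", "character"), times = 3),
  FastorSlow = rep(c("f", "s"), times = 6),
  rating = c(7, 2, 6, 4,   # p1: sensible quickness ratings, varied answers
             3, 5, 6, 4,   # p2: rates the fast character as slower
             4, 4, 4, 4),  # p3: gives the same answer everywhere
  stringsAsFactors = FALSE
)

keep_id <- sapply(split(toy, toy$ID), function(d) {
  fast <- d$rating[d$measure == "quickness" & d$FastorSlow == "f"]
  slow <- d$rating[d$measure == "quickness" & d$FastorSlow == "s"]
  passes_speed_check <- !(fast < slow)                      # criterion (b)
  varies <- length(unique(d$rating[!is.na(d$rating)])) > 1  # criterion (c)
  passes_speed_check && varies
})

print(keep_id)
toy_kept <- toy[toy$ID %in% names(keep_id)[keep_id], ]
```

In this toy example only `p1` survives: `p2` fails the speed check and `p3` is removed for giving identical responses throughout.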

#### Prepare data for analysis - create columns etc.

filtered_data <- exclu_data %>%
  filter(measure %in% c("quickness", "character"))


# Calculate averages for "character" measure grouped by ID and FastorSlow
character_averages <- filtered_data %>%
  filter(measure == "character") %>%
  group_by(ID, FastorSlow, condition) %>%
  summarize(
    response_reverse = mean(as.numeric(response_reverse), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(measure = "character") # Ensure the measure column remains "character"


# Keep only the "quickness" rows
quickness_data <- filtered_data %>%
  filter(measure == "quickness") %>%
  select(ID, FastorSlow, condition, response_reverse, measure)


# Combine the averaged "character" data with the "quickness" data
tidydf <- rbind(quickness_data, character_averages)

## Descriptive Statistics

# Determine whether we had a balanced design

## sample size of each randomized condition group
tidydf %>%
  group_by(condition) %>%
  summarize(subject_count = n_distinct(ID), .groups = "drop")

# A tibble: 2 × 2
  condition subject_count
  <fct>             <int>
1 immoral              17
2 moral                15

Even though our group’s final sample size was quite small due to all of the exclusion criteria, it still ended up being a fairly balanced design.

Confirmatory Analysis

### Data Visualization (Our Group Only)

# Filter data to include only rows where measure is "character"
character_data <- tidydf %>%
  filter(measure == "character") %>%
  mutate(response_reverse = as.numeric(response_reverse), # Ensure response is numeric
         FastorSlow = as.factor(FastorSlow), # Convert FastorSlow to factor
         condition = as.factor(condition), # Convert condition to factor
         ID = as.factor(ID)) # Convert ID to factor


# Create the bar plot
barplot <- ggplot(character_data, 
       aes(x = condition, y = response_reverse, fill = FastorSlow)) +
  stat_summary(fun = mean, # Compute mean of response
               geom = "bar", # Use bars to represent the summary
               position = position_dodge(width = 0.8), width = 0.8) +
  stat_summary(fun.data = mean_se, # Add error bars (mean ± standard error)
               geom = "errorbar",
               position = position_dodge(width = 0.8), width = 0.2) +
  labs(x = "Morality Condition",
       y = "(Positive) Moral Character Evaluation",
       fill = "Decision Speed") +
  scale_fill_manual(values = c("f" = "orange1", "s" = "magenta3"),
                    labels = c("f" = "Quick", "s" = "Slow")) +
  scale_y_continuous(breaks = 1:7) +
  coord_cartesian(ylim = c(1, 7)) +
  theme_bw() +
  theme(axis.title.x = element_text(size = 11),
        axis.title.y = element_text(size = 11),
        legend.position = "none")


# Create a line plot
lineplot <- ggplot(character_data, 
       aes(x = condition, y = response_reverse, 
           color = FastorSlow, group = FastorSlow)) +
  stat_summary(fun = mean, # Compute mean of response
               geom = "line", # Use lines to connect the means
               linewidth = 1) +
  stat_summary(fun = mean, # Add points for the means
               geom = "point",  size = 3) +
  stat_summary(fun.data = mean_se, # Add error bars (mean ± standard error)
               geom = "errorbar", width = 0.4) +
  labs(x = "Morality Condition",
       y = "(Positive) Moral Character Evaluation",
       color = "Decision Speed") +
  scale_color_manual(values = c("f" = "orange1", "s" = "magenta3"), 
                     labels = c("f" = "Quick", "s" = "Slow")) +
  scale_y_continuous(breaks = 1:7) +
  coord_cartesian(ylim = c(1, 7)) +
  theme_bw() +
  theme(axis.title.x = element_text(size = 11),
        axis.title.y = element_text(size = 11),
        legend.position = "right",
        legend.title = element_text(size = 12),
        legend.text = element_text(size = 10))


# Combine plots using patchwork
combined_plot <- (barplot + lineplot) +
  plot_layout(widths = c(2, 1)) + # Bar plot takes up 2/3, line plot 1/3
  plot_annotation(
    title = "Character Judgements by Morality Condition and Decision Speed  \n (Group Data Only)",
    theme = theme(plot.title = element_text(hjust = 0.5, size = 14, face = "bold")))


# Display the combined plot
print(combined_plot)

## Running the ANOVA Interaction Test

# Fit a mixed model (accounts for ID) with the interaction of interest
intmod <- lmer(response_reverse ~ condition * FastorSlow + (1 | ID), 
               data = character_data)

boundary (singular) fit: see help('isSingular')

# Use a Type III, 2x2 Two-way ANOVA test
# the results will tell us if the interaction term is needed in the model
Anova(intmod, type = "III")

Analysis of Deviance Table (Type III Wald chisquare tests)

Response: response_reverse
                       Chisq Df Pr(>Chisq)    
(Intercept)          223.742  1  < 2.2e-16 ***
condition            120.228  1  < 2.2e-16 ***
FastorSlow            17.101  1  3.544e-05 ***
condition:FastorSlow  55.641  1  8.699e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Despite the small sample size when using only our group’s data, the study still replicated!
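As a quick check on how to read the table, the `Pr(>Chisq)` column can be recovered from the Wald chi-square statistics and their degrees of freedom. The sketch below copies the rounded statistics from the output above into a hypothetical vector named `wald`, so the results agree with the table only up to rounding.

```r
# Wald chi-square statistics copied from the Type III table (1 df each)
wald <- c(condition = 120.228,
          FastorSlow = 17.101,
          "condition:FastorSlow" = 55.641)

# Upper-tail probability of a chi-square distribution with 1 degree of freedom
p_values <- pchisq(wald, df = 1, lower.tail = FALSE)
print(signif(p_values, 4))
```

The interaction p-value is far below any conventional threshold, which is the basis for saying the interaction term is needed in the model.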

Model Diagnostics

Independence Assumption: Assessing independence of observations is a thought exercise. No participant’s responses should influence another participant’s responses, so there is no reason to suspect a violation of this assumption.

To assess Linearity and Homogeneity of Variance, the Residuals vs Fitted plot was investigated.

fitted_vals <- fitted(intmod)
residuals <- resid(intmod)

plot(fitted_vals, residuals,
    pch = 19, col = "black",
    main = "Residuals vs Fitted",
    xlab = "Fitted Values",
    ylab = "Residuals")
abline(h = 0, col = "black")  # Horizontal line at 0
lines(lowess(fitted_vals, residuals), col = "red", lwd = 2)

Linearity Assumption: The red line closely followed the horizontal zero line, indicating little evidence of any curvature that our model failed to account for. Therefore, there was no evidence to suspect a violation of this assumption.

Homoskedasticity: To assess homogeneity of variance, we looked for drastic changes in the vertical spread of points on the Residuals vs Fitted plot. As fitted values increased, we saw a slight decreasing fan pattern. This suggested slight evidence against our equal variance assumption, but it was not extreme enough to be problematic for our results.

Normality of Residuals: To assess normality of residuals, the Normal Q-Q plot and a histogram of residuals were referenced.

# Normal Q-Q Plot
qqplot <- ggplot(data.frame(resid = resid(intmod)), aes(sample = resid)) +
  stat_qq(size = 1.5, shape = 1) +  # Adjust point size and shape
  stat_qq_line() +  # Add the Q-Q line
  labs(title = "Normal Q-Q Plot",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  theme_minimal() + 
  theme(plot.title = element_text(hjust = 0.5), 
        axis.title = element_text(size = 11), 
        axis.text = element_text(size = 11), 
        aspect.ratio = 1 / 2)  # Makes the plot rectangular (width > height)


# Histogram of Residuals
eij = residuals(intmod)
normhist <- ggplot(data.frame(resid = eij), aes(x = resid)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "lightgray") +
  labs(title = "Histogram of Residuals", 
       x = "Residuals", 
       y = "Frequency") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.title = element_text(size = 11),
        axis.text = element_text(size = 11),
        aspect.ratio = 2 / 3)  # Makes the plot rectangular (width > height)


# Combine plots using patchwork
normresid_plot <- qqplot + normhist +
  plot_layout(ncol = 2)


# Display the combined plot
print(normresid_plot)

The Normal Q-Q plot and Histogram of Residuals both showed evidence of a light-tailed (short-tailed) pattern: the lowest residuals were higher than expected under normality, and the highest residuals were lower than expected. Light tails are a benign departure, since they indicate fewer extreme residuals than a normal distribution would produce, so we found no problematic violation of the normality assumption.
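What a light-tailed pattern looks like numerically can be sketched by comparing quantiles. The example below does not use our residuals; as an illustrative stand-in it uses a uniform distribution scaled to have mean 0 and standard deviation 1, and compares its quantiles with those of a standard normal.

```r
# Tail probabilities at which to compare the two distributions
probs <- c(0.01, 0.05, 0.95, 0.99)

# Standard normal quantiles (what a Q-Q plot puts on the reference line)
normal_q <- qnorm(probs)

# Light-tailed stand-in: uniform on [-sqrt(3), sqrt(3)] has mean 0 and sd 1
light_q <- qunif(probs, min = -sqrt(3), max = sqrt(3))

comparison <- data.frame(
  prob = probs,
  normal = normal_q,
  light_tailed = light_q,
  difference = light_q - normal_q
)
print(round(comparison, 3))
```

In the lower tail the light-tailed quantiles sit above the normal ones, and in the upper tail they sit below, which is the S-shaped departure from the reference line that a light-tailed Q-Q plot shows.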

Multicollinearity: The potential for multicollinearity was explored by investigating the variance inflation factors (VIFs).

car::vif(intmod)

           condition           FastorSlow condition:FastorSlow 
            2.000000             1.882353             2.882353

sqrt(2.882353)

[1] 1.697749

The square root of the interaction’s variance inflation factor told us that the standard error for the interaction term was about 1.7 times larger than it would have been without multicollinearity (the sharing of information) with the other variables in the model.
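Where these values come from can be checked by hand. The sketch below rebuilds a treatment-coded design from the cell counts reported earlier (17 immoral and 15 moral participants, each rating one fast and one slow character) and computes each VIF as 1 / (1 - R²), where R² comes from regressing that column on the other two. It assumes the VIFs from `car::vif()` for this model coincide with these ordinary least-squares VIFs; the column names (`condition`, `speed`, `cond_x_speed`) are hypothetical.

```r
# Cell counts from the descriptive statistics: 17 immoral, 15 moral participants
n_immoral <- 17
n_moral <- 15

# Treatment (0/1) coding: immoral and fast are the reference levels
design <- data.frame(
  condition = rep(c(0, 1), times = c(2 * n_immoral, 2 * n_moral)),
  speed = c(rep(c(0, 1), each = n_immoral), rep(c(0, 1), each = n_moral))
)
design$cond_x_speed <- design$condition * design$speed

# VIF for each column: regress it on the others and use 1 / (1 - R^2)
vif_by_hand <- sapply(names(design), function(term) {
  others <- setdiff(names(design), term)
  fit <- lm(reformulate(others, response = term), data = design)
  1 / (1 - summary(fit)$r.squared)
})

print(round(vif_by_hand, 6))        # approx. 2.000000 1.882353 2.882353
print(round(sqrt(vif_by_hand), 6))  # standard error inflation factors
```

Under this coding the interaction column is partly predictable from the two main-effect columns, which is why its VIF is the largest of the three even though the design itself is nearly balanced.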