Replication of Study ‘Effects of donation collection methods on donation amount: Nudging donation for the cause and overhead’ by Suk & Mudita (2022, Psychology and Marketing)

Author

Nina Rice nmrice@ucsd.edu

Published

December 11, 2024

Introduction

Charitable organizations rely on donations to manage their overhead expenses, in addition to the funds contributed towards their advertised cause. Overhead aversion is the tendency for donors to resist contributing to operational expenses, preferring instead that their contributions go directly toward the cause. This aversion can hinder charities’ abilities to cover essential costs, potentially compromising their long-term goals. In this replication of Study 2 from the paper Effects of donation collection methods on donation amount: Nudging donation for the cause and overhead, we test the original study’s findings on how donation collection methods affect donation behavior, specifically with respect to overhead aversion.

The original study investigates whether presenting the cause and overhead components of donations in different sequences can alleviate overhead aversion, increase overall donations, and maintain donor satisfaction. Suk & Mudita found that participants give more money when asked to donate to a charity’s overhead expenses first, as opposed to the cause first. In our replication, participants engaged in a charitable donation scenario, deciding how to allocate a specified amount across cause and overhead expenses. The key manipulation is the sequence of allocation decisions: participants either allocated to the cause first, followed by overhead, or to overhead first, followed by cause.

Link to GitHub repo: https://github.com/ucsd-psych201a/suk2023

Link to OSF: https://osf.io/89xav/

Link to Survey: https://ucsd-psych201a.github.io/suk2023/website/

Methods

Power Analysis

We conducted an a priori power analysis using G*Power to assess the original finding, that participants donated more in the overhead-first condition compared to the cause-first condition. A t-test for two independent means was used in the power analysis. Based on the reported effect size (d = 0.41), we estimated that 188 participants would be required to replicate the finding with a statistical power of 0.8.
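As a rough cross-check of the G*Power figure, the same calculation can be sketched in base R with `power.t.test()`. The sketch assumes a two-sided test at an alpha of 0.05, which is not stated above, so the result may differ from the G*Power output by a participant or two depending on rounding and test settings.

```r
# Sketch: a priori power analysis for a two-sample t-test.
# Assumptions (not stated in the report): two-sided test, alpha = 0.05.
pa <- power.t.test(delta = 0.41,   # reported effect size (Cohen's d)
                   sd = 1,         # standardized units, so delta equals d
                   sig.level = 0.05,
                   power = 0.80,
                   type = "two.sample",
                   alternative = "two.sided")

n_per_group <- ceiling(pa$n)   # round up to whole participants
n_total <- 2 * n_per_group
print(c(per_group = n_per_group, total = n_total))
```

The per-group figure should land close to the roughly 94 participants per condition planned below.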

Planned Sample

We aimed to recruit about 188 adult participants via Prolific, allowing us to test roughly 94 participants in each of our two experimental conditions. We did not control for demographic characteristics such as age or gender. Once the target number of participants was reached, data collection was terminated.

Materials

Using jsPsych, we coded a website containing our survey, which participants accessed through Prolific. The original study’s wording was replicated as closely as possible with help from the original authors. Our survey consisted of two parts: a 10-question unrelated cover survey, and the donation collection section. The cover survey asked participants about their daily lives and habits. The donation collection portion contained our measures of interest: amount donated towards cause, amount donated towards overhead, donation satisfaction, frequency of donation experiences, and demographic markers (age, gender, and nationality).

Between the unrelated survey and the donation collection portion, participants were given the donation instructions and completed comprehension checks. We also gave a brief explanation of the charity of choice (though specific details were hidden to reduce bias).

The overhead donation prompt read, “For covering charitable organizations’ operating expense:”. Beside it was a text box where participants could enter the amount they wanted to donate to the charity’s overhead. The cause funding item read, “For the cause:” and had a text box for inputting the donation amount intended towards the cause. The total amount was summed and displayed at the bottom of the screen. All values were indicated in US dollars.

After confirming their total donation, participants were asked to indicate their agreement with two questions, “I am satisfied with my donation,” and “I usually donate to charity.” These items were answered by inputting values on a 7-point Likert scale (1 indicating “not at all,” 7 indicating “very much”). Age, gender, and nationality were also asked of our participants. These values were used as variables for further exploratory analyses.

Procedure

Consent was obtained prior to beginning the experiment. We first provided our participants with the simple lifestyle survey. This portion served to reduce demand characteristics by obscuring which answers the researchers may have been looking for.

Participants were then notified that they had a 20% chance to win additional compensation (5 USD) upon completion of the remainder of the experiment. We informed them that they would have an opportunity to donate all, some, or none of this amount to a charity, and would next be asked to indicate the amounts of their choosing. To ensure participants were following along, we administered a multiple-choice comprehension check asking what their chance of receiving the bonus money was. All participants who gave an answer other than 20% were excluded from analysis.

After this, participants were told that donations to charities go to two areas: helping the advertised causes and covering operational costs. It was explained that some organizations receive donations separately for the two categories, and another comprehension check was given here. Participants who did not correctly answer that donation funding goes towards both overhead and cause were excluded. They were reminded again that they could donate as much or as little as they wanted, not exceeding the allotted amount of 5 USD, and that they stood to receive whatever amount they did not donate.

In the subsequent donation collection simulation, participants were randomly assigned to one of our two conditions. In the overhead-first condition, participants first indicated how much money to donate to the overhead expenses, and then decided how much money to donate to the cause. In the cause-first condition, they did this in reverse, first indicating how much should go towards the cause, followed by overhead.

To conclude the survey, we asked questions collecting exploratory and demographic data. Participants were then debriefed, including an explanation that the donation scenario was a ruse and that they would receive the entire 5 USD if they were part of the randomly selected 20%. Finally, we asked for any feedback.

Design Overview

Just one factor was manipulated in this design: whether the participant was asked for their cause donation or overhead donation first. We measured four main variables: the amount donated towards overhead, amount donated towards cause, donation satisfaction, and frequency of prior donation experience. Age, gender, and nationality were also collected for demographic purposes.

This was a between-participants design. Participants did not repeat measures. Steps were taken to reduce demand characteristics by including an unrelated survey at the beginning of the study. This initial survey is designed to distract participants from identifying the true focus of the experiment, thereby minimizing the likelihood that they will alter their responses based on perceived expectations.

Analysis Plan

We used a series of regression analyses to mirror the tests used in the original study. Our confirmatory analyses examine differences in the amounts donated to cause versus overhead, as well as total donation. First, we conducted a regression analysis to determine whether condition, age, and gender explain variation in the total amount donated by participants. Then, we conducted two separate regression analyses to determine the impact of these factors on the amounts donated to our charity’s cause and overhead specifically.

For our exploratory analyses, we looked at participants’ self-reported satisfaction in a second round of tests. We performed a regression analysis using condition as the independent variable and satisfaction as the dependent variable, to see if the order in which donations are requested affects donor sentiment. We then added age and gender as independent variables in a second regression, and the specific amounts donated to the overhead expenses and the cause in a third, in order to assess the contribution of all of our variables together. We used a p-value threshold of < 0.05 to determine our success in replicating the original results.
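The confirmatory part of this plan can be sketched as one model per outcome with a shared set of predictors. The data below are simulated purely for illustration and are not study data; the column names mirror those used later in this report, and the `sim` data frame and its value ranges are assumptions.

```r
# Sketch with simulated data; the values below are NOT study data.
set.seed(1)
n <- 100
sim <- data.frame(
  condition = factor(rep(c("causefirst", "overfirst"), each = n / 2)),
  age       = sample(18:70, n, replace = TRUE),
  gender    = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
# Hypothetical donations in USD, bounded so cause + overhead never exceeds 5
sim$cause    <- round(runif(n, 0, 3), 2)
sim$overhead <- round(runif(n, 0, 2), 2)
sim$total    <- sim$cause + sim$overhead

# One model per confirmatory outcome, all with the same predictors
outcomes <- c("total", "overhead", "cause")
models <- lapply(outcomes, function(dv) {
  lm(reformulate(c("condition", "age", "gender"), response = dv), data = sim)
})
names(models) <- outcomes

# Condition coefficient (overhead-first vs. cause-first) and its p-value
cond_effects <- t(sapply(models, function(m) {
  coef(summary(m))["conditionoverfirst", c("Estimate", "Pr(>|t|)")]
}))
print(cond_effects)
```

Because the factor levels sort alphabetically, `causefirst` is the reference category, so a positive condition estimate means larger donations in the overhead-first condition.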

Differences from Original Study

With help from the original authors, we were able to provide our participants with a close duplicate of the original survey in terms of formatting and instructions. Our materials did vary slightly from those in the original experiment because we created them ourselves. In particular, we wrote our own unrelated survey, as permitted by the original author, because we were unable to access the one used in the original study. We additionally introduced comprehension checks into our methodology, which were not included in the original design.

The original study’s sample was primarily UK-based; however, we did not control for nationality or other demographic factors. Due to budget constraints, we were only able to offer a 5 USD bonus amount for each participant, whereas the original authors offered an amount nearly 3 USD higher (6 GBP, to be exact).

Actual Sample

Our 188 Prolific participants were reduced to 136 (49% female, mean age = 38) after filtering on comprehension check performance, as both checks needed to be passed.

Differences from pre-data collection methods plan

Additional time and funding would be necessary in order to recruit enough participants who would pass the comprehension checks, so we ran the analyses with our sample of 136.

Results

Data preparation

Data was collected on the survey website through jsPsych packages in our HTML code, shared on GitHub. A pipeline was established between our paradigm and an OSF repository, which populated in .csv format.

Necessary libraries were first loaded.

library(jsonlite)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()  masks stats::filter()
✖ purrr::flatten() masks jsonlite::flatten()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(effsize)

Our .csv files were compiled into a .zip. The code below builds a vector containing the file paths of all of the .csv files in the data folder.

# specifies filepath to folder with data
file_paths <- c(
                list.files(path = "./raw_sample_data", full.names = TRUE)
                )

# specifies filepath to output data as one .csv file
output_file <- "./sampledata.csv"

Displayed next is the function for turning our .csv data files into rows with columns of our desired variables.

process_csv_files <- function(file_paths, output_file) {
  
  # Nested function to process a single file
  process_file <- function(file_path) {
    
    # Reads file
    testdata <- read_csv(file_path, show_col_types = FALSE)
    
    # Shortens file to specified columns
    tdcols <- subset(testdata, select = -c(rt, stimulus, trial_type,
                                           plugin_version, question_order))
    
    # Shortens file to specified rows
    tdrows <- tdcols[-c(1:13, 16:17), , drop = FALSE]
    
    # Rearranges elements of cleaned data into one cohesive row
    tdarrange <- data.frame(
      check1 = tdrows$response[1],
      check2 = tdrows$response[2],
      cause = tdrows$causeDonation[3],
      overhead = tdrows$expenseDonation[3],
      total = tdrows$totalDonation[3],
      condition = tdrows$questionOrder[3],
      satis_usual = tdrows$response[4],
      age = tdrows$age[5],
      nationality = tdrows$nationality[5],
      gender = tdrows$response[6],
      comment = tdrows$response[7]
    )
    
    # Parses the satisfaction responses, unnests JSON formatting
    tdparsed <- tdarrange %>%
      mutate(satis_usual = lapply(satis_usual, function(x) {
        tryCatch(fromJSON(x), error = function(e) NA)
        }))
    
    # Unnests the satisfaction/frequency column into two separate columns
    tdsep <- tdparsed %>%
      unnest_wider(col = satis_usual, names_sep = "_")
    
    # Renames columns
    tdclean <- tdsep %>% 
      rename(satisfaction = satis_usual_Q0,
             frequency = satis_usual_Q1)
    
    # Returns cleaned data
    return(tdclean)
  }
  
  # Apply nested function to each file
  processed_files <- lapply(file_paths, process_file)
  
  # Combines files, adds identifying number
  all_data <- bind_rows(processed_files, .id = "file_id")
  
  # Saves the combined data
  write_csv(all_data, output_file)
  
  # Returns combined data
  return(all_data)
}

result <- process_csv_files(file_paths, output_file)

head(result, 3)

Here is additional cleaning of our data to remove remnants of JSON formatting, rename variables, and exclude participants who failed comprehension checks.

# Function removes every non-alphanumeric character, including spaces
remove_special_characters <- function(entry) {
  
  gsub("[^a-zA-Z0-9]", "", entry)
  
}

# Applies function to all relevant columns
cleanresult <- result %>%
  mutate(check1 = sapply(check1, remove_special_characters)) %>% 
  mutate(check2 = sapply(check2, remove_special_characters)) %>% 
  mutate(condition = sapply(condition, remove_special_characters)) %>% 
  mutate(gender = sapply(gender, remove_special_characters)) %>% 
  mutate(comment = sapply(comment, remove_special_characters))

# Edits data for legibility
cleanresult <- cleanresult %>%
  mutate(
    condition = recode(condition,
      ForthecauseForcoveringcharitableorganizationsoperatingexpense = 'causefirst',
      ForcoveringcharitableorganizationsoperatingexpenseForthecause = 'overfirst'
    ),
    
    check1 = gsub("chanceresponse", "", check1),
    check2 = gsub("donationuseresponse", "", check2),
    gender = gsub("gender", "", gender),
    comment = gsub("Q0", "", comment)
  )

# Leaves only participants who answered correctly
filterresult <- cleanresult[cleanresult$check1 == 20 
                            & cleanresult$check2 == "Alloftheabove", ]

# Creates a new spreadsheet
write_csv(filterresult, "./filteredsampledata.csv")

head(filterresult, 3)
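A short sketch of how the exclusions could be tallied per check before filtering is given below. It uses toy rows rather than study data, and the wrong answers shown ("50", "Helpingthecause") are hypothetical. It also uses `%in%` instead of `==`, an alternative to the filter above that treats a missing answer as a failed check rather than producing an all-NA row.

```r
# Toy rows standing in for `cleanresult`; NOT study data.
# The wrong answers ("50", "Helpingthecause") are hypothetical.
toy <- data.frame(
  check1 = c("20", "50", "20", NA),
  check2 = c("Alloftheabove", "Alloftheabove", "Helpingthecause", "Alloftheabove"),
  stringsAsFactors = FALSE
)

# %in% returns FALSE (not NA) for missing answers, so incomplete rows are
# dropped instead of being carried along as all-NA rows.
pass1 <- toy$check1 %in% "20"
pass2 <- toy$check2 %in% "Alloftheabove"

exclusions <- c(
  failed_check1 = sum(!pass1),
  failed_check2 = sum(!pass2),
  retained      = sum(pass1 & pass2)
)
print(exclusions)

toy_filtered <- toy[pass1 & pass2, ]
```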

Confirmatory analysis

Results for these three regressions need to reach a p-value of less than 0.05, with effects in the same direction as in the original study, in order to count as a successful replication. Following the original study, we tested whether the overhead-first condition led participants to donate significantly more than the cause-first condition. The cause-first condition and female gender are the reference categories for our variables.

filterresult <- read_csv("./filteredsampledata.csv", show_col_types = FALSE)
# Necessary for enabling Cohen's D calculations
splitdata <- split(filterresult, filterresult$condition) 
head(splitdata$overfirst, 3)
# A tibble: 3 × 13
  file_id check1 check2    cause overhead total condition satisfaction frequency
    <dbl>  <dbl> <chr>     <dbl>    <dbl> <dbl> <chr>            <dbl>     <dbl>
1       6     20 Allofthe…     2        0     2 overfirst            4         2
2      16     20 Allofthe…     0        0     0 overfirst            3         1
3      20     20 Allofthe…     0        0     0 overfirst            3         2
# ℹ 4 more variables: age <dbl>, nationality <chr>, gender <chr>, comment <chr>
head(splitdata$causefirst, 3)
# A tibble: 3 × 13
  file_id check1 check2    cause overhead total condition satisfaction frequency
    <dbl>  <dbl> <chr>     <dbl>    <dbl> <dbl> <chr>            <dbl>     <dbl>
1       1     20 Allofthe…     0      0     0   causefir…            6         1
2       3     20 Allofthe…     3      2     5   causefir…            6         6
3       5     20 Allofthe…     1      0.5   1.5 causefir…            6         4
# ℹ 4 more variables: age <dbl>, nationality <chr>, gender <chr>, comment <chr>

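Analysis 1: Total donation amount predicted by condition, age, and gender. Condition did not significantly predict total donation (β = 0.344, SE = 0.305, p = 0.261), although the direction matched the original study: participants in the overhead-first condition (M = 1.75, SD = 1.93) donated more than participants in the cause-first condition (M = 1.47, SD = 1.65), a negligible effect (d = 0.157, 95% CI: -0.183, 0.497). The model yielded significant effects of age (βa = -0.027, SEa = 0.013, p = 0.035) and gender (male vs. female: βg = -0.685, SEg = 0.310, p = 0.029). Older participants and male participants donated less in total than their younger and female counterparts. The overall model was not significant at our threshold (p = 0.097).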

# Regression model
summary(lm('total ~ condition + age + gender', 
           data = filterresult))

Call:
lm(formula = "total ~ condition + age + gender", data = filterresult)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.4897 -1.3151 -0.5092  0.8497  3.8677 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)           2.80183    0.56851   4.928 2.48e-06 ***
conditionoverfirst    0.34397    0.30452   1.130   0.2608    
age                  -0.02734    0.01284  -2.129   0.0351 *  
genderMale           -0.68530    0.30992  -2.211   0.0288 *  
genderNonbinary      -0.11264    1.26692  -0.089   0.9293    
genderPrefernottosay  0.38164    1.78738   0.214   0.8313    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.759 on 130 degrees of freedom
Multiple R-squared:  0.06833,   Adjusted R-squared:  0.03249 
F-statistic: 1.907 on 5 and 130 DF,  p-value: 0.09747
# Table of summary statistics among conditions
print(
  data.frame(
  Statistic = c("Mean", "Standard Deviation"),
  Overhead = c(round(mean(splitdata$overfirst$total), 2),
               round(sd(splitdata$overfirst$total), 2)),
  Cause = c(round(mean(splitdata$causefirst$total), 2),
            round(sd(splitdata$causefirst$total), 2))
))
           Statistic Overhead Cause
1               Mean     1.75  1.47
2 Standard Deviation     1.93  1.65
# Cohen's D calculation
cohen.d(splitdata$overfirst$total, splitdata$causefirst$total)

Cohen's d

d estimate: 0.1571732 (negligible)
95 percent confidence interval:
     lower      upper 
-0.1828745  0.4972209 
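For readers who want to see what the effect size computes, here is a base R sketch of Cohen's d with a pooled standard deviation. It assumes `cohen.d()` was run with its default pooled standard deviation and no Hedges correction; the helper name `cohens_d_pooled` and the toy vectors are illustrative and not part of the analysis.

```r
# Cohen's d with a pooled standard deviation, written out in base R.
# `cohens_d_pooled` is a hypothetical helper, not a function from effsize.
cohens_d_pooled <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Toy vectors, NOT study data: means 4 and 3, both variances 4,
# so the pooled SD is 2 and d = (4 - 3) / 2
d_toy <- cohens_d_pooled(c(2, 4, 6), c(1, 3, 5))
print(d_toy)
```

Applied to the two condition vectors, for example `cohens_d_pooled(splitdata$overfirst$total, splitdata$causefirst$total)`, this should agree with the estimate printed above under that assumption.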

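Analysis 2: Overhead donation amount predicted by condition, age, and gender. Condition was again not a significant predictor (β = 0.135, SE = 0.138, p = 0.332), with a negligible effect in the original direction (overhead-first: M = 0.64, SD = 0.93; cause-first: M = 0.54, SD = 0.71; d = 0.119, 95% CI: -0.221, 0.459). The model found significant effects of age (βa = -0.015, SEa = 0.006, p = 0.012) and gender (male vs. female: βg = -0.406, SEg = 0.141, p = 0.005), in the same directions as in Analysis 1. The overall model was significant (p = 0.021).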

# Regression model
summary(lm('overhead ~ condition + age + gender', 
           data = filterresult))

Call:
lm(formula = "overhead ~ condition + age + gender", data = filterresult)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.0663 -0.5410 -0.2396  0.3893  4.0831 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           1.290311   0.258375   4.994 1.87e-06 ***
conditionoverfirst    0.134722   0.138400   0.973  0.33215    
age                  -0.014946   0.005836  -2.561  0.01157 *  
genderMale           -0.405731   0.140854  -2.881  0.00465 ** 
genderNonbinary      -0.386863   0.575788  -0.672  0.50285    
genderPrefernottosay  0.083347   0.812332   0.103  0.91844    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7996 on 130 degrees of freedom
Multiple R-squared:  0.09625,   Adjusted R-squared:  0.06149 
F-statistic: 2.769 on 5 and 130 DF,  p-value: 0.02067
# Table of summary statistics among conditions
print(
  data.frame(
  Statistic = c("Mean", "Standard Deviation"),
  Overhead = c(round(mean(splitdata$overfirst$overhead), 2),
               round(sd(splitdata$overfirst$overhead), 2)),
  Cause = c(round(mean(splitdata$causefirst$overhead), 2),
            round(sd(splitdata$causefirst$overhead), 2))
))
           Statistic Overhead Cause
1               Mean     0.64  0.54
2 Standard Deviation     0.93  0.71
# Cohen's D calculation
cohen.d(splitdata$overfirst$overhead, splitdata$causefirst$overhead)

Cohen's d

d estimate: 0.1192818 (negligible)
95 percent confidence interval:
     lower      upper 
-0.2205443  0.4591080 

Analysis 3: Cause donation amount predicted by condition, age, and gender. The original study’s significant effect of condition failed to replicate (β = 0.209, SE = 0.206, p = 0.311), but the original direction was preserved. Participants in the overhead-first condition (M = 1.11, SD = 1.33) donated more than participants in the cause-first condition (M = 0.93, SD = 1.04). This difference corresponds to a negligible effect size (d = 0.154, 95% CI: -0.186, 0.494), smaller than the original study’s (d = 0.33). The regression had no significant explanatory variables, and the overall model was not significant (p = 0.457).

# Regression model
summary(lm('cause ~ condition + age + gender', 
           data = filterresult))

Call:
lm(formula = "cause ~ condition + age + gender", data = filterresult)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.4233 -0.8757 -0.2950  0.4362  3.9802 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           1.511520   0.384364   3.933 0.000136 ***
conditionoverfirst    0.209244   0.205886   1.016 0.311369    
age                  -0.012392   0.008681  -1.427 0.155843    
genderMale           -0.279574   0.209536  -1.334 0.184455    
genderNonbinary       0.274219   0.856553   0.320 0.749374    
genderPrefernottosay  0.298290   1.208439   0.247 0.805422    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.19 on 130 degrees of freedom
Multiple R-squared:  0.0349,    Adjusted R-squared:  -0.00222 
F-statistic: 0.9402 on 5 and 130 DF,  p-value: 0.4574
# Table of summary statistics among conditions
print(
  data.frame(
  Statistic = c("Mean", "Standard Deviation"),
  Overhead = c(round(mean(splitdata$overfirst$cause), 2),
               round(sd(splitdata$overfirst$cause), 2)),
  Cause = c(round(mean(splitdata$causefirst$cause), 2),
            round(sd(splitdata$causefirst$cause), 2))
))
           Statistic Overhead Cause
1               Mean     1.11  0.93
2 Standard Deviation     1.33  1.04
# Cohen's D calculation
cohen.d(splitdata$overfirst$cause, splitdata$causefirst$cause)

Cohen's d

d estimate: 0.1536157 (negligible)
95 percent confidence interval:
     lower      upper 
-0.1864086  0.4936400 

Figure 1. Visualization of replication study results.

# Calculates means and standard errors for donation amounts among both conditions
summary_data <- filterresult %>%
  group_by(condition) %>%
  summarise(
    total_mean = mean(total),
    total_se = sd(total) / sqrt(n()),
    cause_mean = mean(cause),
    cause_se = sd(cause) / sqrt(n()),
    overhead_mean = mean(overhead),
    overhead_se = sd(overhead) / sqrt(n())
  ) %>%
  
  # Shapes data for ease of use in ggplot
  pivot_longer(
    cols = -condition,
    names_to = c("category", ".value"),
    names_sep = "_"
  )
summary_data$category <- factor(summary_data$category, 
                                levels = c("total", "cause", "overhead"))

# Creates base for plot
ggplot(summary_data, aes(x = category, y = mean, fill = condition)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.4), 
           width = 0.4, color = "gray20") +
  
  # Adds error bars
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                position = position_dodge(width = 0.4), 
                width = 0.2, color = "gray20") +
  
  # Specifies aesthetics for color and label
  scale_fill_manual(
    values = c("causefirst" = "white", "overfirst" = "gray"),
    labels = c("Cause First", "Overhead First")) +
  labs(
    title = "Comparison of Donation Collection Methods",
    x = "Donation collection method",
    y = "Amount ($)",
    fill = "Condition"
  ) +
  
  # Specifies axes
  scale_y_continuous(
    expand = c(0, 0),
    limits = c(0, 2.25),
    breaks = seq(0, 2.25, .5)
  ) +
  
  # Specifies style
  theme_minimal() +
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.9, 0.9), 
    panel.background = element_rect(fill = "white", color = "gray20"), 
    plot.background = element_rect(fill = "white", color = NA)
  )

Figure 2. Visualization of original study results.

Exploratory analyses

Multiple exploratory regressions were conducted using donation satisfaction as the dependent variable, while condition, demographics, and donation amounts were our independent variables.

Exploration 1: Satisfaction predicted by condition. As the sole explanatory variable, condition did not have a significant effect on donation satisfaction (β = -0.097, SE = 0.240, p = 0.687), as in the original study. Because condition is the only predictor, the overall model test is equivalent and likewise not significant (p = 0.687).

# Regression model
summary(lm('satisfaction ~ condition', 
           data = filterresult))

Call:
lm(formula = "satisfaction ~ condition", data = filterresult)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.6197 -1.5231  0.4286  1.3803  1.4769 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)         4.61972    0.16561  27.895   <2e-16 ***
conditionoverfirst -0.09664    0.23956  -0.403    0.687    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.395 on 134 degrees of freedom
Multiple R-squared:  0.001213,  Adjusted R-squared:  -0.006241 
F-statistic: 0.1627 on 1 and 134 DF,  p-value: 0.6873

Exploration 2: Satisfaction predicted by condition, age, and gender. Gender was the only significant explanatory variable (β = 0.680, SE = 0.239, p = 0.005). However, in the original study, age was the only significant explanatory variable under this analysis. Overall, the model is significant (p = 0.035).

# Regression model
summary(lm('satisfaction ~ condition + age + gender', 
           data = filterresult))

Call:
lm(formula = "satisfaction ~ condition + age + gender", data = filterresult)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.6192 -1.0820  0.2557  1.0503  1.8313 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           3.975362   0.437604   9.084 1.48e-15 ***
conditionoverfirst   -0.192983   0.234404  -0.823   0.4118    
age                   0.009905   0.009884   1.002   0.3182    
genderMale            0.679652   0.238561   2.849   0.0051 ** 
genderNonbinary       0.309134   0.975199   0.317   0.7518    
genderPrefernottosay -2.222978   1.375827  -1.616   0.1086    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.354 on 130 degrees of freedom
Multiple R-squared:  0.08735,   Adjusted R-squared:  0.05225 
F-statistic: 2.489 on 5 and 130 DF,  p-value: 0.03454

Exploration 3: Satisfaction predicted by condition, age, gender, and donation amounts. In the original study, all explanatory variables aside from gender were significant. In our sample, though, gender is the only significant explanatory variable (β = 0.697, SE = 0.244, p = 0.005). Overall, the model is significant (p = 0.025).

# Regression model
summary(lm('satisfaction ~ condition + age + gender + cause + overhead', 
           data = filterresult))

Call:
lm(formula = "satisfaction ~ condition + age + gender + cause + overhead", 
    data = filterresult)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4931 -1.0278  0.1928  1.1649  2.0190 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)           3.77690    0.47713   7.916 1.02e-12 ***
conditionoverfirst   -0.22562    0.23371  -0.965  0.33619    
age                   0.01102    0.01005   1.096  0.27502    
genderMale            0.69673    0.24420   2.853  0.00505 ** 
genderNonbinary       0.20016    0.97175   0.206  0.83713    
genderPrefernottosay -2.28227    1.36538  -1.672  0.09706 .  
cause                 0.23161    0.11838   1.957  0.05258 .  
overhead             -0.11751    0.17610  -0.667  0.50579    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.344 on 128 degrees of freedom
Multiple R-squared:  0.1154,    Adjusted R-squared:  0.06704 
F-statistic: 2.386 on 7 and 128 DF,  p-value: 0.02512

Discussion

Summary of Replication Attempt

None of our confirmatory analyses replicated the original finding of a significant effect of condition, although all effects were in the same direction as in the original study. The only significant predictors were age and gender. Our exploratory analyses did not replicate the original results in terms of effect or significance, either.

Commentary

While we did not confirm the anticipated effects of the experimental manipulation, we did uncover an unexpected relationship involving gender. Men donated less than women but reported higher levels of satisfaction with their donations. This suggests that men may derive greater satisfaction from giving less, which could reflect cultural or economic concerns.

Our confirmatory analyses pointed in the anticipated direction (participants in the overhead-first condition generally donated more), yet the effect was not statistically significant. Examination of donation distributions revealed that most participants, in both conditions, donated between $0 and $1. It’s possible that the underlying effect was too small to detect with our sample size of 136, which was smaller than the sample we had planned to recruit.
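To put the sample-size concern in concrete terms, a sensitivity analysis can be sketched with `power.t.test()`. The sketch assumes two equal groups of 68 (136 split evenly, which need not match the actual group sizes), a two-sided test, and an alpha of 0.05; it is an approximation, not part of the original analysis plan.

```r
# Sketch: sensitivity of a two-sample t-test with the retained sample.
# Assumptions: two equal groups of 68 (136 / 2), two-sided, alpha = 0.05.

# Smallest standardized effect detectable with 80% power
sens <- power.t.test(n = 68, sd = 1, sig.level = 0.05, power = 0.80,
                     type = "two.sample", alternative = "two.sided")
min_detectable_d <- sens$delta

# Approximate power to detect the observed total-donation effect (d = 0.157)
achieved <- power.t.test(n = 68, delta = 0.157, sd = 1, sig.level = 0.05,
                         type = "two.sample", alternative = "two.sided")

print(c(min_detectable_d = min_detectable_d,
        power_at_observed_d = achieved$power))
```

Under these assumptions, the minimum detectable effect is larger than the d = 0.41 used for planning, and power to detect an effect as small as the one observed is low.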

Additionally, a substantial proportion of participants (54/136) donated nothing at all, with no significant difference in zero-donation rates across conditions. This uniform pattern diminished the statistical power of our experimental manipulation. Two notable methodological differences from the original study may have contributed to this outcome: (1) our U.S.-based sample versus their U.K.-based sample, and (2) our smaller financial incentive (5 USD vs. 6 GBP). These differences may have created a general reluctance to donate, overshadowing the original effect. Cultural, socioeconomic, or political factors unique to the U.S. sample, combined with the lower incentive amount, may have led participants to keep their earnings rather than donate.
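One way to compare zero-donation rates across conditions is a chi-squared test on a 2 x 2 table, sketched below. The report does not state which test was used, so this is an assumption, and the data are simulated stand-ins rather than study data.

```r
# Simulated stand-in for `filterresult`; NOT study data.
set.seed(2)
sim <- data.frame(
  condition = rep(c("causefirst", "overfirst"), each = 60),
  total     = sample(c(0, 0, 1, 2, 5), 120, replace = TRUE)
)

# 2 x 2 table: condition by whether the participant donated nothing
zero_tab <- table(condition = sim$condition,
                  donated_nothing = sim$total == 0)
print(zero_tab)

# Chi-squared test of independence (continuity-corrected by default for 2 x 2)
zero_test <- chisq.test(zero_tab)
print(zero_test$p.value)
```

With the real data, `sim` would be replaced by `filterresult`, which has the same `condition` and `total` columns.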

In fact, relative to the original study’s participants, who donated about 45% of their winnings, our participants donated only about 32%. This substantial reduction suggests a generalized pressure not to donate, which likely reduced the variation needed to detect an effect.

Our use of Prolific for recruitment may also have been problematic. Although aligned with minimum wage standards, our payment was relatively low, and asking participants to donate up to five times their compensation may have been perceived as unreasonable. While the original study also used Prolific, their U.K. participants may have reacted differently from our U.S. participants. Future studies may benefit from alternative recruitment strategies that reach demographics more willing or able to donate, potentially restoring the variation necessary to observe the original effect.

Potential confounds in the study could include participants’ prior donation experiences or attitudes toward charities, which may affect how much they are willing to donate regardless of the order of donation categories. Additionally, individual differences, such as participants’ financial situation or emotional connection to charitable causes, could influence donation behaviors. The use of an online setting might also affect engagement and the authenticity of responses compared to in-person scenarios.

In summary, our failure to replicate the original findings may stem from inadequate sample size, key methodological differences, and our chosen sampling method. Future replications should strive for larger samples, methodological consistency with the original study, and alternative recruitment methods. Moreover, our findings highlight intriguing gender differences in the psychology of donation behavior and suggest a need for further research on how Americans, in particular, respond to donation requests and overhead aversion.

Credits

Emma Gu: Conceptualization, Methodology, Software, Writing - original draft, and Writing - review & editing.
Fan Yang: Methodology, Software, Writing - original draft, and Writing - review & editing.
Nina Rice: Formal analysis, Software, Validation, Visualization, Writing - original draft, and Writing - review & editing.
William S. de Melo: Data curation, Formal analysis, Investigation, Software, Validation, Writing - original draft, and Writing - review & editing.