For this exercise, please try to reproduce the results from Experiment 6 of the associated paper (Shah, Shafir, & Mullainathan, 2015). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

The authors were interested in the effect of scarcity on the consistency of people’s valuation judgments. In this study, participants played a game of Family Feud and were given either 75 s (budget - “poor” condition) or 250 s (budget - “rich” condition) to complete the game. After playing the game, participants were primed to think about either a small account (the time needed to play one round of the game; account - “small” condition) or a large account (their overall time budget for the entire game; account - “large” condition). Participants rated how costly it would feel to lose 10 s of the time available to play the game. The researchers were primarily interested in the interaction between the two between-subjects factors, scarcity and account, hypothesizing that participants in the budget - “poor” condition would value the 10 s more consistently across the two accounts than participants in the budget - “rich” condition. The authors tested this hypothesis with a 2 × 2 between-subjects ANOVA.


Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 6):

“One participant was excluded because of a computer malfunction during the game. Time-rich participants rated the loss as more expensive when they thought about a small account (M = 8.31, 95% CI = [7.78, 8.84]) than when they thought about a large account (M = 6.50, 95% CI = [5.42, 7.58]), whereas time-poor participants’ evaluations did not differ between the small-account condition (M = 8.33, 95% CI = [7.14, 9.52]) and the large account condition (M = 8.83, 95% CI = [7.97, 9.69]). A 2 (scarcity condition) × 2 (account condition) analysis of variance revealed a significant interaction, F(1, 69) = 5.16, p < .05, ηp² = .07.” (Shah, Shafir, & Mullainathan, 2015)

Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' files
library(readxl) # import excel files

# #optional packages:
# library(afex) #anova functions
# library(langcog) #95 percent confidence intervals

Step 2: Load data

# Just Experiment 6
data <- read_excel("data/study 6-accessible-feud.xlsx")

Step 3: Tidy data

The data are already tidy as provided by the authors.

Step 4: Run analysis

Pre-processing

One participant was excluded because of a computer malfunction during the game. (Shah, Shafir, & Mullainathan, 2015, p. 408)

Note: The original paper does not identify which participant was excluded, but the authors later confirmed in correspondence that it was participant #16. The exclusion is performed below.

# Participant #16 should be dropped from analysis 
excluded <- "16"

d <- data %>%
  filter(!Subject %in% excluded) #participant exclusions
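Because `excluded` holds the string `"16"` while `Subject` may be read from Excel as a numeric column (an assumption; the column type is not shown here), it is worth confirming that the filter really drops a row. `%in%` is built on `match()`, which coerces both sides to a common type before comparing, so a numeric 16 does match `"16"`. A minimal check on a made-up `toy` data frame:

```r
# Toy stand-in for the real data (values made up); Subject is numeric here,
# as read_excel() would typically return for a column of whole numbers (assumption).
toy <- data.frame(Subject = c(15, 16, 17), expense = c(7, 9, 8))

excluded <- "16"  # character, as in the analysis above

# match() coerces the numeric Subject values to character before comparing,
# so 16 %in% "16" is TRUE and that row is dropped.
kept <- toy[!toy$Subject %in% excluded, ]

# Exactly one row should have been removed.
stopifnot(nrow(kept) == nrow(toy) - 1)
print(kept$Subject)
```

On the real data, the equivalent check after the `filter()` call would be `stopifnot(nrow(d) == nrow(data) - 1)`.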

Descriptive statistics

Time-rich participants rated the loss as more expensive when they thought about a small account (M = 8.31, 95% CI = [7.78, 8.84]) than when they thought about a large account (M = 6.50, 95% CI = [5.42, 7.58]), whereas time-poor participants’ evaluations did not differ between the small-account condition (M = 8.33, 95% CI = [7.14, 9.52]) and the large-account condition (M = 8.83, 95% CI = [7.97, 9.69]). (Shah, Shafir, & Mullainathan, 2015, p. 408)

#Keep only the columns needed for the analysis
d <- d |> 
  select(Subject, Cond, Slack, Large, tmest, expense, error)

#Assumed coding (no data dictionary was provided; inferred by comparing cell means with the published means): Large is 0 for "small account" and 1 for "large account"; Slack is 0 for "time-poor" and 1 for "time-rich"; expense is each participant's rating of how costly losing 10 s would feel.

#Time-poor and small account 
d_poor_small <- d |> 
  filter(Slack == 0 & Large == 0)
result_poor_small <- t.test(d_poor_small$expense)
cat("Results for time-poor and small account:")
## Results for time-poor and small account:
#Print the mean 
cat("\nM =", mean(d_poor_small$expense))
## 
## M = 8.333333
#Print the confidence interval
cat("\n95% Confidence Interval for Mean:", result_poor_small$conf.int, "\n")
## 
## 95% Confidence Interval for Mean: 7.067489 9.599178
#Time-poor and large account 
d_poor_large <- d |> 
  filter(Slack == 0 & Large == 1)
result_poor_large <- t.test(d_poor_large$expense)
cat("\nResults for time-poor and large account:")
## 
## Results for time-poor and large account:
#Print the mean 
cat("\nM =", mean(d_poor_large$expense))
## 
## M = 8.833333
#Print the confidence interval
cat("\n95% Confidence Interval for Mean:", result_poor_large$conf.int, "\n")
## 
## 95% Confidence Interval for Mean: 7.910843 9.755823
#Time-rich and small account 
d_rich_small <- d |> 
  filter(Slack == 1 & Large == 0)
result_rich_small <- t.test(d_rich_small$expense)
cat("\nResults for time-rich and small account:")
## 
## Results for time-rich and small account:
#Print the mean 
cat("\nM =", mean(d_rich_small$expense))
## 
## M = 8.3125
#Print the confidence interval
cat("\n95% Confidence Interval for Mean:", result_rich_small$conf.int, "\n")
## 
## 95% Confidence Interval for Mean: 7.737972 8.887028
#Time-rich and large account 
d_rich_large <- d |> 
  filter(Slack == 1 & Large == 1)
result_rich_large <- t.test(d_rich_large$expense)
cat("\nResults for time-rich and large account:")
## 
## Results for time-rich and large account:
#Print the mean 
cat("\nM =", mean(d_rich_large$expense))
## 
## M = 6.5
#Print the confidence interval
cat("\n95% Confidence Interval for Mean:", result_rich_large$conf.int, "\n")
## 
## 95% Confidence Interval for Mean: 5.340009 7.659991
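The four blocks above repeat the same subset, `t.test()`, and print steps. A more compact option is a single helper that splits the ratings by both condition columns and returns one row per cell. The sketch below uses a hypothetical `cell_summary()` function and a made-up `toy` data set; cells are labelled `Slack.Large` (for example, `"0.1"` means Slack = 0, Large = 1).

```r
# Hypothetical helper (not part of the original analysis): n, mean, and
# t-based CI of `y` for every combination of two grouping variables.
cell_summary <- function(y, g1, g2, conf.level = 0.95) {
  # split() on a list of factors groups by their interaction, named "g1.g2"
  cells <- split(y, list(g1, g2), drop = TRUE)
  out <- lapply(names(cells), function(nm) {
    x <- cells[[nm]]
    ci <- t.test(x, conf.level = conf.level)$conf.int
    data.frame(cell = nm, n = length(x), M = mean(x),
               lower = ci[1], upper = ci[2])
  })
  do.call(rbind, out)
}

# Made-up ratings standing in for d, 4 per cell, for illustration only.
toy <- data.frame(
  Slack = rep(c(0, 1), each = 8),
  Large = rep(c(0, 1), times = 8),
  expense = c(8, 9, 7, 10, 9, 8, 9, 8, 8, 6, 9, 7, 8, 5, 9, 8)
)

res <- cell_summary(toy$expense, toy$Slack, toy$Large)
print(res)
```

With the real data, `cell_summary(d$expense, d$Slack, d$Large)` should return the same four means and t-based intervals printed above, since it runs `t.test()` on the same subsets.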

Inferential statistics

A 2 (scarcity condition) × 2 (account condition) analysis of variance revealed a significant interaction, F(1, 69) = 5.16, p < .05, ηp² = .07.
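For reference, partial eta-squared for the interaction is its sum of squares divided by itself plus the residual (error) sum of squares. Using the rounded values from the `aov()` table printed below (the `Cond:Large` mean square, 24.172, equals its sum of squares because it has 1 df; the residual sum of squares is 323.1), and equivalently the F statistic with its degrees of freedom, gives:

$$
\begin{aligned}
\eta_p^2 &= \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}} = \frac{24.172}{24.172 + 323.1} \approx 0.0696 \\
         &= \frac{F \cdot df_{\text{effect}}}{F \cdot df_{\text{effect}} + df_{\text{error}}} = \frac{5.162 \times 1}{5.162 \times 1 + 69} \approx 0.0696
\end{aligned}
$$

Both routes round to the reported ηp² = .07.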

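#Cond is assumed to encode the scarcity condition (the descriptive statistics above use Slack instead)
#aov() uses sequential (Type I) sums of squares; the interaction, entered last, does not depend on the order of Cond and Large
model <- aov(expense ~ Cond * Large, data = d)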
summary(model)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## Cond         1   22.0  22.023   4.703 0.0336 *
## Large        1   10.7  10.701   2.285 0.1352  
## Cond:Large   1   24.2  24.172   5.162 0.0262 *
## Residuals   69  323.1   4.683                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
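#Calculate ηp² (partial eta-squared) for the interaction:
#ηp² = SS_interaction / (SS_interaction + SS_residual)
anova_table <- anova(model)
ss_effect <- anova_table$"Sum Sq"[3] #Cond:Large
ss_error <- anova_table$"Sum Sq"[4]  #Residuals
eta_squared <- ss_effect / (ss_effect + ss_error)

#Print out values of interest (the F value comes from the ANOVA table, not the model coefficients)
cat('\n F(1,69) =', anova_table$"F value"[3])
#Should match the Cond:Large F value of 5.162 in the table above
cat('\n p = ', summary(model)[[1]]$'Pr(>F)'[3])
## 
##  p =  0.02620575
cat('\n ηp² =', eta_squared)
#Should be about 0.0696, which rounds to the reported .07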
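As a reusable alternative to indexing the table by position, here is a minimal sketch of a hypothetical helper, `partial_eta_sq()`, that looks up a term by name in the `anova()` table and applies SS_effect / (SS_effect + SS_error). It is cross-checked against the equivalent F-based identity on a small made-up 2 × 2 data set. It inherits `aov()`'s sequential (Type I) sums of squares, which for the last-entered interaction term do not depend on the order of the main effects.

```r
# Hypothetical helper: partial eta-squared for one term of a fitted aov() model.
partial_eta_sq <- function(fit, term) {
  tab <- anova(fit)  # sequential (Type I) table, one row per term plus Residuals
  ss_effect <- tab[term, "Sum Sq"]
  ss_error  <- tab["Residuals", "Sum Sq"]
  ss_effect / (ss_effect + ss_error)
}

# Made-up 2 x 2 data, 4 observations per cell, for illustration only.
toy <- data.frame(
  a = factor(rep(c("poor", "rich"), each = 8)),
  b = factor(rep(c("small", "large"), times = 8)),
  y = c(8, 9, 7, 10, 9, 8, 9, 8, 8, 6, 9, 7, 8, 5, 9, 8)
)
fit <- aov(y ~ a * b, data = toy)
pes <- partial_eta_sq(fit, "a:b")

# Cross-check with the identity eta_p^2 = F * df1 / (F * df1 + df2).
tab <- anova(fit)
f_val <- tab["a:b", "F value"]
df1 <- tab["a:b", "Df"]
df2 <- tab["Residuals", "Df"]
pes_from_f <- f_val * df1 / (f_val * df1 + df2)
stopifnot(isTRUE(all.equal(pes, pes_from_f)))
print(pes)
```

Applied to the model above, `partial_eta_sq(model, "Cond:Large")` should give roughly 0.0696, which rounds to the reported .07.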

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

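All of the means for each of the conditions replicated. None of the confidence intervals replicated exactly, although they were very close: each of my intervals was slightly wider than the published one, by about 0.04 to 0.08 at each end. The ANOVA results replicated: the interaction F(1,69) = 5.16 and p = .026 (p < .05) match the paper. Partial eta-squared computed as SS_interaction / (SS_interaction + SS_residual) comes to about .070, which also matches the reported ηp² = .07; dividing the residual mean square by the sum of all mean squares instead gives .076, but that quantity is not partial eta-squared.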
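One possible explanation for the slightly narrower published intervals, not confirmed with the authors, is that they used a normal approximation (mean ± 1.96 × SE) rather than the t distribution that `t.test()` uses. The 69 residual degrees of freedom imply 73 participants across four cells, or about 18 per cell; the t critical value for 17 df is about 2.11, so a z-based interval would be roughly 7% narrower (1.96 / 2.11 ≈ 0.93), which is in line with the differences between the intervals above and the published ones. The sketch below computes both intervals for one vector; `ci_t_vs_z()` is a hypothetical helper and the ratings are made up.

```r
# Hypothetical helper: 95% CI for a mean computed two ways, with the
# t distribution (what t.test() reports) and with a normal approximation.
ci_t_vs_z <- function(x, conf.level = 0.95) {
  n  <- length(x)
  m  <- mean(x)
  se <- sd(x) / sqrt(n)
  alpha <- 1 - conf.level
  t_crit <- qt(1 - alpha / 2, df = n - 1)
  z_crit <- qnorm(1 - alpha / 2)
  rbind(t = c(lower = m - t_crit * se, upper = m + t_crit * se),
        z = c(lower = m - z_crit * se, upper = m + z_crit * se))
}

x <- c(8, 7, 9, 9, 10, 8, 6, 9, 8, 9)  # made-up ratings for illustration
ci <- ci_t_vs_z(x)
print(round(ci, 2))

# The t-based row should agree with t.test(); the z-based row is narrower.
stopifnot(isTRUE(all.equal(unname(ci["t", ]), as.numeric(t.test(x)$conf.int))))
```

Running it on, for example, `d_rich_small$expense` would show whether the z-based row lands closer to the published [7.78, 8.84].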

How difficult was it to reproduce your results?

It was not too difficult once I made sense of the dataset structure.

What aspects made it difficult? What aspects made it easy?

Since there was no data dictionary with explanations about the columns, it was hard to determine what the different conditions meant, so I had to calculate the means to get a sense of which values for Slack and Large corresponded to which conditions. Having the data already in a tidy format made it easier to interpret.
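Without a data dictionary, one inexpensive check is to cross-tabulate the coding columns. If `Cond` and `Slack` encode the same scarcity manipulation (an assumption: the ANOVA above uses `Cond`, while the descriptive statistics use `Slack`), each value of one column should pair with exactly one value of the other. A sketch of that check, with a hypothetical helper `codes_agree()` and made-up values:

```r
# Hypothetical helper: TRUE if two coding columns are a one-to-one relabeling
# of each other, i.e. each value of one pairs with exactly one value of the other.
codes_agree <- function(x, y) {
  tab <- table(x, y)
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Made-up example: cond uses 1/2 where slack uses 0/1.
cond  <- c(1, 1, 2, 2, 1, 2)
slack <- c(0, 0, 1, 1, 0, 1)
print(table(cond, slack))
print(codes_agree(cond, slack))
```

On the real data, `table(d$Cond, d$Slack)` and `codes_agree(d$Cond, d$Slack)` would show whether the two columns line up.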