For this exercise, please try to reproduce the results from Experiment 6 of the associated paper (Shah, Shafir, & Mullainathan, 2015). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

The authors were interested in the effect of scarcity on the consistency of people’s valuation judgments. In this study, participants played a game of Family Feud and were given either 75 s (budget - “poor” condition) or 250 s (budget - “rich” condition) to complete the game. After playing the game, participants were primed to think about either a small account (the time necessary to play one round of the game; account - “small” condition) or a large account (their overall time budget to play the entire game; account - “large” condition). Participants rated how costly it would feel to lose 10 s of time to play the game. The researchers were primarily interested in an interaction between the between-subjects factors of scarcity and account, hypothesizing that those in the budget - “poor” condition would be more consistent in their valuation of the 10 s regardless of account than those in the budget - “rich” condition. The authors tested this hypothesis with a 2x2 between-subjects ANOVA.


Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 6):

“One participant was excluded because of a computer malfunction during the game. Time-rich participants rated the loss as more expensive when they thought about a small account (M = 8.31, 95% CI = [7.78, 8.84]) than when they thought about a large account (M = 6.50, 95% CI = [5.42, 7.58]), whereas time-poor participants’ evaluations did not differ between the small-account condition (M = 8.33, 95% CI = [7.14, 9.52]) and the large account condition (M = 8.83, 95% CI = [7.97, 9.69]). A 2 (scarcity condition) × 2 (account condition) analysis of variance revealed a significant interaction, F(1, 69) = 5.16, p < .05, ηp² = .07.” (Shah, Shafir & Mullainathan, 2015)

Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files

# #optional packages:
# library(afex) #anova functions
# library(langcog) #95 percent confidence intervals

Step 2: Load data

# Just Experiment 6
data <- read_excel("data/study 6-accessible-feud.xlsx")

data
## # A tibble: 74 × 14
##    Subject  Cond Slack Large tmest expense error ...8  ...9  ...10   ...11 ...12
##      <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl> <lgl> <lgl> <chr>   <chr> <dbl>
##  1       6     0     0     0    15      10 0     NA    NA    <NA>    <NA>  NA   
##  2      10     0     0     0    15       7 0     NA    NA    <NA>    <NA>  NA   
##  3      18     0     0     0    15      11 0     NA    NA    Averag… Colu… NA   
##  4      22     0     0     0     7       9 0.533 NA    NA    Row La… 0      1   
##  5      26     0     0     0    15       4 0     NA    NA    0       8.33…  8.95
##  6      34     0     0     0    15      11 0     NA    NA    1       8.31…  6.5 
##  7      38     0     0     0    15       5 0     NA    NA    (blank) <NA>  NA   
##  8      42     0     0     0    15       9 0     NA    NA    Grand … 8.32…  7.76
##  9      46     0     0     0    10       6 0.333 NA    NA    <NA>    <NA>  NA   
## 10      50     0     0     0    40      10 1.67  NA    NA    <NA>    <NA>  NA   
## # … with 64 more rows, and 2 more variables: ...13 <chr>, ...14 <chr>

Step 3: Tidy data

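The participant-level columns (Subject through error) are already tidy as provided by the authors. The unnamed columns ...8 through ...14 appear to hold a summary table left in the spreadsheet and are not used in the analysis.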
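
A minimal sketch of keeping only the seven named participant-level columns and labelling the design factors, in case the unnamed `...8` to `...14` columns get in the way. The helper name `tidy_feud` and the `budget`/`account` labels are hypothetical, and the coding (`Slack` 1 = time-rich, `Large` 1 = large account) is an assumption inferred from the summary table visible in the printed tibble, not something the authors document.

```r
library(dplyr)

# Hypothetical helper: keep the seven named columns and label the factors.
# Assumed coding (inferred, not documented): Slack 1 = rich, Large 1 = large.
tidy_feud <- function(df) {
  df %>%
    select(Subject, Cond, Slack, Large, tmest, expense, error) %>%
    mutate(
      budget  = factor(Slack, levels = c(0, 1), labels = c("poor", "rich")),
      account = factor(Large, levels = c(0, 1), labels = c("small", "large"))
    )
}
```

Something like `data_clean <- tidy_feud(data)` would then give a data frame with readable condition labels alongside the original codes.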

Step 4: Run analysis

Pre-processing

One participant was excluded because of a computer malfunction during the game (Shah, Shafir, & Mullainathan, 2015, p. 408).

Note: The original paper does not identify which participant was excluded, but communication with the authors later revealed that it was participant #16. The exclusion is performed below.

# Participant #16 should be dropped from the analysis
excluded <- 16 # numeric, to match the type of the Subject column

d <- data %>%
  filter(!Subject %in% excluded) # participant exclusions

Descriptive statistics

Time-rich participants rated the loss as more expensive when they thought about a small account (M = 8.31, 95% CI = [7.78, 8.84]) than when they thought about a large account (M = 6.50, 95% CI = [5.42, 7.58]), whereas time-poor participants’ evaluations did not differ between the small-account condition (M = 8.33, 95% CI = [7.14, 9.52]) and the large-account condition (M = 8.83, 95% CI = [7.97, 9.69]). (Shah, Shafir, & Mullainathan, 2015, p. 408)

# reproduce the above results here
d %>% 
  group_by(Cond) %>% 
  summarize(mean(expense), sd(expense))
## # A tibble: 4 × 3
##    Cond `mean(expense)` `sd(expense)`
##   <dbl>           <dbl>         <dbl>
## 1     0            8.33          2.78
## 2     1            8.31          1.08
## 3     2            8.83          1.86
## 4     3            6.5           2.33
# Cond codes:
#time-rich small account: 1
#time-rich large account: 3
#time-poor small account: 0
#time-poor large account: 2
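
The confidence intervals quoted above can be approached with a per-cell calculation. The following is a minimal sketch, assuming the paper's intervals are t-based and computed separately within each condition (mean ± t critical value × standard error); the source does not confirm this. The helper name `cell_ci` and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical helper: cell means with t-based 95% confidence intervals.
# Assumes each interval is computed within its own condition (Cond).
cell_ci <- function(df, level = 0.95) {
  df %>%
    group_by(Cond) %>%
    summarize(
      n_obs        = n(),
      mean_expense = mean(expense),
      se_expense   = sd(expense) / sqrt(n_obs),
      .groups      = "drop"
    ) %>%
    mutate(
      # two-sided critical value with n - 1 degrees of freedom per cell
      t_crit   = qt(1 - (1 - level) / 2, df = n_obs - 1),
      ci_lower = mean_expense - t_crit * se_expense,
      ci_upper = mean_expense + t_crit * se_expense
    )
}
```

Calling `cell_ci(d)` would give one row per `Cond` code. Whether the bounds match the published intervals depends on how the authors actually computed them.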

Inferential statistics

A 2 (scarcity condition) × 2 (account condition) analysis of variance revealed a significant interaction, F(1, 69) = 5.16, p < .05, ηp² = .07.

# reproduce the above results here

# Cond codes:
#time-rich small account: 1
#time-rich large account: 3
#time-poor small account: 0
#time-poor large account: 2

scarcity_1 <- data %>% select(Cond)
scarcity_1
## # A tibble: 74 × 1
##     Cond
##    <dbl>
##  1     0
##  2     0
##  3     0
##  4     0
##  5     0
##  6     0
##  7     0
##  8     0
##  9     0
## 10     0
## # … with 64 more rows
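# Note: this call treats Cond as the outcome and expense as the predictor,
# and it uses the unfiltered data (74 rows), so it is not the
# 2 x 2 ANOVA reported in the paper.
infer_stats <- aov(Cond ~ expense, data = data)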
summary(infer_stats)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## expense      1   5.06   5.064   3.993 0.0495 *
## Residuals   72  91.31   1.268                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
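
The residual degrees of freedom above (72) differ from the reported F(1, 69), which points to a different model specification. A minimal sketch of a 2 × 2 between-subjects ANOVA on the expense ratings follows, assuming `Slack` and `Large` are the 0/1 codes for the scarcity and account factors (an inference from the data, not documented by the authors). The helper name `anova_2x2` is hypothetical, and partial eta squared is computed as SS_effect / (SS_effect + SS_residual).

```r
# Hypothetical helper: 2 x 2 between-subjects ANOVA on the expense ratings,
# with partial eta squared added for each effect.
# Assumes Slack (scarcity) and Large (account) code the two factors as 0/1.
anova_2x2 <- function(df) {
  df$Slack <- factor(df$Slack)
  df$Large <- factor(df$Large)
  fit <- aov(expense ~ Slack * Large, data = df)
  tab <- as.data.frame(summary(fit)[[1]])
  ss <- tab[["Sum Sq"]]
  ss_resid <- ss[length(ss)]
  # partial eta squared = SS_effect / (SS_effect + SS_residual)
  tab$partial_eta_sq <- ss / (ss + ss_resid)
  tab$partial_eta_sq[length(ss)] <- NA # not defined for the residual row
  tab
}
```

Running `anova_2x2(d)` on the data after the exclusion would put the interaction in the third row. Because the interaction is entered last, its sequential sum of squares does not depend on the order of the two main effects, even if the cells are unbalanced.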

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

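I found the correct means but couldn’t figure out how to get the CIs. I also got a barely significant p-value (p = 0.0495) when running the ANOVA, though its degrees of freedom (1, 72) do not match the reported F(1, 69), so the interaction test itself was not reproduced.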

How difficult was it to reproduce your results?

It was more difficult than it should have been. I am not very good at R and am still getting the hang of the tidyverse. I miss Python ):

What aspects made it difficult? What aspects made it easy?

Lack of tidyverse skill and poorly labeled condition columns.