For this exercise, please try to reproduce the results from Experiment 6 of the associated paper (Shah, Shafir, & Mullainathan, 2015). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

The authors were interested in the effect of scarcity on the consistency of people’s valuation judgments. In this study, participants played a game of Family Feud and were given either 75 s (budget - “poor” condition) or 250 s (budget - “rich” condition) to complete the game. After playing the game, participants were primed to think about either a small account of time (the time needed to play one round of the game; account - “small” condition) or a large account (their overall time budget for the entire game; account - “large” condition). Participants then rated how costly it would feel to lose 10 s of time to play the game. The researchers were primarily interested in an interaction between the between-subjects factors of scarcity and account, hypothesizing that those in the budget - “poor” condition would value the 10 s consistently regardless of account, whereas those in the budget - “rich” condition would not. The authors tested this hypothesis with a 2 × 2 between-subjects ANOVA.


Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 6):

“One participant was excluded because of a computer malfunction during the game. Time-rich participants rated the loss as more expensive when they thought about a small account (M = 8.31, 95% CI = [7.78, 8.84]) than when they thought about a large account (M = 6.50, 95% CI = [5.42, 7.58]), whereas time-poor participants’ evaluations did not differ between the small-account condition (M = 8.33, 95% CI = [7.14, 9.52]) and the large account condition (M = 8.83, 95% CI = [7.97, 9.69]). A 2 (scarcity condition) × 2 (account condition) analysis of variance revealed a significant interaction, F(1, 69) = 5.16, p < .05, ηp² = .07.” (Shah, Shafir & Mullainathan, 2015)

Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' files
library(readxl) # import Excel files

# optional packages:
library(afex) # ANOVA functions
#library(langcog) #95 percent confidence intervals
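One dependency worth flagging (an assumption based on how these helpers are usually implemented, not something stated in the original write-up): the mean_cl_boot() and mean_cl_normal() functions used in Step 4 are ggplot2’s wrappers around the Hmisc package, so Hmisc needs to be installed even though it is never attached with library(). A minimal check:

# mean_cl_boot()/mean_cl_normal() below wrap Hmisc routines,
# so Hmisc must be installed (this only checks, it does not install)
if (!requireNamespace("Hmisc", quietly = TRUE)) {
  stop("Please install the 'Hmisc' package to compute the confidence intervals below.")
}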

Step 2: Load data

# Just Experiment 6
data <- read_excel("data/study 6-accessible-feud.xlsx")

Step 3: Tidy data

The data are already tidy as provided by the authors.
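Before moving on, a quick structural check on the import (a sketch; it simply assumes the columns used later in this report, namely Subject, Slack, Large, and expense, are present):

# quick look at column names, types, and a few values
glimpse(data)

# the variables used below: Subject (ID), Slack (budget condition),
# Large (account condition), and expense (the costliness rating)
data %>%
  select(Subject, Slack, Large, expense) %>%
  summary()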

Step 4: Run analysis

Pre-processing

One participant was excluded because of a computer malfunction during the game (Shah, Shafir, & Mullainathan, 2015, p. 408)

Note: The original paper does not identify the participant who was excluded, but communication with the authors later revealed that it was participant #16. The exclusion is performed below.

# Participant #16 should be dropped from analysis 
excluded <- "16"

d <- data %>%
  filter(!Subject %in% excluded) #participant exclusions
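A quick sanity check on the exclusion (a sketch; it assumes each row of the file corresponds to one participant):

# exactly one row should have been dropped
stopifnot(nrow(d) == nrow(data) - 1)

# and participant #16 should no longer appear in the analysis data
stopifnot(!excluded %in% d$Subject)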

Descriptive statistics

Time-rich participants rated the loss as more expensive when they thought about a small account (M = 8.31, 95% CI = [7.78, 8.84]) than when they thought about a large account (M = 6.50, 95% CI = [5.42, 7.58]), whereas time-poor participants’ evaluations did not differ between the small-account condition (M = 8.33, 95% CI = [7.14, 9.52]) and the large-account condition (M = 8.83, 95% CI = [7.97, 9.69]). (Shah, Shafir, & Mullainathan, 2015, p. 408)

# reproduce the above results here
descriptive_stats <- d %>%
        group_by(Slack, Large) %>% 
        summarize(ci_bootstrapped = mean_cl_boot(expense, conf = 0.95),
                  ci_normal = mean_cl_normal(expense, conf = 0.95)) %>% 
        # pull out the mean, lower and upper bounds of CI
        mutate(mean = ci_normal[1],
               ci_normal_lower = ci_normal[2],
               ci_normal_upper = ci_normal[3],
               ci_bootstrapped_lower = ci_bootstrapped[2],
               ci_bootstrapped_upper = ci_bootstrapped[3]) %>% 
        # remove redundant columns
        select(-c("ci_normal", "ci_bootstrapped"))

kable(descriptive_stats)
 Slack   Large       mean   ci_normal_lower   ci_normal_upper   ci_bootstrapped_lower   ci_bootstrapped_upper
------  ------  ---------  ----------------  ----------------  ----------------------  ----------------------
     0       0   8.333333          7.067489          9.599178                7.047619                9.333333
     0       1   8.833333          7.910843          9.755823                8.000000                9.612500
     1       0   8.312500          7.737972          8.887028                7.873438                8.812500
     1       1   6.500000          5.340009          7.659991                5.500000                7.555556
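The ci_normal columns above come from a t-based interval (Hmisc’s smean.cl.normal, via ggplot2’s mean_cl_normal), and ci_bootstrapped from a percentile bootstrap. Since neither matches the paper exactly, one further comparison point is a plain normal-approximation interval of mean ± 1.96 × SE; this is purely a guess about a convention the original authors might have used, not something documented in the paper:

# normal-approximation (z-based) 95% CIs per cell, for comparison
d %>%
  group_by(Slack, Large) %>%
  summarize(n = n(),
            m = mean(expense),
            se = sd(expense) / sqrt(n),
            ci_lower = m - 1.96 * se,
            ci_upper = m + 1.96 * se,
            .groups = "drop") %>%
  kable()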

Inferential statistics

A 2 (scarcity condition) × 2 (account condition) analysis of variance revealed a significant interaction, F(1, 69) = 5.16, p < .05, ηp² = .07.

# reproduce the above results here
aov_ez(id = "Subject",
       dv = "expense",
       between = c("Slack", "Large"),
       data = d)
## Anova Table (Type 3 tests)
## 
## Response: expense
##        Effect    df  MSE      F  ges p.value
## 1       Slack 1, 69 4.68 5.35 * .072    .024
## 2       Large 1, 69 4.68   1.66 .024    .202
## 3 Slack:Large 1, 69 4.68 5.16 * .070    .026
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
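The ges column reported by afex is generalized eta squared; for a design like this one, with only manipulated between-subjects factors, it coincides with partial eta squared, so the .070 for the interaction lines up with the paper’s ηp² = .07. (afex should also be able to report partial eta squared directly via its anova_table argument.) As an independent arithmetic check, ηp² can be recovered from the reported F value and its degrees of freedom:

# partial eta squared from the reported F statistic:
# eta_p^2 = (F * df1) / (F * df1 + df2)
F_interaction <- 5.16
df1 <- 1
df2 <- 69
F_interaction * df1 / (F_interaction * df1 + df2)  # ~ 0.07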

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

I was generally able to reproduce the results! The only part I couldn’t fully resolve was the confidence intervals. I got the same cell means, but the margins of error were slightly off. I tried to find more information on how the paper’s CIs were calculated but was unsuccessful; I tried both a normal-theory CI and a bootstrapped CI, and neither matched the paper exactly. Perhaps the remaining difference comes down to how the intervals were computed or rounded.

How difficult was it to reproduce your results?

It was generally not very difficult. Since reading in the data and preprocessing it were already taken care of, that sped things up a lot! Had I also had to deal with the SPSS/Excel files, figure out which participant was removed, or get the data into tidy format, it would have taken more time.

What aspects made it difficult? What aspects made it easy?

One thing that made it difficult was that it wasn’t immediately clear what the variables in the dataframe were, because there was no codebook and the variable names weren’t all that intuitive. I was also confused by the stray text and numbers on the right side of the dataframe; it wasn’t clear what they were until I opened the Excel file and saw that some summary statistics had been computed right next to the data. But as mentioned in my response to the question above, having the data already read in, preprocessed, and in tidy format made this a lot easier than it would have been otherwise.