For this exercise, please try to reproduce the results from Study 1 of the associated paper (Joel, Teper, & MacDonald, 2014). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

In Study 1, 150 introductory psychology students were randomly assigned to a “real” or a “hypothetical” condition. In the real condition, participants believed that they would have a real opportunity to connect with potential romantic partners. In the hypothetical condition, participants simply imagined that they were on a date. All participants were required to select their favorite profile and answer whether they were willing to exchange contact information.


Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Study 1):

We next tested our primary hypothesis that participants would be more reluctant to reject the unattractive date when they believed the situation to be real rather than hypothetical. Only 10 of the 61 participants in the hypothetical condition chose to exchange contact information with the unattractive potential date (16%). In contrast, 26 of the 71 participants in the real condition chose to exchange contact information (37%). A chi-square test of independence indicated that participants were significantly less likely to reject the unattractive potential date in the real condition compared with the hypothetical condition, X^2(1, N = 132) = 6.77, p = .009.


Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files

# optional packages:
# library(broom)
library(labelled) # converts SPSS's labelled to R's factor

Step 2: Load data

# Just Study 1
d <- read_sav('data/Empathy Gap Study 1 data.sav')

Most of the columns are irrelevant to this analysis; I only need exchangeinfo and condition.

Step 3: Tidy data

d_tidy <- d %>%
  select(exchangeinfo, condition) %>%
  mutate(exchange.num = exchangeinfo,
         cond.num = condition,
         exchangeinfo = ifelse(exchangeinfo == 1, "yes", "no"),
         condition = ifelse(condition == 1, "real", "hypothetical"))

Step 4: Run analysis

Descriptive statistics

Only 10 of the 61 participants in the hypothetical condition chose to exchange contact information with the unattractive potential date (16%). In contrast, 26 of the 71 participants in the real condition chose to exchange contact information (37%).

# reproduce the above results here

d_summ <- d_tidy %>% group_by(condition, exchangeinfo) %>% tally() %>% ungroup()

kable(d_summ)
|condition    |exchangeinfo |  n|
|:------------|:------------|--:|
|hypothetical |no           | 51|
|hypothetical |yes          | 10|
|real         |no           | 45|
|real         |yes          | 26|

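The numbers check out: 10 of 61 participants (16%) in the hypothetical condition and 26 of 71 (37%) in the real condition chose to exchange contact information, matching the paper.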
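
As a quick cross-check of the group sizes and percentages, here is a minimal base R sketch. It re-enters the four cell counts from the table above by hand rather than reading the data file, and the object names (counts, group_n, pct_yes) are my own placeholders, not anything from the original dataset.

```r
# Cell counts copied by hand from the summary table above
counts <- matrix(
  c(51, 10,
    45, 26),
  nrow = 2, byrow = TRUE,
  dimnames = list(condition = c("hypothetical", "real"),
                  exchangeinfo = c("no", "yes"))
)

# Number of participants per condition
group_n <- rowSums(counts)

# Percentage in each condition who agreed to exchange contact information
pct_yes <- round(100 * counts[, "yes"] / group_n)

group_n
pct_yes
```

The group sizes and rounded percentages can then be compared directly against the sentence quoted from the paper.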

Inferential statistics

A chi-square test of independence indicated that participants were significantly less likely to reject the unattractive potential date in the real condition compared with the hypothetical condition, X^2(1, N = 132) = 6.77, p = .009.

# reproduce the above results here
d_chi <- d_summ %>%
  pivot_wider(names_from = condition, values_from = n) %>%
  select(real, hypothetical)
chisq.test(d_chi, correct = FALSE) # no Yates continuity correction
## 
##  Pearson's Chi-squared test
## 
## data:  d_chi
## X-squared = 6.7674, df = 1, p-value = 0.009284
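
To make sure I understand what the test is doing, here is a rough sketch that computes the Pearson chi-square statistic by hand from the four cell counts reported above. The counts are typed in directly instead of coming from d_summ, and the object names (observed, expected, chi_sq, p_value) are hypothetical ones I chose for illustration.

```r
# Observed 2x2 table, filled column by column:
# column 1 = real (yes, no), column 2 = hypothetical (yes, no)
observed <- matrix(
  c(26, 45, 10, 51),
  nrow = 2,
  dimnames = list(exchangeinfo = c("yes", "no"),
                  condition = c("real", "hypothetical"))
)

# Expected counts under independence: (row total * column total) / N
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic without continuity correction
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail p-value from the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(chi_sq, 2)
round(p_value, 3)
```

If the hand computation is right, the rounded statistic and p-value should agree with the values reported in the paper and with the chisq.test output above.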

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

Yes, all results reproduced.

How difficult was it to reproduce your results?

Not very difficult, but given how simple the setup was, it should have been easier. Part of the problem was that I didn’t know how to run a chi-square test in R, so I had to look up how to set it up (I tried both condition as columns and condition as rows before realizing that the orientation doesn’t matter). When I first got a value different from the paper’s, I had to turn off the Yates continuity correction that chisq.test applies by default.
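
The two things that tripped me up can be illustrated with a small sketch. It assumes the same four cell counts as above, typed in by hand, and the object names (tab, uncorrected, transposed, corrected) are placeholders of mine. It shows that transposing the table leaves the statistic unchanged, and that leaving chisq.test at its default (correct = TRUE, which applies the Yates correction to 2x2 tables) gives a smaller statistic than the uncorrected test.

```r
# 2x2 table of counts: rows = exchangeinfo, columns = condition
tab <- matrix(
  c(26, 45, 10, 51),
  nrow = 2,
  dimnames = list(exchangeinfo = c("yes", "no"),
                  condition = c("real", "hypothetical"))
)

# Uncorrected Pearson test (the version that matches the paper)
uncorrected <- chisq.test(tab, correct = FALSE)

# Same test on the transposed table (condition as rows)
transposed <- chisq.test(t(tab), correct = FALSE)

# Default call: Yates continuity correction is applied for 2x2 tables
corrected <- chisq.test(tab)

unname(uncorrected$statistic)
unname(transposed$statistic)
unname(corrected$statistic)
```

The first two statistics should be identical, while the third should be smaller, which is why the default call did not match the paper.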

What aspects made it difficult? What aspects made it easy?

The easy part was that the pipeline is very simple: I didn’t have to reapply exclusions or rederive any variables. The hard part was finding the relevant columns without a data dictionary (mostly because of the sheer number of columns), although at least they were well named.