Group A Choice 1 Reproducibility Report

Satchel Grant, 10/25/2020

For this exercise, please try to reproduce the results from Study 1 of the associated paper (Joel, Teper, & MacDonald, 2014). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

In Study 1, 150 introductory psychology students were randomly assigned to a “real” or a “hypothetical” condition. In the real condition, participants believed that they would have a real opportunity to connect with potential romantic partners. In the hypothetical condition, participants simply imagined that they were on a date. All participants were required to select their favorite profile and answer whether they were willing to exchange contact information.


Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Study 1):

We next tested our primary hypothesis that participants would be more reluctant to reject the unattractive date when they believed the situation to be real rather than hypothetical. Only 10 of the 61 participants in the hypothetical condition chose to exchange contact information with the unattractive potential date (16%). In contrast, 26 of the 71 participants in the real condition chose to exchange contact information (37%). A chi-square test of independence indicated that participants were significantly less likely to reject the unattractive potential date in the real condition compared with the hypothetical condition, X^2(1, N = 132) = 6.77, p = .009.


Step 1: Load libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

Step 2: Load data

init_df = pd.read_spss('data/Empathy Gap Study 1 data.sav')
init_df.head()
ID attachment1 attachment2 attachment3 attachment4 attachment5 attachment6 attachment7 attachment8 attachment9 attachment10 attachment11 attachment12 attachment13 attachment14 attachment15 attachment16 attachment17 attachment18 attachment19 attachment20 attachment21 attachment22 attachment23 attachment24 attachment25 attachment26 attachment27 attachment28 attachment29 attachment30 attachment31 attachment32 attachment33 attachment34 attachment35 attachment36 FOBA1 FOBA2 FOBA3 FOBA4 FOBA5 FOBA6 empathy1 empathy2 empathy3 empathy4 empathy5 empathy6 empathy7 empathy8 empathy9 empathy10 empathy11 empathy12 empathy13 empathy14 empathy15 empathy16 empathy17 empathy18 empathy19 empathy20 empathy21 empathy22 empathy23 empathy24 empathy25 empathy26 empathy27 empathy28 age livedincanada orientation inrel longterm dating shortterm intimate otheropen drink children responseq1 responseq2 responseq3 responseq4 reasontrue1 motives1 reasontrue2 motives2 reasontrue3 motives3 reasontrue4 motives4 reasontrue5 motives5 reasontrue6 motives6 reasontrue7 motives7 reasontrue8 motives8 suspicious selfattractive otherattractive EmpathyPTtot EmpathyFStot EmpathyECtot EmpathyPDtot fobstot attachmentavoidance attachmentanxiety stateguilttot stateempathytot excitementtot compatibilitytot very_otherfocused less_otherfocused gender genderXcondition REQUIRED_VARIABLES_START_BELOW condition exchangeinfo otherfocused_motives selffocused_motives
0 53.0 3.0 4.0 5.0 3.0 2.0 3.0 5.0 2.0 4.0 3.0 5.0 5.0 5.0 2.0 4.0 3.0 3.0 2.0 5.0 3.0 4.0 3.0 3.0 4.0 3.0 4.0 3.0 5.0 3.0 5.0 3.0 5.0 2.0 2.0 2.0 5.0 2.0 1.0 1.0 3.0 1.0 2.0 2.0 5.0 1.0 2.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 3.0 4.0 3.0 1.0 4.0 4.0 1.0 1.0 1.0 1.0 5.0 1.0 2.0 4.0 4.0 3.0 4.0 18.0 3.0 1.0 2.0 2.0 1.0 1.0 2.0 2.0 4.0 1.0 4.0 4.0 4.0 4.0 3.0 3.0 4.0 3.0 4.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 4.0 3.0 4.0 4.0 NaN 5.0 6.0 3.857143 2.857143 3.857143 3.428571 1.666667 3.555556 3.611111 3.75 3.25 3.75 3.000000 3.25 3.75 women 1.0 NaN real yes 3.5 3.375
1 93.0 5.0 1.0 3.0 4.0 2.0 2.0 2.0 2.0 5.0 2.0 3.0 3.0 3.0 2.0 3.0 2.0 3.0 3.0 3.0 3.0 NaN NaN 2.0 NaN NaN 4.0 5.0 4.0 NaN 3.0 NaN NaN 7.0 NaN NaN 5.0 1.0 1.0 1.0 1.0 4.0 5.0 3.0 3.0 4.0 2.0 3.0 NaN 5.0 NaN 5.0 4.0 NaN 2.0 1.0 1.0 4.0 4.0 NaN 1.0 NaN NaN NaN NaN NaN NaN NaN 4.0 NaN 2.0 18.0 4.0 0.0 2.0 2.0 2.0 1.0 2.0 2.0 2.0 4.0 2.0 3.0 3.0 3.0 3.0 3.0 4.0 NaN 2.0 NaN 2.0 NaN 2.0 NaN 2.0 NaN 2.0 NaN 2.0 NaN 2.0 8.0 5.0 2.000000 3.166667 4.400000 4.500000 2.166667 3.857143 2.923077 2.00 3.00 2.00 2.666667 3.00 2.00 men 0.0 NaN real no 2.5 2.400
2 83.0 3.0 6.0 3.0 6.0 5.0 4.0 2.0 2.0 3.0 6.0 5.0 6.0 2.0 3.0 5.0 3.0 6.0 5.0 5.0 2.0 3.0 3.0 3.0 3.0 5.0 6.0 5.0 3.0 3.0 6.0 2.0 5.0 5.0 3.0 5.0 3.0 5.0 5.0 5.0 5.0 5.0 5.0 4.0 3.0 1.0 2.0 5.0 5.0 2.0 2.0 3.0 5.0 4.0 1.0 2.0 1.0 2.0 4.0 5.0 2.0 2.0 4.0 3.0 4.0 4.0 4.0 3.0 3.0 4.0 2.0 20.0 4.0 1.0 2.0 1.0 1.0 2.0 2.0 2.0 2.0 2.0 3.0 4.0 4.0 4.0 3.0 2.0 5.0 4.0 3.0 4.0 5.0 4.0 4.0 3.0 5.0 4.0 2.0 1.0 5.0 4.0 2.0 4.0 4.0 3.285714 4.142857 3.857143 4.428571 5.000000 4.333333 3.944444 4.50 4.50 2.50 3.000000 4.50 4.50 men 0.0 NaN real no 4.5 2.750
3 27.0 2.0 6.0 5.0 2.0 5.0 5.0 6.0 2.0 2.0 2.0 5.0 4.0 1.0 4.0 3.0 2.0 4.0 3.0 5.0 2.0 6.0 3.0 3.0 5.0 4.0 1.0 2.0 2.0 2.0 5.0 2.0 3.0 5.0 2.0 4.0 3.0 2.0 1.0 3.0 5.0 3.0 4.0 4.0 5.0 3.0 1.0 4.0 5.0 1.0 3.0 5.0 5.0 4.0 1.0 2.0 1.0 3.0 4.0 5.0 1.0 2.0 5.0 4.0 5.0 4.0 2.0 3.0 5.0 3.0 3.0 18.0 4.0 0.0 2.0 1.0 1.0 2.0 2.0 2.0 1.0 1.0 1.0 2.0 2.0 3.0 2.0 1.0 1.0 1.0 2.0 2.0 1.0 1.0 2.0 3.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0 NaN NaN 3.285714 4.428571 5.000000 4.000000 3.000000 4.000000 3.000000 1.00 1.00 1.50 2.000000 1.00 1.00 women 0.0 NaN hypothetical no 1.0 1.750
4 6.0 3.0 6.0 3.0 5.0 5.0 5.0 1.0 5.0 3.0 3.0 5.0 5.0 3.0 5.0 5.0 3.0 5.0 3.0 3.0 3.0 4.0 3.0 5.0 6.0 5.0 5.0 5.0 3.0 5.0 5.0 3.0 5.0 3.0 5.0 2.0 2.0 4.0 1.0 1.0 5.0 1.0 1.0 4.0 4.0 2.0 2.0 5.0 4.0 2.0 3.0 4.0 4.0 4.0 2.0 2.0 3.0 2.0 4.0 4.0 2.0 2.0 4.0 4.0 5.0 4.0 4.0 3.0 4.0 3.0 4.0 19.0 1.0 0.0 2.0 2.0 2.0 1.0 2.0 2.0 1.0 1.0 5.0 4.0 3.0 4.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0 2.0 NaN NaN 3.714286 4.142857 4.000000 3.857143 2.166667 4.944444 4.555556 4.00 4.00 4.00 3.000000 4.00 4.00 women 0.0 NaN hypothetical yes 4.0 3.500

Step 3: Tidy the data

df = init_df.copy()
# Make is_real column
df["is_real"] = df["condition"] == "real"
# Make better exchanged column
df["exchanged_digits"] = df["exchangeinfo"] == "yes"
# Remove nulls: test the source columns, because the == comparisons above
# turn missing values into False and leave nothing null to filter on.
df = df.dropna(subset=["condition", "exchangeinfo"])

Step 4: Run analysis

Descriptive statistics

Only 10 of the 61 participants in the hypothetical condition chose to exchange contact information with the unattractive potential date (16%). In contrast, 26 of the 71 participants in the real condition chose to exchange contact information (37%).

Hypothetical participants

hypo_df = df.loc[~df["is_real"]]
n_hypo = len(hypo_df)
n_hypo_exchangers = len(hypo_df.loc[hypo_df['exchanged_digits']])
print("Total hypothetical, non-null participants:", n_hypo)
print("Number of hypothetical exchangers:", n_hypo_exchangers)
print("Portion of Hypothetical that Exchanged: {:2.0f}%".format(n_hypo_exchangers/n_hypo*100))

Total hypothetical, non-null participants: 61

Number of hypothetical exchangers: 10

Portion of Hypothetical that Exchanged: 16%

Real Participants

real_df = df.loc[df['is_real']]
n_real = len(real_df)
n_real_exchangers = len(real_df.loc[real_df["exchanged_digits"]])
print("Total real, non-null participants:", n_real)
print("Number of real exchangers:", n_real_exchangers)
print("Portion of Real that Exchanged those Digs: {:2.0f}%".format(n_real_exchangers/n_real*100))

Total real, non-null participants: 71

Number of real exchangers: 26

Portion of Real that Exchanged those Digs: 37%

Inferential statistics

A chi-square test of independence indicated that participants were significantly less likely to reject the unattractive potential date in the real condition compared with the hypothetical condition, X^2(1, N = 132) = 6.77, p = .009.

In-House Calculation

n_exchangers = n_real_exchangers + n_hypo_exchangers
n_assholes = len(df) - n_exchangers
assert n_exchangers+n_assholes == n_real+n_hypo
observed_hypo_exchangers = n_hypo_exchangers
observed_hypo_assholes = n_hypo-n_hypo_exchangers
observed_real_exchangers = n_real_exchangers
observed_real_assholes = n_real - n_real_exchangers
fxn = lambda x,y: (x*y)/(x+y)
expected_hypo_exchangers = fxn(n_exchangers,n_hypo)
expected_hypo_assholes = fxn(n_assholes,n_hypo)
expected_real_exchangers = fxn(n_exchangers,n_real)
expected_real_assholes = fxn(n_assholes,n_real)
fxn = lambda x,y: (x-y)**2/y
he_dev = fxn(observed_hypo_exchangers,expected_hypo_exchangers)
ha_dev = fxn(observed_hypo_assholes,expected_hypo_assholes)
re_dev = fxn(observed_real_exchangers,expected_real_exchangers)
ra_dev = fxn(observed_real_assholes,expected_real_assholes)
x2 = he_dev + ha_dev + re_dev + ra_dev
print("ChiSquare:", x2) # Used calculator with 1 degree of freedom

ChiSquare: 12.704757641104031
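
For comparison, here is a minimal sketch of the textbook Pearson calculation, in which each expected count is (row total * column total) / N rather than x*y/(x+y) as in the cell above. It hard-codes the four counts printed earlier instead of reading the data file. The names `observed` and `pearson_chi_square` are introduced here for illustration and are not part of the original analysis.

```python
# Counts reported above: rows are conditions, columns are (exchanged, rejected).
observed = {
    "hypothetical": (10, 51),
    "real": (26, 45),
}

def pearson_chi_square(table):
    """Uncorrected Pearson chi-square for a two-way table of counts."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

x2_uncorrected = pearson_chi_square(observed)
print("ChiSquare: {:.2f}".format(x2_uncorrected))  # prints "ChiSquare: 6.77"
```

With these counts the sketch gives roughly 6.77, which can be compared against the value quoted from the paper. This suggests the expected-count formula may account for the 12.70 above.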

3rd Party Package Calculation

from scipy.stats import chisquare
tup=chisquare(f_obs=[observed_hypo_exchangers,observed_hypo_assholes,observed_real_exchangers,observed_real_assholes],
              f_exp=[expected_hypo_exchangers,expected_hypo_assholes,expected_real_exchangers,expected_real_assholes])
x2,p = tup
print("ChiSquare:", x2, "-- p:", p)

ChiSquare: 12.704757641104031 -- p: 0.005320599642953536
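
A note on degrees of freedom may help when reading this p-value. `scipy.stats.chisquare` uses k - 1 - ddof degrees of freedom, with ddof=0 by default, so four cells give 3 degrees of freedom. A test of independence on a 2x2 table has 1. The sketch below uses only the standard library and the closed-form chi-square tail probabilities for 1 and 3 degrees of freedom. The helper names `chi2_sf_df1` and `chi2_sf_df3` are hypothetical, introduced only for this illustration.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

def chi2_sf_df3(x):
    """Upper-tail p-value of a chi-square statistic with 3 degrees of freedom."""
    t = x / 2
    return math.erfc(math.sqrt(t)) + 2 / math.sqrt(math.pi) * math.sqrt(t) * math.exp(-t)

x2_reported = 12.704757641104031  # statistic printed above
p_df3 = chi2_sf_df3(x2_reported)
p_df1 = chi2_sf_df1(x2_reported)
print("p with 3 df:", p_df3)
print("p with 1 df:", p_df1)
```

The 3-df value agrees with the p of about 0.0053 printed above, which suggests that call treated the four cells as a goodness-of-fit test. Passing `ddof=2` to `chisquare` would be one way to get 1 degree of freedom instead.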

Complete 3rd party calculation

from bioinfokit.analys import stat
table = {"category":["real", "hypothetical"],
         "exchanged":[observed_real_exchangers,observed_hypo_exchangers],
         "rejected":[observed_real_assholes,observed_hypo_assholes]}
temp = pd.DataFrame(table)
temp = temp.set_index("category")
temp

              exchanged  rejected
category
real                 26        45
hypothetical         10        51

res = stat()
res.chisq(df=temp)
print(res.summary)

Chi-squared test for independence

Test              Df    Chi-square      P-value
Pearson            1       5.78605    0.0161539
Log-likelihood     1       5.94695    0.0147428
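
The Pearson value of 5.786 is what this table yields when Yates' continuity correction is applied. SciPy's `chi2_contingency` applies that correction by default for 2x2 tables and exposes it through its `correction` parameter. Whether bioinfokit applies the correction is an assumption here, inferred only from the numbers matching. The sketch below computes both versions by hand with the standard library. The function name `chi_square_2x2` is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], optionally Yates-corrected."""
    n = a + b + c + d
    rows = [(a, b), (c, d)]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(rows[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # shrink each deviation by 0.5
            stat += diff ** 2 / expected
    p = math.erfc(math.sqrt(stat / 2))  # upper tail, 1 degree of freedom
    return stat, p

# Rows: real, hypothetical; columns: exchanged, rejected (same table as above).
plain = chi_square_2x2(26, 45, 10, 51)
corrected = chi_square_2x2(26, 45, 10, 51, yates=True)
print("Uncorrected: chi2 = {:.3f}, p = {:.4f}".format(*plain))
print("Yates:       chi2 = {:.3f}, p = {:.4f}".format(*corrected))
```

On these counts the corrected statistic is about 5.786 with p of about 0.016, and the uncorrected one is about 6.77 with p of about 0.009. The gap between this cell and the paper's figure may therefore come down to the continuity correction rather than to the data.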

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

I was able to reproduce the counts and percentages for each condition, but I was not able to reproduce the chi-square results.

How difficult was it to reproduce your results?

The counts were not difficult. The chi-square test, however, was.

What aspects made it difficult? What aspects made it easy?

I was unfamiliar with the chi-square test, which slowed things down. I also tried three methods as a double check, and only two of them agreed with each other. The last method returned a different value, which is suspicious and made everything take much longer than it should have.