Group A Choice 1 Reproducibility Report

Satchel Grant, 10/25/2020

For this exercise, please try to reproduce the results from Study 1 of the associated paper (Joel, Teper, & MacDonald, 2014). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

In Study 1, 150 introductory psychology students were randomly assigned to a “real” or a “hypothetical” condition. In the real condition, participants believed that they would have a real opportunity to connect with potential romantic partners. In the hypothetical condition, participants simply imagined that they were on a date. All participants were required to select their favorite profile and answer whether they were willing to exchange contact information.

Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Study 1):

We next tested our primary hypothesis that participants would be more reluctant to reject the unattractive date when they believed the situation to be real rather than hypothetical. Only 10 of the 61 participants in the hypothetical condition chose to exchange contact information with the unattractive potential date (16%). In contrast, 26 of the 71 participants in the real condition chose to exchange contact information (37%). A chi-square test of independence indicated that participants were significantly less likely to reject the unattractive potential date in the real condition compared with the hypothetical condition, χ²(1, N = 132) = 6.77, p = .009.

Step 1: Load libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

Step 2: Load data

init_df = pd.read_spss('data/Empathy Gap Study 1 data.sav')
init_df.head()

	ID	attachment1	attachment2	attachment3	attachment4	attachment5	attachment6	attachment7	attachment8	attachment9	attachment10	attachment11	attachment12	attachment13	attachment14	attachment15	attachment16	attachment17	attachment18	attachment19	attachment20	attachment21	attachment22	attachment23	attachment24	attachment25	attachment26	attachment27	attachment28	attachment29	attachment30	attachment31	attachment32	attachment33	attachment34	attachment35	attachment36	FOBA1	FOBA2	FOBA3	FOBA4	FOBA5	FOBA6	empathy1	empathy2	empathy3	empathy4	empathy5	empathy6	empathy7	empathy8	empathy9	empathy10	empathy11	empathy12	empathy13	empathy14	empathy15	empathy16	empathy17	empathy18	empathy19	empathy20	empathy21	empathy22	empathy23	empathy24	empathy25	empathy26	empathy27	empathy28	age	livedincanada	orientation	inrel	longterm	dating	shortterm	intimate	otheropen	drink	children	responseq1	responseq2	responseq3	responseq4	reasontrue1	motives1	reasontrue2	motives2	reasontrue3	motives3	reasontrue4	motives4	reasontrue5	motives5	reasontrue6	motives6	reasontrue7	motives7	reasontrue8	motives8	suspicious	selfattractive	otherattractive	EmpathyPTtot	EmpathyFStot	EmpathyECtot	EmpathyPDtot	fobstot	attachmentavoidance	attachmentanxiety	stateguilttot	stateempathytot	excitementtot	compatibilitytot	very_otherfocused	less_otherfocused	gender	genderXcondition	REQUIRED_VARIABLES_START_BELOW	condition	exchangeinfo	otherfocused_motives	selffocused_motives
0	53.0	3.0	4.0	5.0	3.0	2.0	3.0	5.0	2.0	4.0	3.0	5.0	5.0	5.0	2.0	4.0	3.0	3.0	2.0	5.0	3.0	4.0	3.0	3.0	4.0	3.0	4.0	3.0	5.0	3.0	5.0	3.0	5.0	2.0	2.0	2.0	5.0	2.0	1.0	1.0	3.0	1.0	2.0	2.0	5.0	1.0	2.0	4.0	4.0	4.0	4.0	4.0	4.0	4.0	3.0	4.0	3.0	1.0	4.0	4.0	1.0	1.0	1.0	1.0	5.0	1.0	2.0	4.0	4.0	3.0	4.0	18.0	3.0	1.0	2.0	2.0	1.0	1.0	2.0	2.0	4.0	1.0	4.0	4.0	4.0	4.0	3.0	3.0	4.0	3.0	4.0	4.0	4.0	3.0	3.0	3.0	3.0	3.0	4.0	3.0	4.0	4.0	NaN	5.0	6.0	3.857143	2.857143	3.857143	3.428571	1.666667	3.555556	3.611111	3.75	3.25	3.75	3.000000	3.25	3.75	women	1.0	NaN	real	yes	3.5	3.375
1	93.0	5.0	1.0	3.0	4.0	2.0	2.0	2.0	2.0	5.0	2.0	3.0	3.0	3.0	2.0	3.0	2.0	3.0	3.0	3.0	3.0	NaN	NaN	2.0	NaN	NaN	4.0	5.0	4.0	NaN	3.0	NaN	NaN	7.0	NaN	NaN	5.0	1.0	1.0	1.0	1.0	4.0	5.0	3.0	3.0	4.0	2.0	3.0	NaN	5.0	NaN	5.0	4.0	NaN	2.0	1.0	1.0	4.0	4.0	NaN	1.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	4.0	NaN	2.0	18.0	4.0	0.0	2.0	2.0	2.0	1.0	2.0	2.0	2.0	4.0	2.0	3.0	3.0	3.0	3.0	3.0	4.0	NaN	2.0	NaN	2.0	NaN	2.0	NaN	2.0	NaN	2.0	NaN	2.0	NaN	2.0	8.0	5.0	2.000000	3.166667	4.400000	4.500000	2.166667	3.857143	2.923077	2.00	3.00	2.00	2.666667	3.00	2.00	men	0.0	NaN	real	no	2.5	2.400
2	83.0	3.0	6.0	3.0	6.0	5.0	4.0	2.0	2.0	3.0	6.0	5.0	6.0	2.0	3.0	5.0	3.0	6.0	5.0	5.0	2.0	3.0	3.0	3.0	3.0	5.0	6.0	5.0	3.0	3.0	6.0	2.0	5.0	5.0	3.0	5.0	3.0	5.0	5.0	5.0	5.0	5.0	5.0	4.0	3.0	1.0	2.0	5.0	5.0	2.0	2.0	3.0	5.0	4.0	1.0	2.0	1.0	2.0	4.0	5.0	2.0	2.0	4.0	3.0	4.0	4.0	4.0	3.0	3.0	4.0	2.0	20.0	4.0	1.0	2.0	1.0	1.0	2.0	2.0	2.0	2.0	2.0	3.0	4.0	4.0	4.0	3.0	2.0	5.0	4.0	3.0	4.0	5.0	4.0	4.0	3.0	5.0	4.0	2.0	1.0	5.0	4.0	2.0	4.0	4.0	3.285714	4.142857	3.857143	4.428571	5.000000	4.333333	3.944444	4.50	4.50	2.50	3.000000	4.50	4.50	men	0.0	NaN	real	no	4.5	2.750
3	27.0	2.0	6.0	5.0	2.0	5.0	5.0	6.0	2.0	2.0	2.0	5.0	4.0	1.0	4.0	3.0	2.0	4.0	3.0	5.0	2.0	6.0	3.0	3.0	5.0	4.0	1.0	2.0	2.0	2.0	5.0	2.0	3.0	5.0	2.0	4.0	3.0	2.0	1.0	3.0	5.0	3.0	4.0	4.0	5.0	3.0	1.0	4.0	5.0	1.0	3.0	5.0	5.0	4.0	1.0	2.0	1.0	3.0	4.0	5.0	1.0	2.0	5.0	4.0	5.0	4.0	2.0	3.0	5.0	3.0	3.0	18.0	4.0	0.0	2.0	1.0	1.0	2.0	2.0	2.0	1.0	1.0	1.0	2.0	2.0	3.0	2.0	1.0	1.0	1.0	2.0	2.0	1.0	1.0	2.0	3.0	1.0	1.0	1.0	1.0	1.0	1.0	2.0	NaN	NaN	3.285714	4.428571	5.000000	4.000000	3.000000	4.000000	3.000000	1.00	1.00	1.50	2.000000	1.00	1.00	women	0.0	NaN	hypothetical	no	1.0	1.750
4	6.0	3.0	6.0	3.0	5.0	5.0	5.0	1.0	5.0	3.0	3.0	5.0	5.0	3.0	5.0	5.0	3.0	5.0	3.0	3.0	3.0	4.0	3.0	5.0	6.0	5.0	5.0	5.0	3.0	5.0	5.0	3.0	5.0	3.0	5.0	2.0	2.0	4.0	1.0	1.0	5.0	1.0	1.0	4.0	4.0	2.0	2.0	5.0	4.0	2.0	3.0	4.0	4.0	4.0	2.0	2.0	3.0	2.0	4.0	4.0	2.0	2.0	4.0	4.0	5.0	4.0	4.0	3.0	4.0	3.0	4.0	19.0	1.0	0.0	2.0	2.0	2.0	1.0	2.0	2.0	1.0	1.0	5.0	4.0	3.0	4.0	3.0	3.0	4.0	4.0	4.0	4.0	4.0	4.0	3.0	3.0	4.0	4.0	4.0	4.0	4.0	4.0	2.0	NaN	NaN	3.714286	4.142857	4.000000	3.857143	2.166667	4.944444	4.555556	4.00	4.00	4.00	3.000000	4.00	4.00	women	0.0	NaN	hypothetical	yes	4.0	3.500

Step 3: Tidy the data

df = init_df.copy()

# Remove nulls first: comparing NaN to a string yields False (not NaN),
# so missing values cannot be filtered out after the boolean columns are built
df = df.dropna(subset=["condition", "exchangeinfo"])

# Make is_real column (True for the "real" condition)
df["is_real"] = df["condition"] == "real"
# Make a boolean column for whether contact information was exchanged
df["exchanged_digits"] = df["exchangeinfo"] == "yes"
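In pandas, comparing a missing value to a string with `==` returns `False` rather than `NaN`, so any null check on the comparison result finds nothing, and missing values have to be dropped from the raw columns first. The toy series below is hypothetical and is not taken from the study data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value (not from the study data)
condition = pd.Series(["real", np.nan, "hypothetical"])

# Element-wise comparison: the missing entry becomes False, not NaN
is_real = condition == "real"
print(is_real.tolist())
print(is_real.isnull().sum())  # no nulls left to filter

# Dropping missing values *before* the comparison keeps only known conditions
known = condition.dropna()
print((known == "real").tolist())
```

The first print shows `[True, False, False]`, so the missing participant would silently be counted in the hypothetical group if it were not dropped beforehand.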

Step 4: Run analysis

Descriptive statistics

Only 10 of the 61 participants in the hypothetical condition chose to exchange contact information with the unattractive potential date (16%). In contrast, 26 of the 71 participants in the real condition chose to exchange contact information (37%).

Hypothetical participants

hypo_df = df.loc[~df["is_real"]]
n_hypo = len(hypo_df)
n_hypo_exchangers = len(hypo_df.loc[hypo_df['exchanged_digits']])
print("Total hypothetical participants:", n_hypo)
print("Number of hypothetical exchangers:", n_hypo_exchangers)
print("Proportion of hypothetical that exchanged: {:2.0f}%".format(n_hypo_exchangers/n_hypo*100))

Total hypothetical participants: 61

Number of hypothetical exchangers: 10

Proportion of hypothetical that exchanged: 16%

Real participants

real_df = df.loc[df['is_real']]
n_real = len(real_df)
n_real_exchangers = len(real_df.loc[real_df["exchanged_digits"]])
print("Total real participants:", n_real)
print("Number of real exchangers:", n_real_exchangers)
print("Proportion of real that exchanged: {:2.0f}%".format(n_real_exchangers/n_real*100))

Total real participants: 71

Number of real exchangers: 26

Proportion of real that exchanged: 37%
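The two per-condition counts can also be obtained in one step with `groupby`. This is a minimal sketch that assumes a per-participant frame with a `condition` column and the boolean `exchanged_digits` column from Step 3. The frame here, `toy_df`, is hypothetical and rebuilt from the reported counts (10 of 61 hypothetical, 26 of 71 real) rather than loaded from the `.sav` file.

```python
import pandas as pd

# Reported counts per condition: (number who exchanged, group size)
reported = {"hypothetical": (10, 61), "real": (26, 71)}

# Rebuild one hypothetical row per participant from those counts
rows = [
    {"condition": cond, "exchanged_digits": i < n_yes}
    for cond, (n_yes, n_total) in reported.items()
    for i in range(n_total)
]
toy_df = pd.DataFrame(rows)

# Cast to int so "sum" counts exchangers; "count" is the group size,
# and "mean" is the proportion who exchanged
summary = (
    toy_df["exchanged_digits"].astype(int)
    .groupby(toy_df["condition"])
    .agg(["sum", "count", "mean"])
)
print(summary.round(3))
```

With these counts the `mean` column comes out at about 0.164 (hypothetical) and 0.366 (real), matching the 16% and 37% above.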

Inferential statistics

A chi-square test of independence indicated that participants were significantly less likely to reject the unattractive potential date in the real condition compared with the hypothetical condition, χ²(1, N = 132) = 6.77, p = .009.
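For reference, the Pearson chi-square test of independence compares each observed cell count $O_{ij}$ with the count $E_{ij}$ that would be expected if condition and decision were unrelated. Here $R_i$ and $C_j$ are the row and column totals and $N$ is the total sample. The rows are real/hypothetical ($R = 71, 61$), the columns are exchanged/rejected ($C = 36, 96$), and $N = 132$. In a 2 × 2 table every cell deviates from its expectation by the same amount, here $|O_{ij} - E_{ij}| = 876/132 \approx 6.636$. Without a continuity correction, the reported counts give:

$$
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1) = 1
$$

$$
E = \begin{pmatrix} 71 \cdot 36/132 & 71 \cdot 96/132 \\ 61 \cdot 36/132 & 61 \cdot 96/132 \end{pmatrix}
\approx \begin{pmatrix} 19.36 & 51.64 \\ 16.64 & 44.36 \end{pmatrix}
$$

$$
\chi^2 \approx \frac{6.636^2}{19.36} + \frac{6.636^2}{51.64} + \frac{6.636^2}{16.64} + \frac{6.636^2}{44.36}
\approx 2.274 + 0.853 + 2.647 + 0.993 \approx 6.77
$$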

In-House Calculation

n_exchangers = n_real_exchangers + n_hypo_exchangers
n_assholes = len(df) - n_exchangers
assert n_exchangers+n_assholes == n_real+n_hypo

observed_hypo_exchangers = n_hypo_exchangers
observed_hypo_assholes = n_hypo-n_hypo_exchangers
observed_real_exchangers = n_real_exchangers
observed_real_assholes = n_real - n_real_exchangers

fxn = lambda x,y: (x*y)/(x+y)
expected_hypo_exchangers = fxn(n_exchangers,n_hypo)
expected_hypo_assholes = fxn(n_assholes,n_hypo)
expected_real_exchangers = fxn(n_exchangers,n_real)
expected_real_assholes = fxn(n_assholes,n_real)

fxn = lambda x,y: (x-y)**2/y
he_dev = fxn(observed_hypo_exchangers,expected_hypo_exchangers)
ha_dev = fxn(observed_hypo_assholes,expected_hypo_assholes)
re_dev = fxn(observed_real_exchangers,expected_real_exchangers)
ra_dev = fxn(observed_real_assholes,expected_real_assholes)

x2 = he_dev + ha_dev + re_dev + ra_dev
print("ChiSquare:", x2) # p-value looked up separately with a chi-square calculator (1 degree of freedom)

ChiSquare: 12.704757641104031
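Below is a minimal NumPy sketch of an in-house calculation. It takes each expected count as row total × column total / N, which is the standard formula for a test of independence. This differs from the `(x*y)/(x+y)` helper used above, which divides by the sum of the two margins instead of by N. The observed counts are the ones from Step 4, and no continuity correction is applied. The names `x2_indep` and `p_indep` are chosen here so the sketch does not overwrite the notebook's own variables.

```python
import numpy as np
from scipy.stats import chi2

# Observed 2x2 table: rows = real, hypothetical; columns = exchanged, rejected
observed = np.array([[26, 45],
                     [10, 51]])

n_total = observed.sum()
row_totals = observed.sum(axis=1)  # participants per condition
col_totals = observed.sum(axis=0)  # exchangers vs. rejecters overall

# Expected count for each cell under independence: R_i * C_j / N
expected = np.outer(row_totals, col_totals) / n_total

# Pearson statistic summed over all four cells
x2_indep = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom for an r x c table: (r - 1) * (c - 1), which is 1 here
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_indep = chi2.sf(x2_indep, dof)
print("ChiSquare: {:.2f} -- p: {:.3f}".format(x2_indep, p_indep))
```

With these counts it prints a statistic of about 6.77 and p ≈ 0.009.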

3rd Party Package Calculation

from scipy.stats import chisquare

tup=chisquare(f_obs=[observed_hypo_exchangers,observed_hypo_assholes,observed_real_exchangers,observed_real_assholes],
              f_exp=[expected_hypo_exchangers,expected_hypo_assholes,expected_real_exchangers,expected_real_assholes])
x2,p = tup
print("ChiSquare:", x2, "-- p:", p)

ChiSquare: 12.704757641104031 -- p: 0.005320599642953536
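`scipy.stats.chisquare` is a goodness-of-fit test. With four cells it uses k − 1 = 3 degrees of freedom unless its `ddof` argument says otherwise, whereas a 2 × 2 test of independence has 1. The sketch below first checks that the p-value printed above is what a 3-df chi-square distribution gives for that statistic. It then shows `ddof=2` bringing the degrees of freedom down to 1. For that second step it uses the standard independence expected counts (R_i × C_j / N) rather than the ones computed above.

```python
from scipy.stats import chi2, chisquare

# Statistic and p-value printed by the chisquare call above
stat_above = 12.704757641104031
p_above = 0.005320599642953536

# chisquare's default for four cells: k - 1 = 3 degrees of freedom
p_df3 = chi2.sf(stat_above, 3)
print("p with 3 df: {:.6f}".format(p_df3))

# For a 2x2 independence test, ddof=2 gives k - 1 - ddof = 1 degree of freedom.
# Cells: real/exchanged, real/rejected, hypothetical/exchanged, hypothetical/rejected
observed = [26, 45, 10, 51]
# Standard expected counts under independence, R_i * C_j / N with N = 132
expected = [71 * 36 / 132, 71 * 96 / 132, 61 * 36 / 132, 61 * 96 / 132]
stat_1df, p_1df = chisquare(f_obs=observed, f_exp=expected, ddof=2)
print("ChiSquare: {:.2f} -- p: {:.3f}".format(stat_1df, p_1df))
```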

Complete 3rd party calculation

from bioinfokit.analys import stat

table = {"category":["real", "hypothetical"],
         "exchanged":[observed_real_exchangers,observed_hypo_exchangers],
         "rejected":[observed_real_assholes,observed_hypo_assholes]}
temp = pd.DataFrame(table)
temp = temp.set_index("category")
temp

	exchanged	rejected
category
real	26	45
hypothetical	10	51
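The same 2 × 2 table can also be built directly from per-participant columns with `pd.crosstab`, without assembling the counts by hand. This sketch assumes the `condition` and `exchangeinfo` string columns from the loaded data. The frame `toy` is hypothetical and reconstructed from the reported counts rather than read from the `.sav` file.

```python
import pandas as pd

# Hypothetical per-participant frame rebuilt from the reported counts
toy = pd.DataFrame({
    "condition": ["real"] * 71 + ["hypothetical"] * 61,
    "exchangeinfo": ["yes"] * 26 + ["no"] * 45 + ["yes"] * 10 + ["no"] * 51,
})

# Rows: condition; columns: whether contact information was exchanged
xtab = pd.crosstab(toy["condition"], toy["exchangeinfo"])

# Reorder rows/columns and rename columns to match the table above
xtab = xtab.loc[["real", "hypothetical"], ["yes", "no"]]
xtab.columns = ["exchanged", "rejected"]
print(xtab)
```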

res = stat()
res.chisq(df=temp)
print(res.summary)

Chi-squared test for independence

Test	Df	Chi-square	P-value
Pearson	1	5.78605	0.0161539
Log-likelihood	1	5.94695	0.0147428
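One plausible source of the gap between bioinfokit's Pearson value (5.786) and the 6.77 reported in the paper is Yates' continuity correction. `scipy.stats.chi2_contingency` applies this correction to 2 × 2 tables by default (`correction=True`). Whether bioinfokit applies it internally is an assumption here. The sketch only shows that toggling the correction in SciPy on the same table yields both numbers.

```python
from scipy.stats import chi2_contingency

# Observed counts: rows = real, hypothetical; columns = exchanged, rejected
observed = [[26, 45],
            [10, 51]]

# With Yates' continuity correction (SciPy's default for 2x2 tables)
stat_yates, p_yates, dof, expected = chi2_contingency(observed, correction=True)
# Without the correction: the plain Pearson statistic
stat_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)

print("Yates:   X2({}) = {:.3f}, p = {:.4f}".format(dof, stat_yates, p_yates))
print("Pearson: X2({}) = {:.3f}, p = {:.4f}".format(dof, stat_plain, p_plain))
```

The corrected statistic lines up with the Pearson row in the bioinfokit summary, and the uncorrected one lines up with the value quoted from the paper.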

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

I was able to reproduce the counts and percentages of the different categories, but I was not able to reproduce the chi-square results.

How difficult was it to reproduce your results?

The counts were not difficult. The chi-square test, however, was.

What aspects made it difficult? What aspects made it easy?

I was unfamiliar with the chi-square test, which made things take a while. As a double check I also tried multiple methods, and only two of them agreed with each other. The last method returning a different value was suspicious, and it made everything take much longer than it should have.