For this exercise, please try to reproduce the results from Study 1 of the associated paper (Joel, Teper, & MacDonald, 2014). The PDF of the paper is included in the same folder as this Rmd file.
In Study 1, 150 introductory psychology students were randomly assigned to a “real” or a “hypothetical” condition. In the real condition, participants believed that they would have a real opportunity to connect with potential romantic partners. In the hypothetical condition, participants simply imagined that they were on a date. All participants were required to select their favorite profile and answer whether they were willing to exchange contact information.
Below is the specific result you will attempt to reproduce (quoted directly from the results section of Study 1):
We next tested our primary hypothesis that participants would be more reluctant to reject the unattractive date when they believed the situation to be real rather than hypothetical. Only 10 of the 61 participants in the hypothetical condition chose to exchange contact information with the unattractive potential date (16%). In contrast, 26 of the 71 participants in the real condition chose to exchange contact information (37%). A chi-square test of independence indicated that participants were significantly less likely to reject the unattractive potential date in the real condition compared with the hypothetical condition, X^2(1, N = 132) = 6.77, p = .009.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
init_df = pd.read_spss('data/Empathy Gap Study 1 data.sav')
init_df.head()
| | ID | attachment1 | attachment2 | attachment3 | attachment4 | attachment5 | attachment6 | attachment7 | attachment8 | attachment9 | attachment10 | attachment11 | attachment12 | attachment13 | attachment14 | attachment15 | attachment16 | attachment17 | attachment18 | attachment19 | attachment20 | attachment21 | attachment22 | attachment23 | attachment24 | attachment25 | attachment26 | attachment27 | attachment28 | attachment29 | attachment30 | attachment31 | attachment32 | attachment33 | attachment34 | attachment35 | attachment36 | FOBA1 | FOBA2 | FOBA3 | FOBA4 | FOBA5 | FOBA6 | empathy1 | empathy2 | empathy3 | empathy4 | empathy5 | empathy6 | empathy7 | empathy8 | empathy9 | empathy10 | empathy11 | empathy12 | empathy13 | empathy14 | empathy15 | empathy16 | empathy17 | empathy18 | empathy19 | empathy20 | empathy21 | empathy22 | empathy23 | empathy24 | empathy25 | empathy26 | empathy27 | empathy28 | age | livedincanada | orientation | inrel | longterm | dating | shortterm | intimate | otheropen | drink | children | responseq1 | responseq2 | responseq3 | responseq4 | reasontrue1 | motives1 | reasontrue2 | motives2 | reasontrue3 | motives3 | reasontrue4 | motives4 | reasontrue5 | motives5 | reasontrue6 | motives6 | reasontrue7 | motives7 | reasontrue8 | motives8 | suspicious | selfattractive | otherattractive | EmpathyPTtot | EmpathyFStot | EmpathyECtot | EmpathyPDtot | fobstot | attachmentavoidance | attachmentanxiety | stateguilttot | stateempathytot | excitementtot | compatibilitytot | very_otherfocused | less_otherfocused | gender | genderXcondition | REQUIRED_VARIABLES_START_BELOW | condition | exchangeinfo | otherfocused_motives | selffocused_motives |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 53.0 | 3.0 | 4.0 | 5.0 | 3.0 | 2.0 | 3.0 | 5.0 | 2.0 | 4.0 | 3.0 | 5.0 | 5.0 | 5.0 | 2.0 | 4.0 | 3.0 | 3.0 | 2.0 | 5.0 | 3.0 | 4.0 | 3.0 | 3.0 | 4.0 | 3.0 | 4.0 | 3.0 | 5.0 | 3.0 | 5.0 | 3.0 | 5.0 | 2.0 | 2.0 | 2.0 | 5.0 | 2.0 | 1.0 | 1.0 | 3.0 | 1.0 | 2.0 | 2.0 | 5.0 | 1.0 | 2.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 3.0 | 4.0 | 3.0 | 1.0 | 4.0 | 4.0 | 1.0 | 1.0 | 1.0 | 1.0 | 5.0 | 1.0 | 2.0 | 4.0 | 4.0 | 3.0 | 4.0 | 18.0 | 3.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 2.0 | 2.0 | 4.0 | 1.0 | 4.0 | 4.0 | 4.0 | 4.0 | 3.0 | 3.0 | 4.0 | 3.0 | 4.0 | 4.0 | 4.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 3.0 | 4.0 | 4.0 | NaN | 5.0 | 6.0 | 3.857143 | 2.857143 | 3.857143 | 3.428571 | 1.666667 | 3.555556 | 3.611111 | 3.75 | 3.25 | 3.75 | 3.000000 | 3.25 | 3.75 | women | 1.0 | NaN | real | yes | 3.5 | 3.375 |
| 1 | 93.0 | 5.0 | 1.0 | 3.0 | 4.0 | 2.0 | 2.0 | 2.0 | 2.0 | 5.0 | 2.0 | 3.0 | 3.0 | 3.0 | 2.0 | 3.0 | 2.0 | 3.0 | 3.0 | 3.0 | 3.0 | NaN | NaN | 2.0 | NaN | NaN | 4.0 | 5.0 | 4.0 | NaN | 3.0 | NaN | NaN | 7.0 | NaN | NaN | 5.0 | 1.0 | 1.0 | 1.0 | 1.0 | 4.0 | 5.0 | 3.0 | 3.0 | 4.0 | 2.0 | 3.0 | NaN | 5.0 | NaN | 5.0 | 4.0 | NaN | 2.0 | 1.0 | 1.0 | 4.0 | 4.0 | NaN | 1.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 4.0 | NaN | 2.0 | 18.0 | 4.0 | 0.0 | 2.0 | 2.0 | 2.0 | 1.0 | 2.0 | 2.0 | 2.0 | 4.0 | 2.0 | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | NaN | 2.0 | NaN | 2.0 | NaN | 2.0 | NaN | 2.0 | NaN | 2.0 | NaN | 2.0 | NaN | 2.0 | 8.0 | 5.0 | 2.000000 | 3.166667 | 4.400000 | 4.500000 | 2.166667 | 3.857143 | 2.923077 | 2.00 | 3.00 | 2.00 | 2.666667 | 3.00 | 2.00 | men | 0.0 | NaN | real | no | 2.5 | 2.400 |
| 2 | 83.0 | 3.0 | 6.0 | 3.0 | 6.0 | 5.0 | 4.0 | 2.0 | 2.0 | 3.0 | 6.0 | 5.0 | 6.0 | 2.0 | 3.0 | 5.0 | 3.0 | 6.0 | 5.0 | 5.0 | 2.0 | 3.0 | 3.0 | 3.0 | 3.0 | 5.0 | 6.0 | 5.0 | 3.0 | 3.0 | 6.0 | 2.0 | 5.0 | 5.0 | 3.0 | 5.0 | 3.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 4.0 | 3.0 | 1.0 | 2.0 | 5.0 | 5.0 | 2.0 | 2.0 | 3.0 | 5.0 | 4.0 | 1.0 | 2.0 | 1.0 | 2.0 | 4.0 | 5.0 | 2.0 | 2.0 | 4.0 | 3.0 | 4.0 | 4.0 | 4.0 | 3.0 | 3.0 | 4.0 | 2.0 | 20.0 | 4.0 | 1.0 | 2.0 | 1.0 | 1.0 | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 | 3.0 | 4.0 | 4.0 | 4.0 | 3.0 | 2.0 | 5.0 | 4.0 | 3.0 | 4.0 | 5.0 | 4.0 | 4.0 | 3.0 | 5.0 | 4.0 | 2.0 | 1.0 | 5.0 | 4.0 | 2.0 | 4.0 | 4.0 | 3.285714 | 4.142857 | 3.857143 | 4.428571 | 5.000000 | 4.333333 | 3.944444 | 4.50 | 4.50 | 2.50 | 3.000000 | 4.50 | 4.50 | men | 0.0 | NaN | real | no | 4.5 | 2.750 |
| 3 | 27.0 | 2.0 | 6.0 | 5.0 | 2.0 | 5.0 | 5.0 | 6.0 | 2.0 | 2.0 | 2.0 | 5.0 | 4.0 | 1.0 | 4.0 | 3.0 | 2.0 | 4.0 | 3.0 | 5.0 | 2.0 | 6.0 | 3.0 | 3.0 | 5.0 | 4.0 | 1.0 | 2.0 | 2.0 | 2.0 | 5.0 | 2.0 | 3.0 | 5.0 | 2.0 | 4.0 | 3.0 | 2.0 | 1.0 | 3.0 | 5.0 | 3.0 | 4.0 | 4.0 | 5.0 | 3.0 | 1.0 | 4.0 | 5.0 | 1.0 | 3.0 | 5.0 | 5.0 | 4.0 | 1.0 | 2.0 | 1.0 | 3.0 | 4.0 | 5.0 | 1.0 | 2.0 | 5.0 | 4.0 | 5.0 | 4.0 | 2.0 | 3.0 | 5.0 | 3.0 | 3.0 | 18.0 | 4.0 | 0.0 | 2.0 | 1.0 | 1.0 | 2.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 | 2.0 | 2.0 | 3.0 | 2.0 | 1.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 2.0 | 3.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | NaN | NaN | 3.285714 | 4.428571 | 5.000000 | 4.000000 | 3.000000 | 4.000000 | 3.000000 | 1.00 | 1.00 | 1.50 | 2.000000 | 1.00 | 1.00 | women | 0.0 | NaN | hypothetical | no | 1.0 | 1.750 |
| 4 | 6.0 | 3.0 | 6.0 | 3.0 | 5.0 | 5.0 | 5.0 | 1.0 | 5.0 | 3.0 | 3.0 | 5.0 | 5.0 | 3.0 | 5.0 | 5.0 | 3.0 | 5.0 | 3.0 | 3.0 | 3.0 | 4.0 | 3.0 | 5.0 | 6.0 | 5.0 | 5.0 | 5.0 | 3.0 | 5.0 | 5.0 | 3.0 | 5.0 | 3.0 | 5.0 | 2.0 | 2.0 | 4.0 | 1.0 | 1.0 | 5.0 | 1.0 | 1.0 | 4.0 | 4.0 | 2.0 | 2.0 | 5.0 | 4.0 | 2.0 | 3.0 | 4.0 | 4.0 | 4.0 | 2.0 | 2.0 | 3.0 | 2.0 | 4.0 | 4.0 | 2.0 | 2.0 | 4.0 | 4.0 | 5.0 | 4.0 | 4.0 | 3.0 | 4.0 | 3.0 | 4.0 | 19.0 | 1.0 | 0.0 | 2.0 | 2.0 | 2.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 5.0 | 4.0 | 3.0 | 4.0 | 3.0 | 3.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 3.0 | 3.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 2.0 | NaN | NaN | 3.714286 | 4.142857 | 4.000000 | 3.857143 | 2.166667 | 4.944444 | 4.555556 | 4.00 | 4.00 | 4.00 | 3.000000 | 4.00 | 4.00 | women | 0.0 | NaN | hypothetical | yes | 4.0 | 3.500 |
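df = init_df.copy()
# Remove nulls first: comparing NaN to a string yields False rather than NaN,
# so a null check on the boolean columns below would never drop anything.
df = df.dropna(subset=["condition", "exchangeinfo"])
# Make is_real Column (True for the real condition, False for hypothetical)
df["is_real"] = df["condition"] == "real"
# Make better exchanged column (True if the participant agreed to exchange contact info)
df["exchanged_digits"] = df["exchangeinfo"] == "yes"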
Only 10 of the 61 participants in the hypothetical condition chose to exchange contact information with the unattractive potential date (16%). In contrast, 26 of the 71 participants in the real condition chose to exchange contact information (37%).
hypo_df = df.loc[~df["is_real"]]
n_hypo = len(hypo_df)
n_hypo_exchangers = len(hypo_df.loc[hypo_df['exchanged_digits']])
print("Total hypothetical participants:", n_hypo)
print("Number of hypothetical exchangers:", n_hypo_exchangers)
print("Portion of Hypothetical that Exchanged: {:2.0f}%".format(n_hypo_exchangers/n_hypo*100))
Total hypothetical participants: 61
Number of hypothetical exchangers: 10
Portion of Hypothetical that Exchanged: 16%
real_df = df.loc[df['is_real']]
n_real = len(real_df)
n_real_exchangers = len(real_df.loc[real_df["exchanged_digits"]])
print("Total real participants:", n_real)
print("Number of real exchangers:", n_real_exchangers)
print("Portion of Real that Exchanged those Digs: {:2.0f}%".format(n_real_exchangers/n_real*100))
Total real participants: 71
Number of real exchangers: 26
Portion of Real that Exchanged those Digs: 37%
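The same counts and percentages can also be read off a single contingency table. The following is a minimal sketch, not part of the original analysis: it rebuilds one row per participant from the counts reported above (the name `toy_df` is hypothetical) and uses `pd.crosstab` to tabulate condition against response.

```python
import pandas as pd

# Rebuild one row per participant from the counts reported above
# (hypothetical: 10 of 61 exchanged; real: 26 of 71 exchanged).
rows = (
    [("hypothetical", "yes")] * 10 + [("hypothetical", "no")] * 51
    + [("real", "yes")] * 26 + [("real", "no")] * 45
)
toy_df = pd.DataFrame(rows, columns=["condition", "exchangeinfo"])

# Raw counts per condition and response
counts = pd.crosstab(toy_df["condition"], toy_df["exchangeinfo"])
# Row-wise percentages (each condition sums to 100)
percent = pd.crosstab(toy_df["condition"], toy_df["exchangeinfo"], normalize="index") * 100

print(counts)
print(percent.round(0))
```

With the real data, the two `toy_df` columns would simply be replaced by `df["condition"]` and `df["exchangeinfo"]`.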
A chi-square test of independence indicated that participants were significantly less likely to reject the unattractive potential date in the real condition compared with the hypothetical condition, X^2(1, N = 132) = 6.77, p = .009.
n_exchangers = n_real_exchangers + n_hypo_exchangers
n_assholes = len(df) - n_exchangers
assert n_exchangers+n_assholes == n_real+n_hypo
observed_hypo_exchangers = n_hypo_exchangers
observed_hypo_assholes = n_hypo-n_hypo_exchangers
observed_real_exchangers = n_real_exchangers
observed_real_assholes = n_real - n_real_exchangers
# Expected count for each cell, computed here as x*y / (x+y).
# Note: the textbook form divides by the grand total N rather than by x+y.
fxn = lambda x,y: (x*y)/(x+y)
expected_hypo_exchangers = fxn(n_exchangers,n_hypo)
expected_hypo_assholes = fxn(n_assholes,n_hypo)
expected_real_exchangers = fxn(n_exchangers,n_real)
expected_real_assholes = fxn(n_assholes,n_real)
# Per-cell contribution to the statistic: (observed - expected)^2 / expected
fxn = lambda x,y: (x-y)**2/y
he_dev = fxn(observed_hypo_exchangers,expected_hypo_exchangers)
ha_dev = fxn(observed_hypo_assholes,expected_hypo_assholes)
re_dev = fxn(observed_real_exchangers,expected_real_exchangers)
ra_dev = fxn(observed_real_assholes,expected_real_assholes)
# Sum the four contributions to get the chi-square statistic
x2 = he_dev + ha_dev + re_dev + ra_dev
print("ChiSquare:", x2) # Used calculator with 1 degree of freedom
ChiSquare: 12.704757641104031
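The value above is nearly twice the reported 6.77. One possible explanation is the expected-count step: the usual definition for a test of independence is (row total × column total) / N, with N the grand total, whereas the calculation above divides by the sum of the two marginals. The sketch below is not the original analysis. It recomputes the statistic from the four observed counts with the textbook formula, using only the standard library, and all names in it are hypothetical.

```python
import math

# Observed counts taken from the results quoted above.
observed = {
    ("real", "exchanged"): 26,
    ("real", "rejected"): 45,
    ("hypothetical", "exchanged"): 10,
    ("hypothetical", "rejected"): 51,
}

rows = ["real", "hypothetical"]
cols = ["exchanged", "rejected"]
row_totals = {r: sum(observed[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(observed[(r, c)] for r in rows) for c in cols}
grand_total = sum(observed.values())

chi2_stat = 0.0
for r in rows:
    for c in cols:
        # Expected count under independence: row total * column total / N
        expected = row_totals[r] * col_totals[c] / grand_total
        chi2_stat += (observed[(r, c)] - expected) ** 2 / expected

# For 1 degree of freedom, the chi-square survival function
# reduces to erfc(sqrt(x / 2)), so no external library is needed.
p_value = math.erfc(math.sqrt(chi2_stat / 2))
print(f"chi2 = {chi2_stat:.2f}, p = {p_value:.3f}")
# prints: chi2 = 6.77, p = 0.009
```

Under these assumptions the statistic and p-value line up with the values quoted from the paper.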
from scipy.stats import chisquare
# scipy.stats.chisquare is a goodness-of-fit test: with four cells and the default
# ddof=0, its p-value is based on 4 - 1 = 3 degrees of freedom.
tup = chisquare(f_obs=[observed_hypo_exchangers,observed_hypo_assholes,observed_real_exchangers,observed_real_assholes],
                f_exp=[expected_hypo_exchangers,expected_hypo_assholes,expected_real_exchangers,expected_real_assholes])
x2, p = tup
print("ChiSquare:", x2, "-- p:", p)
ChiSquare: 12.704757641104031 -- p: 0.005320599642953536
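A note on the p-value: `scipy.stats.chisquare` is a goodness-of-fit test, so with four cells it defaults to 4 - 1 = 3 degrees of freedom, while a 2×2 test of independence has 1. Its `ddof` argument reduces the degrees of freedom. The sketch below is hedged and not the original analysis: it assumes textbook expected counts (row total × column total / N) and hard-codes the marginal totals implied by the counts above.

```python
from scipy.stats import chisquare

# Observed counts: hypothetical yes/no, then real yes/no
f_obs = [10, 51, 26, 45]
n = sum(f_obs)  # 132

# Expected counts under independence: row total * column total / N
# (row totals 61 and 71; column totals 36 exchanged, 96 rejected).
f_exp = [61 * 36 / n, 61 * 96 / n, 71 * 36 / n, 71 * 96 / n]

# Default: degrees of freedom = 4 - 1 = 3
default_df = chisquare(f_obs=f_obs, f_exp=f_exp)
# ddof=2 gives 4 - 1 - 2 = 1 degree of freedom, as in a 2x2 independence test
one_df = chisquare(f_obs=f_obs, f_exp=f_exp, ddof=2)

print("statistic:", round(one_df.statistic, 2))
print("p (3 df):", round(default_df.pvalue, 3))
print("p (1 df):", round(one_df.pvalue, 3))
```

The statistic is identical in both calls; only the reference distribution, and therefore the p-value, changes.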
from bioinfokit.analys import stat
table = {"category":["real", "hypothetical"],
"exchanged":[observed_real_exchangers,observed_hypo_exchangers],
"rejected":[observed_real_assholes,observed_hypo_assholes]}
temp = pd.DataFrame(table)
temp = temp.set_index("category")
temp
| category | exchanged | rejected |
|---|---|---|
| real | 26 | 45 |
| hypothetical | 10 | 51 |
res = stat()
res.chisq(df=temp)
print(res.summary)
Chi-squared test for independence
| Test | Df | Chi-square | P-value |
|---|---|---|---|
| Pearson | 1 | 5.78605 | 0.0161539 |
| Log-likelihood | 1 | 5.94695 | 0.0147428 |
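The Pearson value of 5.786 in this summary is what a Yates continuity correction produces for this 2×2 table, whereas the uncorrected statistic is about 6.77. The sketch below illustrates the difference with `scipy.stats.chi2_contingency`, which is not used in the original analysis. Its `correction` parameter defaults to `True` for 2×2 tables. Whether bioinfokit applies the same default internally is an assumption here, not something confirmed from its source.

```python
from scipy.stats import chi2_contingency

# Rows: real, hypothetical; columns: exchanged, rejected
table = [[26, 45], [10, 51]]

# Default behaviour: Yates continuity correction is applied to 2x2 tables
chi2_corr, p_corr, dof_corr, _ = chi2_contingency(table)
# Turning the correction off gives the plain Pearson statistic
chi2_raw, p_raw, dof_raw, _ = chi2_contingency(table, correction=False)

print(f"with correction:    chi2 = {chi2_corr:.3f}, p = {p_corr:.4f}, df = {dof_corr}")
print(f"without correction: chi2 = {chi2_raw:.3f}, p = {p_raw:.4f}, df = {dof_raw}")
```

If the paper's authors ran the test without a continuity correction, that alone would account for the gap between 5.79 and 6.77.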
Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?
I was able to reproduce the counts and percentages for both conditions, but I was not able to reproduce the chi-square statistic or its p-value.
How difficult was it to reproduce your results?
The counts were easy to reproduce. The chi-square test was not.
What aspects made it difficult? What aspects made it easy?
I was unfamiliar with the chi-square test, which slowed things down. I also tried three methods as a cross-check, and only two of them agreed with each other (the manual calculation and scipy, both giving 12.70). The third method, bioinfokit, returned a different value (5.79), and none of the three matched the reported 6.77. That discrepancy is suspicious, and chasing it made everything take much longer than it should have.