For this exercise, please try to reproduce the results from Study 1 of the associated paper (Joel, Teper, & MacDonald, 2014). The PDF of the paper is included in the same folder as this Rmd file.
In Study 1, 150 introductory psychology students were randomly assigned to a “real” or a “hypothetical” condition. In the real condition, participants believed that they would have a real opportunity to connect with potential romantic partners. In the hypothetical condition, participants simply imagined that they were on a date. All participants were required to select their favorite profile and answer whether they were willing to exchange contact information.
Below is the specific result you will attempt to reproduce (quoted directly from the results section of Study 1):
library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
# optional packages:
library(broom)
library(labelled) # converts SPSS's labelled to R's factor
# Just Study 1
d <- read_sav("~/R Mac working folder/class/problem_sets/ps3/Group A/Choice 1/data/Empathy Gap Study 1 data.sav")
# create table with key variables
dt <- d %>% select(ID, condition, exchangeinfo, otherfocused_motives, selffocused_motives, selfattractive, otherattractive)
# after reviewing my notes, converted these labelled columns to factors
dt$condition <- to_factor(dt$condition)
dt$exchangeinfo <- to_factor(dt$exchangeinfo)
dt
## # A tibble: 132 x 7
## ID condition exchangeinfo otherfocused_mo… selffocused_mot… selfattractive
## <dbl> <fct> <fct> <dbl> <dbl> <dbl>
## 1 53 real yes 3.5 3.38 5
## 2 93 real no 2.5 2.4 8
## 3 83 real no 4.5 2.75 4
## 4 27 hypothet… no 1 1.75 NA
## 5 6 hypothet… yes 4 3.5 NA
## 6 116 hypothet… yes 4 2.75 7
## 7 24 hypothet… no 4.12 2.62 NA
## 8 127 hypothet… no 2.12 2 9
## 9 32 real yes 5 2.38 3
## 10 73 real no 1.62 1.12 6
## # … with 122 more rows, and 1 more variable: otherattractive <dbl>
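Once the labelled columns are converted to factors, comparisons have to use the labels ("hypothetical", "real", "yes", "no") rather than the underlying SPSS codes. The sketch below illustrates this with a small made-up vector; it uses base `factor()` as a stand-in for `to_factor()`, and the 0/1 coding and the name `condition_demo` are assumptions for illustration only.

```r
# Toy vector, NOT the study data: 0 = hypothetical, 1 = real (assumed coding)
codes <- c(0, 1, 1, 0)
condition_demo <- factor(codes, levels = c(0, 1),
                         labels = c("hypothetical", "real"))

# Comparing a factor with the old numeric code matches nothing,
# because the comparison is made against the level labels
n_by_code <- sum(condition_demo == 0)

# Comparing with the label finds the rows
n_by_label <- sum(condition_demo == "hypothetical")

print(c(by_code = n_by_code, by_label = n_by_label))
```

The same logic applies inside `filter()`: a condition written against the numeric code returns zero rows after the conversion.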
Only 10 of the 61 participants in the hypothetical condition chose to exchange contact information with the unattractive potential date (16%). In contrast, 26 of the 71 participants in the real condition chose to exchange contact information (37%).
# condition and exchangeinfo are factors after to_factor(), so filter on the labels
dt %>% filter(condition == "hypothetical", exchangeinfo == "no") %>%
  nrow() # hypothetical condition, chose not to exchange contact info: 51
## [1] 51
dt %>% filter(condition == "hypothetical", exchangeinfo == "yes") %>%
  nrow() # hypothetical condition, chose to exchange: 10
## [1] 10
dt %>% filter(condition == "real", exchangeinfo == "no") %>%
  nrow() # real condition, chose not to exchange: 45
## [1] 45
dt %>% filter(condition == "real", exchangeinfo == "yes") %>%
  nrow() # real condition, chose to exchange: 26
## [1] 26
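As a cross-check on the four cell counts, the percentages quoted from the paper (16% and 37%) can be recovered from a 2 x 2 table. This is a minimal sketch that rebuilds the table from the reported counts (10 of 61 and 26 of 71) rather than reading the data file; the object names `counts` and `pct_yes` are illustrative.

```r
# 2 x 2 table built from the counts reported in the paper
counts <- matrix(c(10, 51,
                   26, 45),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(condition = c("hypothetical", "real"),
                                 exchangeinfo = c("yes", "no")))

# Row totals should match the group sizes (61 and 71)
group_n <- rowSums(counts)

# Percentage in each condition who chose to exchange contact information
pct_yes <- round(100 * prop.table(counts, margin = 1)[, "yes"])

print(group_n)
print(pct_yes)
```

With the real data, `table(dt$condition, dt$exchangeinfo)` should give the same four cells in a single call.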
summary(dt)
## ID condition exchangeinfo otherfocused_motives
## Min. : 1.00 hypothetical:61 yes:36 Min. :1.000
## 1st Qu.: 36.50 real :71 no :96 1st Qu.:2.125
## Median : 74.00 Median :2.875
## Mean : 74.19 Mean :2.891
## 3rd Qu.:110.50 3rd Qu.:3.723
## Max. :153.00 Max. :5.000
## NA's :1
## selffocused_motives selfattractive otherattractive
## Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.875 1st Qu.:5.000 1st Qu.:3.000
## Median :2.375 Median :6.000 Median :5.000
## Mean :2.369 Mean :6.078 Mean :4.481
## 3rd Qu.:2.875 3rd Qu.:7.000 3rd Qu.:6.000
## Max. :4.562 Max. :9.000 Max. :9.000
## NA's :29 NA's :29
# reproduce the above results here
A chi-square test of independence indicated that participants were significantly less likely to reject the unattractive potential date in the real condition compared with the hypothetical condition, X^2(1, N = 132) = 6.77, p = .009.
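H0 (null): there is no relationship between condition (real vs. hypothetical) and the decision to exchange contact information. H1: condition and the decision to exchange contact information are related.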
dt2 <- table(dt$condition, dt$exchangeinfo) #contingency table
ggplot(dt, aes(x = condition, fill = exchangeinfo)) +
  geom_bar() +
  scale_fill_hue()
test <- chisq.test(dt2, correct = FALSE)
test
##
## Pearson's Chi-squared test
##
## data: dt2
## X-squared = 6.7674, df = 1, p-value = 0.009284
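To see where the statistic comes from, the Pearson chi-square can be computed by hand from observed and expected counts. The sketch below rebuilds the table from the counts reported in the paper instead of using `dt2`; the names `observed`, `expected`, `x2`, and `p_value` are illustrative.

```r
# Observed counts as reported in the paper
observed <- matrix(c(10, 51,
                     26, 45),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(condition = c("hypothetical", "real"),
                                   exchangeinfo = c("yes", "no")))

# Expected counts under independence: (row total * column total) / N
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic, without continuity correction
x2 <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with (2 - 1) * (2 - 1) = 1 degree of freedom
p_value <- pchisq(x2, df = 1, lower.tail = FALSE)

print(round(x2, 2))      # compare with the reported 6.77
print(round(p_value, 3)) # compare with the reported .009
```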
head(dt)
## # A tibble: 6 x 7
## ID condition exchangeinfo otherfocused_mo… selffocused_mot… selfattractive
## <dbl> <fct> <fct> <dbl> <dbl> <dbl>
## 1 53 real yes 3.5 3.38 5
## 2 93 real no 2.5 2.4 8
## 3 83 real no 4.5 2.75 4
## 4 27 hypothet… no 1 1.75 NA
## 5 6 hypothet… yes 4 3.5 NA
## 6 116 hypothet… yes 4 2.75 7
## # … with 1 more variable: otherattractive <dbl>
summary(test)
## Length Class Mode
## statistic 1 -none- numeric
## parameter 1 -none- numeric
## p.value 1 -none- numeric
## method 1 -none- character
## data.name 1 -none- character
## observed 4 table numeric
## expected 4 -none- numeric
## residuals 4 table numeric
## stdres 4 table numeric
Condition is significantly related to the number of people who agreed to exchange contact information: the hypothetical condition (0) is associated with fewer people saying yes (coded 1, versus 2 for no), and the real condition with more. The null hypothesis (H0) is that there is no association between condition and exchanging information; it is rejected, p = .009. Looking more closely, people in the real condition were less likely to reject exchanging information. At first I could not get the exact p-value from the paper (I got p = .016); setting correct = FALSE, which removes the Yates continuity correction, reproduced it.
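The effect of the continuity correction can be seen by running the test both ways. This is a sketch on the counts reported in the paper, not on the data file; `chisq.test()` applies the Yates correction to 2 x 2 tables by default (`correct = TRUE`), and the object names are illustrative.

```r
# 2 x 2 table built from the counts reported in the paper
tab <- matrix(c(10, 51,
                26, 45),
              nrow = 2, byrow = TRUE,
              dimnames = list(condition = c("hypothetical", "real"),
                              exchangeinfo = c("yes", "no")))

# Default: Yates continuity correction applied
with_yates <- chisq.test(tab, correct = TRUE)

# Uncorrected Pearson test, matching the value reported in the paper
without_yates <- chisq.test(tab, correct = FALSE)

print(round(c(yates = with_yates$p.value,
              uncorrected = without_yates$p.value), 3))
```

Both versions reject the null hypothesis at the .05 level here; only the uncorrected test reproduces the reported statistic.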
Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?
ANSWER HERE
How difficult was it to reproduce your results?
I'm not sure how I would grade the difficulty: to some degree I knew what I wanted to do but had difficulty doing it, though I eventually got there.
What aspects made it difficult? What aspects made it easy?
Switching to a Mac made it a bit challenging to upload files and get used to this workflow. Steps 2 and 3 were fine, assuming they are accurate. I'm not sure I computed the correct descriptives, but combining matrices and filters allowed me to answer specific questions. I wasn't sure whether I used the right method because the output looks different, and it took time to think through the setup and interpretation of the data frame to get at the answer to the primary question. The simple design made the statistics easy to analyze. The paper made it easy to see which test was important and which variables were involved, since the data set contains other variables that could be confused with them.