The original effect size is approximately 0.286 (Cohen’s d), computed from the t statistic of 2.05 (p = 0.04) reported in the original study. Note that we estimated the degrees of freedom to be 205: the total sample size was 413, so we divided 413 by 2 and subtracted 1. The estimated sample sizes for 80%, 90%, and 95% power are 386, 516, and 636 individuals, respectively.
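The calculation above can be sketched in R as follows. This is a minimal sketch and not necessarily the tool used for the original estimate: it assumes the conversion d = 2t / sqrt(df) for two independent groups, and it uses base R's `power.t.test` for a two-sided, two-sample t-test at alpha = 0.05 (the test type and alpha level are assumptions, since the text does not state them). Depending on how d is rounded, the results may differ by a few participants from the figures reported above.

```r
# Values reported for the original study
t_stat  <- 2.05
n_total <- 413

# Degrees of freedom as estimated in the text: 413 / 2 - 1, rounded down
df_est <- floor(n_total / 2 - 1)

# Cohen's d from a t statistic, assuming two independent groups
d_orig <- 2 * t_stat / sqrt(df_est)
round(d_orig, 3)

# Total N for a two-sided, two-sample t-test (assumed alpha = 0.05)
total_n <- function(power, d, alpha = 0.05) {
  res <- power.t.test(delta = d, sd = 1, sig.level = alpha, power = power,
                      type = "two.sample", alternative = "two.sided")
  2 * ceiling(res$n)  # res$n is the per-group sample size
}

sapply(c(0.80, 0.90, 0.95), total_n, d = d_orig)
```

The helper `total_n` is a hypothetical name introduced for this sketch; it doubles the per-group size because the sample sizes in the text are totals across both groups.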
The planned sample for the present replication study is 386 individuals with hiring experience. If the budget allows for additional screening, it would be helpful to restrict the sample further to those with at least 2 years of work experience.
Here is the link for the materials: https://stanforduniversity.qualtrics.com/jfe/form/SV_6sRpi4Fvh05zJPg.
The resumes of job candidates differing in founding experience and gender used in the present study were downloaded from the original study. The procedure is otherwise identical, except that the replication adds one questionnaire asking about respondents’ current job position and industry, because the original study sample consisted entirely of hiring managers with extensive experience hiring for marketing positions.
First, I clean the data by excluding cases that are missing a response on the key variable, the recommendation scale.
library(qualtRics)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
qualtricsData <- read_csv("./qualtricsData.csv")
## Rows: 14 Columns: 47
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (22): Status, IPAddress, ResponseId, DistributionChannel, UserLanguage,...
## dbl (13): Progress, Duration (in seconds), LocationLatitude, LocationLongit...
## lgl (9): Finished, RecipientLastName, RecipientFirstName, RecipientEmail, ...
## dttm (3): StartDate, EndDate, RecordedDate
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
d <- qualtricsData %>%
  drop_na(`recommend-scale`)
I then recode the recommendation scale to a numerical scale.
d$recommend.scale <- recode(
  d$`recommend-scale`,
  'Definitely\nwill not recommend' = 1,
  'Not very probably recommend' = 2,
  'Probably not recommend' = 3,
  'Might or\nmight not recommend' = 4,
  'Probably recommend' = 5,
  'Very probably recommend' = 6,
  'Definitely will recommend' = 7
)
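When the replacements are numeric, `recode()` turns any label it does not recognise into `NA`, so a misspelled label would cost a response. As a hedged alternative, the same mapping can be sketched with base R's `match()` on a toy vector; `scale_labels`, `responses`, and the misspelled entry are illustrative assumptions, not part of the survey data.

```r
# Labels in scale order, copied from the recode() call above
scale_labels <- c(
  "Definitely\nwill not recommend",
  "Not very probably recommend",
  "Probably not recommend",
  "Might or\nmight not recommend",
  "Probably recommend",
  "Very probably recommend",
  "Definitely will recommend"
)

# Toy responses (illustrative only); the last one is deliberately misspelled
responses <- c("Probably recommend", "Definitely will recommend",
               "Definitly will recommend")

# match() returns the position of each label, i.e. its 1-7 score, or NA
scores <- match(responses, scale_labels)
scores

# Labels that failed to map and therefore need attention
unmatched <- unique(responses[is.na(scores)])
unmatched
```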
I then create two condition variables, one for resume type (ex-employee vs. ex-founder) and one for candidate gender, because the survey is a 2 x 2 study.
d <- d %>%
  select(recommend.scale, starts_with('resumes_randomization'), `attention-binary-qs`) %>%
  mutate(
    # A non-missing display-order column means that resume was shown
    employee_condition = ifelse(
      !is.na(`resumes_randomization_DO_male-employee`) |
        !is.na(`resumes_randomization_DO_female-employee`),
      'employee', 'founder'),
    # %in% treats NA as "not shown", so no second pass over gender is needed
    gender = ifelse(
      `resumes_randomization_DO_female-employee` %in% 1 |
        `resumes_randomization_DO_female-founder` %in% 1,
      'female', 'male')
  )
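A quick way to confirm that the two derived variables describe the 2 x 2 design is to cross-tabulate them. The sketch below uses a toy data frame (`toy`, with made-up rows) in place of `d`; with the real data, each of the four cells should be populated.

```r
# Toy stand-in for d (illustrative values, not the pilot data)
toy <- data.frame(
  employee_condition = c("employee", "founder", "employee", "founder", "founder"),
  gender             = c("female", "female", "male", "male", "male")
)

# Cell counts for the 2 x 2 design: resume type by candidate gender
cells <- table(toy$employee_condition, toy$gender)
cells

# TRUE if every cell has at least one respondent
all(cells > 0)
```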
I exclude cases that do not pass the attention check. We see that one case is eliminated.
d$attention.check <- recode(d$`attention-binary-qs`, 'Yes, an ex-founder'= "founder",'No, not an ex-founder' = 'employee')
d <- d %>%
  filter(attention.check == employee_condition)
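The number of excluded cases can be checked by comparing row counts before and after the filter. The sketch below does this on a toy data frame (`toy`, with made-up rows). Note that `filter()` drops rows where the comparison evaluates to `NA`, so respondents who skipped the attention check are excluded as well. Base R subsetting with `which()` is used here to reproduce that behaviour without extra packages.

```r
# Toy stand-in for d (illustrative values, not the pilot data)
toy <- data.frame(
  employee_condition = c("employee", "founder", "founder", "employee"),
  attention.check    = c("employee", "founder", "employee", NA),
  stringsAsFactors   = FALSE
)

n_before <- nrow(toy)

# Keep rows where the reported condition matches the assigned one;
# which() drops NA comparisons, as dplyr::filter() does
passed <- toy[which(toy$attention.check == toy$employee_condition), ]

# Number of cases removed by the attention check
n_excluded <- n_before - nrow(passed)
n_excluded
```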
Let’s plot this out!
ggplot(d, aes(x = recommend.scale, fill = gender)) +
  geom_bar() +
  facet_grid(~employee_condition)
This is another way we could plot the pilot data.
d %>%
  group_by(employee_condition, gender) %>%
  summarize(meanscore = mean(recommend.scale),
            sdscore = sd(recommend.scale)) %>%
  ggplot(mapping = aes(x = employee_condition, y = meanscore,
                       # ymin = meanscore - sdscore, ymax = meanscore + sdscore,
                       fill = gender)) +
  geom_bar(stat = 'identity') # +
  # geom_errorbar()
## `summarise()` has grouped output by 'employee_condition'. You can override using the `.groups` argument.
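The commented-out error bars can be enabled once the bars are dodged rather than stacked. The following is a sketch on toy data (`toy`, with made-up scores), assuming that plus or minus one standard deviation is the intended interval. Using the same `position_dodge(width = 0.9)` for both layers keeps each error bar centred on its bar. With very small pilot cells, `sd()` returns `NA` for any cell containing a single response, and that cell's error bar is not drawn.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for d (illustrative scores only)
toy <- data.frame(
  employee_condition = rep(c("employee", "founder"), each = 4),
  gender             = rep(c("female", "male"), times = 4),
  recommend.scale    = c(5, 4, 6, 3, 4, 5, 3, 6)
)

# Mean and SD of the recommendation score in each cell of the design
summary_df <- toy %>%
  group_by(employee_condition, gender) %>%
  summarize(meanscore = mean(recommend.scale),
            sdscore   = sd(recommend.scale),
            .groups   = "drop")

# Side-by-side bars with +/- 1 SD error bars
p <- ggplot(summary_df,
            aes(x = employee_condition, y = meanscore, fill = gender)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = meanscore - sdscore, ymax = meanscore + sdscore),
                position = position_dodge(width = 0.9), width = 0.2)
p
```

Setting `.groups = "drop"` also silences the grouping message that `summarize()` prints by default.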