Schaerer et al.’s goal of Experiment 1a was to provide preliminary evidence that having an alternative is not always optimal in a negotiation. While previous research argued that having alternatives provided power to the negotiator because they are not completely dependent on the current negotiation, the study aimed to demonstrate that weak alternatives actually lead to worse outcomes than having no alternative offer at all. The mechanism at play is an anchoring effect. The alternatives anchor the negotiator’s first offer. Having a weak alternative causes a negotiator to start too low on the first offer, leading to a worse outcome relative to having a strong, attractive alternative, and more importantly, relative to having no alternative at all.
The target finding of the replication was to find that participants with a weak alternative make lower first offers than participants with either a strong alternative or no alternative at all in a hypothetical negotiation. In addition, a secondary finding was that the participants with no alternative still feel the least powerful of all three conditions.
The original study reports cohen’s ds of .76 or higher for the effect of condition on first offers and ds of .42 for the measure of power. Given that the main DV is the first offer, we will make sure we have enough power to capture this effect.
Importantly, while there are three conditions, each effect size is the simple comparison between two conditions. Thus, we will make sure we have enough power to capture the smallest effect size observed for the first offer, which is the effect size comparing the weak BATNA (Best Alternative to a Negotiated Agreement) vs. no BATNA first offers (t-statistic for dummy coded regression coefficients). Because the standard deviations are different for each group, we estimated how many participants we needed for 80% power using a package that allows for heterogeneous variance.
We inputted the mean difference between the weak BATNA and no BATNA condition and the standard deviations for both conditions for the first offer DV.
n.ttest(power = 0.8, alpha = 0.05, mean.diff = 3.04, sd1 = 5.33, sd2 = 1.74,
design = "unpaired", variance = "unequal")
## Warning in n.ttest(power = 0.8, alpha = 0.05, mean.diff = 3.04, sd1 =
## 5.33, : Arguments -fraction- and -k- are not used, when variances are
## unequal
## $`Total sample size`
## [1] 46
##
## $`Sample size group 1`
## [1] 34
##
## $`sample size group 2`
## [1] 12
The standard deviations are not equal, so the sample sizes for each condition are different, but the results of the power analysis suggest that we will achieve 80% power for the weakest effect size if we use at least 34 people per condition. This seems relatively low, so we decided to use 50 per condition as a conservative test of this effect.
However, just to check, we also ran a power analysis to determine the power of the secondary DV of perceived power. We can use power.t.test in this case, since the variance is homogeneous.
power.t.test(delta=0.5, sd=1, power=.80, sig.level=0.05)
##
## Two-sample t test power calculation
##
## n = 63.76576
## delta = 0.5
## sd = 1
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
power.t.test(delta=0.5, sd=1, n = 50, sig.level=0.05)
##
## Two-sample t test power calculation
##
## n = 50
## delta = 0.5
## sd = 1
## sig.level = 0.05
## power = 0.6968888
## alternative = two.sided
##
## NOTE: n is number in *each* group
This secondary analysis has 70% power.
The planned sample size is based on the power analysis. Specifically, we will run 50 per condition plus 5% for exclusions, amounting to 157 (as mentioned below, the authors excluded 6%). There were no preselections as there were none in the original study.
We contacted the authors and got the survey materials with exact wording. See below.
Participants will be randomly assigned to a no BATNA (best alternative to the negotiated agreement), strong BATNA, or weak BATNA condition.
Participants were told:
“Imagine that you just bought an MP3 player and you want to sell some of your old CDs since you no longer need them. One of the CDs you want to sell is the album,”Forty Licks" by The Rolling Stones. Even though the CD is used, it is in reasonably good condition (case intact, no scratches on disc).“” Then, they were assigned to one of three conditions.
“Someone (”the buyer“) just approached you and raised an interest in purchasing your CD. The buyer asks you how much you want for the CD. You also know that nobody else has offered you any money for the CD. Thus, if you can’t reach an agreement in the current negotiation, you won’t get any money for the CD. You are negotiating the price for the CD. What is your first offer to the buyer?”
“Someone (”the buyer“) just approached you and raised an interest in purchasing your CD. The buyer asks you how much you want for the CD.You also know that another buyer has offered you $8 for the CD. Thus, if you can’t reach an agreement in the current negotiation, you will get $8 for the CD. You are negotiating the price for the CD. What is your first offer to the buyer?”
“Immediately after the BATNA manipulation, participants were instructed to make the first offer, which was our key dependent measure. Then, the negotiation was terminated, and participants were asked to indicate the extent to which they felt powerful (1 = powerless, 7 = powerful), in control (1 = no control, 7 = in control), strong (1 = weak, 7 = strong), and confident (1 = unconfident, 7 = confident). Responses to these four items were averaged to create a single measure of perceived power. Finally, participants completed an attention check and provided demographic information.” (cited from original paper). For the attention check, they were asked to check specific boxes at the end of a long paragraph of irrelevant instructions. See experiment for more details.
The procedure was the same as when the authors ran the study on Mturk: Participants first read the manipulation (a hypothetical negotiation situation), and then they answered the dependent variable questions in the same order as the original paper.
As the original authors did, we excluded participants that had a duplicate IP address, incorrectly answered the attention check, and who made first offers with extreme values (> 3 SD from the mean). For the original paper, this amounted to 17/305, or ~6%.
For the first offers, we used dummy coding and regression to compare the mean first offer of the no BATNA condition to the other two conditions. We also used different dummy codes to compare the strong BATNA to the no BATNA condition. For the perceived power (the secondary DV), we aggregated the four items to create a single measure, as the original paper did, and then used regression and dummy coding to compare the no BATNA condition to the other to conditions as well as the weak BATNA condition relative to the strong BATNA condition. In addition, to match the exact analyses of the authors, we also calculated effect sizes and reported confidence intervals for the means of the different conditions.
The key test statistic that we care most about is comparing the weak BATNA first offer to the no BATNA fist offer, showing that the no BATNA first offer is signifciantly higher than the weak BATNA In the original paper, it says that this t statistic is greater than 6.18. Note: this is a t-statistic for dummy coded regression coefficients.
Fortunately, we did not anticipate many differences between this replication and the original paper. Both studies will be run on MTurk with a similar demographic and the materials will be identical to the original study. The only difference is we will only be paying .30 cents as we determined the task only takes around three minutes.
Total sample size was 157. One participant dropped out of the study, leading to 156. Among those 156, there were no duplicate IP addresses, but 13 participants were dropped based on the pre-planned data exclusions (above 3SD or failing the attention check). This is around an 8% drop out rate, similar to the original paper. In the final sample, we had 143 participants with a mean age of 33.12 years and 60 females.
none.
Read in the data from .json files, and format into a dataframe.
path <- "~/Box Sync/mturk/"
files <- dir(paste0(path,"production-results/"),
pattern = "*.json")
d.raw <- data.frame()
for (f in files) {
jf <- paste0(path, "production-results/",f)
jd <- fromJSON(paste(readLines(jf), collapse=""))
id <- data.frame(ip = jd$answer$data$fingerprint[[1]]$answer$ip,
workerid = jd$WorkerId,
cond = as.factor(jd$answer$data$cond),
first_offer = as.numeric(jd$answer$data$first_offer),
power= as.numeric(jd$answer$data$power),
control = as.numeric(jd$answer$data$control),
strong = as.numeric(jd$answer$data$strong),
confident = as.numeric(jd$answer$data$confident),
sex =jd$answer$data$sex,
age = jd$answer$data$age,
nat = jd$answer$data$nat,
lang = jd$answer$data$langu,
eth = jd$answer$data$eth,
att_check= jd$answer$data$AC)
d.raw <- bind_rows(d.raw, id)
}
As reported, I removed all ips that were duplicated, participants who didn’t pass the attention check and any first offers that were 3 SD above the mean, just as the authors did.
d1 = d.raw[d.raw$ip=="",] #dataset without ip addresses
d2 = d.raw[d.raw$ip!="",] #dataset with ip addresses
d = d2[!(duplicated(d2$ip) | duplicated(d2$ip, fromLast = TRUE)), ] #first removing ALL duplicates among ip addresses that wer e not blank
d = rbind(d1, d)
Here is the proportion of people who will remain in the study for analysis after removing attention check and 3SD:
length(d$power)/ length(d.raw$power)
## [1] 1
And here is how many people dropped:
length(d.raw$power) - length(d$power)
## [1] 0
Now I will aggregate the four power measures to get the perceived power DV
d = d %>% filter (att_check =="pass" & first_offer < (3*sd(first_offer))) %>%
mutate (agg_power = (power + control + strong + confident)/4)
Let’s look at some histograms first!
qplot(d$power)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
qplot(d$first_offer)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The alpha of the original paper was .92.
#calculating alpha
power = cbind(d$power, d$control, d$strong, d$confident)
ICC(power, missing = F)[[1]]$ICC[6] #alphas are a subclass of interclass correlation coefficients, and this one captures cronbach's alpha
## [1] 0.9469794
The alpha of this replication for perceived power was 0.95.
Here is the original graph:
Here are my plots:
d.mean = d %>%
group_by (cond) %>%
summarise (power= mean(agg_power),
first_offer1 = mean(first_offer),
sem_power = sem(agg_power),
sem_fo = sem(first_offer))
d.mean$cond1 = as.character(d.mean$cond)
d.mean$cond1[d.mean$cond1=="none"]= "No BATNA"
d.mean$cond1[d.mean$cond1=="weak"]= "Weak BATNA"
d.mean$cond1[d.mean$cond1=="strong"]= "Strong BATNA"
d.mean$cond1 <- factor(d.mean$cond1, levels = c("No BATNA","Weak BATNA","Strong BATNA"))
ggplot(d.mean, aes(x = cond1, y = first_offer1)) +
theme_classic()+
geom_bar(position = position_dodge(), stat = "identity") +
ylab("First Offer (U.S.$)") +
xlab("Condition") +
coord_cartesian(ylim=c(0, 12)) +
geom_errorbar(aes(ymin=first_offer1-sem_fo, ymax=first_offer1+sem_fo),
width=.2,
position=position_dodge(.9))
ggplot(d.mean, aes(x = cond1, y = power, group=1)) +
theme_classic()+
ylab("Perceived Power") +
xlab("Condition") +
coord_cartesian(ylim=c(4.5, 7)) +
geom_line(position = position_dodge(), stat = "identity") +
geom_point()+
geom_errorbar(aes(ymin=power-sem_power, ymax=power+sem_power),
width=.2,
position=position_dodge(.9))
The original paper compared the first offers of the weak BATNA to both the strong BATNA and no BATNA and found that the weak BATNA first offer (M = $4.57, SD = 1.74, 95% CI = [4.33, 4.92]) was significantly lower than the strong (M = $11.02, SD = 1.90, 95% CI = [10.63, 11.42]) and no BATNA condition. In addition, they found that the no BATNA first offer (M = $7.61, SD = 5.33, 95% CI = [6.54, 8.68]) was lower than the strong BATNA first offer using dummy coded regression and reporting t-statistics for the dummy coded regression coefficients (all t’s higher than 6.18).
In addition, the original paper found that the weak BATNA (M = 5.25, SD = 1.09, 95% CI = [5.03, 5.47]) reported feeling less powerful than the strong BATNA (M = 5.75, SD = 0.96, 95% CI = [5.55, 5.95]) and more powerful than the no BATNA condition (M = 4.80, SD = 1.03, 95% CI = [4.59, 5.01]) and that (obviously) the strong BATNA felt more powerful than the no BATNA condition.
I replicate these analyses below using dummy coding and regression. Importantly, the key test statistic that I care about is the comparison between weak BATNA and no BATNA (the second beta in the first regression). Showing that the weak BATNA is lower for the first offer than the strong BATNA.
d$cond = as.factor(d$cond)
contrasts(d$cond)
## strong weak
## none 0 0
## strong 1 0
## weak 0 1
contrasts(d$cond) = cbind(c(0,1,0),c(1,0,0)) # compare each against weak BATNA condiion
colnames(attr(d$cond, "contrasts")) <- c("Strong_vs_Weak", "None_vs_Weak")
knitr::kable(summary(lm(first_offer ~ cond, data= d))$coef)
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 4.788965 | 0.3428778 | 13.966973 | 0.0e+00 |
| condStrong_vs_Weak | 5.603892 | 0.5066794 | 11.060034 | 0.0e+00 |
| condNone_vs_Weak | 2.794368 | 0.5540540 | 5.043494 | 1.4e-06 |
contrasts(d$cond) = cbind(c(0,1,0),c(0,0,1)) # compare no vs. strong
colnames(attr(d$cond, "contrasts")) <- c("None_vs_Strong", "None_vs_Weak")
knitr::kable(summary(lm(first_offer ~ cond, data= d))$coef) #first beta
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 7.583333 | 0.4352133 | 17.424406 | 0.0e+00 |
| condNone_vs_Strong | 2.809524 | 0.5732098 | 4.901389 | 2.6e-06 |
| condNone_vs_Weak | -2.794368 | 0.5540540 | -5.043494 | 1.4e-06 |
contrasts(d$cond) = cbind(c(0,1,0),c(1,0,0)) # compare each against weak BATNA condiion
colnames(attr(d$cond, "contrasts")) <- c("Strong_vs_Weak", "None_vs_Weak")
knitr::kable(summary(lm(agg_power ~ cond, data= d))$coef)
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 5.4525862 | 0.1569836 | 34.733471 | 0.0000000 |
| condStrong_vs_Weak | 0.2821077 | 0.2319787 | 1.216093 | 0.2259961 |
| condNone_vs_Weak | -0.5359195 | 0.2536688 | -2.112674 | 0.0364036 |
contrasts(d$cond) = cbind(c(0,1,0),c(0,0,1)) # compare no vs. strong
colnames(attr(d$cond, "contrasts")) <- c("None_vs_Strong", "None_vs_Weak")
knitr::kable(summary(lm(agg_power ~ cond, data= d))$coef) #first beta
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 4.9166667 | 0.1992586 | 24.674801 | 0.0000000 |
| condNone_vs_Strong | 0.8180272 | 0.2624391 | 3.117017 | 0.0022175 |
| condNone_vs_Weak | 0.5359195 | 0.2536688 | 2.112674 | 0.0364036 |
The original paper also calculated effect sizes for all comparisons between conditions. Below I calculate effect sizes for each comparison (strong vs. weak, weak vs. none, and none vs. strong) for both first offer and perceived power.
We will compare our effect sizes to the effect sizes in the original paper, which were higher than .76 for the first offer and higher than .42 for the power DV. Below are our calculated effect sizes.
#effect size split into two conditions
d.ws = d %>% filter(d$cond =="weak" | d$cond=="strong"); d.ws$cond = droplevels(d.ws$cond)
d.wn = d %>% filter(d$cond =="weak" | d$cond=="none"); d.wn$cond = droplevels(d.wn$cond)
d.sn = d %>% filter(d$cond =="none" | d$cond=="strong"); d.sn$cond = droplevels(d.sn$cond)
#calcualte d for first offer
cohen.d(d.ws$first_offer, d.ws$cond)
##
## Cohen's d
##
## d estimate: 2.47136 (large)
## 95 percent confidence interval:
## inf sup
## 1.954287 2.988432
cohen.d(d.wn$first_offer, d.wn$cond)
##
## Cohen's d
##
## d estimate: 0.9655053 (large)
## 95 percent confidence interval:
## inf sup
## 0.5162182 1.4147924
cohen.d(d.sn$first_offer, d.sn$cond)
##
## Cohen's d
##
## d estimate: -1.046042 (large)
## 95 percent confidence interval:
## inf sup
## -1.5171249 -0.5749585
#calcualte d for power
cohen.d(d.ws$agg_power, d.ws$cond)
##
## Cohen's d
##
## d estimate: 0.23155 (small)
## 95 percent confidence interval:
## inf sup
## -0.1581468 0.6212469
cohen.d(d.wn$agg_power, d.wn$cond)
##
## Cohen's d
##
## d estimate: -0.4320437 (small)
## 95 percent confidence interval:
## inf sup
## -0.862774268 -0.001313097
cohen.d(d.sn$agg_power, d.sn$cond)
##
## Cohen's d
##
## d estimate: -0.7350351 (medium)
## 95 percent confidence interval:
## inf sup
## -1.1915443 -0.2785259
In addition, the original paper calculated CIs for each mean of each condition for both power and first offers. I replicate below.
ci.95 = function (mean, sd, n) {
error = qt(0.975,df=n-1)*sd/sqrt(n)
left = mean - error
right = mean + error
print(left)
print(right)
}
d.s = d %>% filter(d$cond=="strong")
d.n = d %>% filter(d$cond=="none")
d.w = d %>% filter(d$cond=="weak")
#strong BATNA CI
ci.95(mean(d.s$first_offer), sd(d.s$first_offer), length(d.s$first_offer))
## [1] 9.830485
## [1] 10.95523
#weak BATNA CI
ci.95(mean(d.w$first_offer), sd(d.w$first_offer), length(d.w$first_offer))
## [1] 4.131966
## [1] 5.445965
#no BATNA CI
ci.95(mean(d.n$first_offer), sd(d.n$first_offer), length(d.n$first_offer))
## [1] 6.418599
## [1] 8.748068
#strong BATNA CI
ci.95(mean(d.s$agg_power), sd(d.s$agg_power), length(d.s$agg_power))
## [1] 5.417458
## [1] 6.051929
#weak BATNA CI
ci.95(mean(d.w$agg_power), sd(d.w$agg_power), length(d.w$agg_power))
## [1] 5.109039
## [1] 5.796133
#no BATNA CI
ci.95(mean(d.n$agg_power), sd(d.n$agg_power), length(d.n$agg_power))
## [1] 4.536223
## [1] 5.29711
Replication Results
We completely replicate the effect of BATNA condition on first offers. We find that the weak BATNA first offer (M = $4.79, SD = 2.50, 95% CI = [4.13, 5.45]) was significantly lower than the strong (M = $10.39, SD = 1.96, 95% CI = [9.83, 10.96]) and no BATNA condition. In addition, they found that the no BATNA first offer (M = $7.58, SD = 3.44, 95% CI = [6.42, 8.75]) was lower than the strong BATNA first offer using dummy coded regression and reporting t-statistics for the dummy coded regression coefficients (all t(140)’s higher than 4.90, ps<.01).
In addition, we found that the weak BATNA (M = 5.45, SD = 1.31, 95% CI = [5.11, 5.80]) reported feeling less powerful than the strong BATNA (M = 5.73, SD = 1.10, 95% CI = [5.42, 6.05]), but this was not a significant difference. We also found that the weak BATNA reported feeling more powerful than the no BATNA condition (M = 4.92, SD = 1.12, 95% CI = [4.54, 5.30]) and that (obviously) the strong BATNA felt more powerful than the no BATNA condition, (both t(140)’s greater than 2, ps<.05).
However, orthogonal contrasts make sense as the analysis strategy, so I will do those here as a supplemental analysis:
contrasts(d$cond)
## None_vs_Strong None_vs_Weak
## none 0 0
## strong 1 0
## weak 0 1
contrasts(d$cond) = cbind(c(1,1,-2),c(-1,1,0)) #comparing weak to other two, and then strong vs. no
colnames(attr(d$cond, "contrasts")) <- c("None&Strong_vs_Weak", "None_vs_Strong")
knitr::kable(summary(lm(first_offer ~ cond, data= d))$coef)
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 7.588385 | 0.2226444 | 34.082987 | 0.0e+00 |
| condNone&Strong_vs_Weak | 1.399710 | 0.1489622 | 9.396411 | 0.0e+00 |
| condNone_vs_Strong | 1.404762 | 0.2866049 | 4.901389 | 2.6e-06 |
contrasts(d$cond) = cbind(c(-2,1,1),c(0,1,-1)) # comparing no batna to other two, than weak vs. strong
colnames(attr(d$cond, "contrasts")) <- c("Weak&Strong_vs_None", "Weak_vs_Strong")
knitr::kable(summary(lm(agg_power ~ cond, data= d))$coef)
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 5.3679823 | 0.1019358 | 52.660431 | 0.0000000 |
| condWeak&Strong_vs_None | 0.2256578 | 0.0768531 | 2.936224 | 0.0038852 |
| condWeak_vs_Strong | 0.1410538 | 0.1159894 | 1.216093 | 0.2259961 |
These reveal the same basic results as the planned contrasts for the first offer DV and the perceived power DV.
Given the beautiful replication of this experiment, there are really no new analyses that need to be done. However, we will explore a couple other interesting effects below:
#does gender predict power?
knitr::kable(summary(lm(agg_power ~ sex, data=d))$coef)
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 5.3166667 | 0.1587302 | 33.4949937 | 0.0000000 |
| sexm | 0.1682731 | 0.2083476 | 0.8076555 | 0.4206493 |
# does age predict power?
d$age = as.numeric(d$age)
knitr::kable(summary(lm(agg_power ~ age, data=d))$coef)
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 5.7214992 | 0.3778207 | 15.1434258 | 0.0000000 |
| age | -0.0092746 | 0.0109777 | -0.8448589 | 0.3996209 |
#does gender predict first offer?
knitr::kable(summary(lm(first_offer ~ sex, data=d))$coef)
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 7.1335000 | 0.4589636 | 15.5426276 | 0.0000000 |
| sexm | 0.4809578 | 0.6024308 | 0.7983619 | 0.4260031 |
These were just fun analyses. Nothing new to report on here.
In this replication report, the key test statistic that we aimed to replicate was the comparison between the weak BATNA and no BATNA conditions for the first offer. This is a key comparison because although having a weak BATNA should make one feel more powerful than no BATNA, if the BATNA anchors the first offer, we should see that a weak BATNA produces a lower first offer than no BATNA. Indeed, we replicate this key finding.
In addition, we also replicated all other comparisons for the first offer.
For the power DV, which was a secondary DV for the purposes of this replication, we do not perfectly replicate the significant differences, but we see the same pattern. We were underpowered to observe the effect sizes that the original paper reported, so it is not surprising that we failed to reach significance. Importantly, we observe that most important comparisons are significant (the no BATNA condition feels the least powerful of the three conditions (both ps <.05)).
Overall, I believe this replication attempt is a win for science! Importantly, we matched the environment and materials almost identically to the original paper. This is thanks to the openness of the authors in sharing the materials and the fact that mTurk allows researchers from around the world to access the same population.
In addition, although we had limited resources and couldn’t reach a power of .80 for every effect in Experiment 1a, in one academic quarter we were able to capture the effects of the original paper. I would feel confident doing work that was based on this finding.
Importantly, this was a direct replication. Thus, we can only speak to the validity of the findings of Schaerer et al. in terms of the exact materials used. Future replications should look into conceptual replications (and perhaps mixed models to sample different negotiation situations).
Lastly, given that the standard deviation is so much higher in the no BATNA condition, it suggests that an interesting moderator might be lurking. It would be fun for future research to investigate this possibility.
Overall: Mission accomplished!