‘Why Do We Hate Hypocrites? Evidence for a Theory of False Signaling’ by Jordan et al. explored why hypocrisy is met with such strong disapproval. The authors made their data and materials publicly accessible on the Open Science Framework, which makes the study a strong candidate for replication.
Across five studies, Jordan et al. suggest that condemning a transgression acts as a strong signal of virtuous character, so that a subsequent transgression invites greater disdain than an explicit claim about one’s own moral behavior. Their results also supported the theory that the reaction to hypocrisy is a reaction to feeling misled: when hypocrisy was made ‘honest’ (the target condemned and committed the transgression but also confessed to it), the extra disapproval disappeared. In this project, I replicate the third study by Jordan et al.
Original study: https://journals.sagepub.com/doi/abs/10.1177/0956797616685771
Replication repository: https://github.com/andrewnam/psych251proj
Qualtrics survey: https://stanfordgsb.qualtrics.com/jfe/form/SV_8iFKuwcYXStmrxH
With a sample of 451 participants, the original study reported an effect size of f = 0.3443 (d = 0.6887) for the composite analysis. Detecting this effect at \(\alpha = .05\) would require 85 participants for 80% power, 110 for 90% power, and 134 for 95% power. The pairwise comparison between hypocrites and liars yielded t = -2.26 (d = -.26), and that between liars and controls t = -4.96 (d = -.57). Detecting these effects at 80%, 90%, and 95% power would require 459, 614, and 758 participants for the hypocrite-liar comparison and 99, 132, and 162 for the liar-control comparison.
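For reference, a minimal power-analysis sketch using the pwr package (my addition; the original materials do not specify the software used, and the exact sample sizes may differ slightly depending on rounding of the effect sizes):
library(pwr)
# Composite ANOVA: Cohen's f = 0.3443 across k = 3 conditions; n is per group
pwr.anova.test(k = 3, f = 0.3443, sig.level = .05, power = .80)
# Pairwise comparisons, using the original paper's Cohen's d values; n is per group
pwr.t.test(d = 0.26, sig.level = .05, power = .80)  # hypocrite vs. liar
pwr.t.test(d = 0.57, sig.level = .05, power = .80)  # liar vs. control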
In this replication study, I sampled 338 participants, sufficient for over 80% power for the composite analysis and the liar-control pairwise comparison but underpowered for the hypocrite-liar pairwise comparison. Only subjects who had unique IP addresses and evaluated all target characters were analyzed; a sketch of the IP-based exclusion is shown below.
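The data file imported later in this report is already filtered, but a minimal sketch of the IP-based exclusion, assuming a raw Qualtrics export with its standard IPAddress column (the file name here is hypothetical), would be:
raw <- readSurvey("raw_qualtrics_export.csv")  # hypothetical unfiltered export
unique_ip <- raw %>%
  dplyr::group_by(IPAddress) %>%      # Qualtrics' standard IP column (assumption)
  dplyr::filter(dplyr::n() == 1) %>%  # keep only respondents whose IP appears exactly once
  dplyr::ungroup()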
Vignettes and response scales followed the original study exactly.
Subjects evaluated the following four scenarios in a random order:
Becky and her friend Amanda are discussing a mutual acquaintance. Amanda mentions that the acquaintance often downloads music illegally from the Internet.
Hypocrisy condition: Becky says that she thinks it is morally wrong to download music illegally from the Internet.
Liar condition: Becky says that she doesn’t download music illegally from the Internet.
Control: No sentence presented in the control transgressor condition.
Shortly after their conversation, Becky goes online, and downloads music illegally.
Jennifer and her friend Rose are discussing a mutual acquaintance. Rose mentions that the acquaintance recently tried to get out of jury duty.
Hypocrisy condition: Jennifer says that she thinks it is morally wrong to try to get out of jury duty.
Liar condition: Jennifer says that she doesn’t try to get out of jury duty.
Control: No sentence presented in the control transgressor condition.
Shortly after their conversation, Jennifer gets called for jury duty, and tries to get out of it.
Bruce and his friend Zach are discussing a mutual acquaintance. Zach mentions that the acquaintance often ignores his mother’s phone calls.
Hypocrisy condition: Bruce says that he thinks it is morally wrong to ignore your mother’s phone calls.
Liar condition: Bruce says that he doesn’t ignore his mother’s phone calls.
Control: No sentence presented in the control transgressor condition.
Shortly after their conversation, Bruce notices that his mother is calling, and ignores the call.
Kevin and his friend Jack are discussing a mutual acquaintance. Jack mentions that the acquaintance often uses a lot of paper by printing documents single-sided.
Hypocrisy condition: Kevin says that he thinks it is morally wrong to use a lot of paper by printing documents single-sided.
Liar condition: Kevin says that he doesn’t use a lot of paper by printing documents single-sided.
Control: No sentence presented in the control transgressor condition.
Shortly after their conversation, Kevin has a large document to print, and uses a lot of paper by printing it single-sided.
The participants were presented a sliding scale with no numerical values.
How trustworthy do you think [Person of Interest] is? - Scale from “Not at all trustworthy” to “Very trustworthy”
How hypocritical do you think [Person of Interest] is? - Scale from “Not at all hypocritical” to “Very hypocritical”
How much do you like [Person of Interest]? - Scale from “Not at all” to “Very much”
How honest do you think [Person of Interest] is? - Scale from “Not at all honest” to “Very honest”
How good a person do you think [Person of Interest] is? - Scale from “Not at all good” to “Very good”
The procedure followed the original study exactly. Participants were presented with vignettes in which a target character discusses an acquaintance’s moral transgression and then privately goes on to engage in the same transgression. We manipulated whether, in addition to committing the transgression, the target was a hypocrite, a (direct) liar, or neither. Each vignette described a conversation between a target character and a friend. In all conditions, this conversation began with the target and the friend discussing a mutual acquaintance. In this discussion, the friend mentioned that the mutual acquaintance often engaged in a particular moral transgression. In the hypocrisy condition, the target responded to the friend by condemning the transgression. In contrast, in the liar condition, the target responded by directly stating that he or she did not engage in the relevant transgression. Finally, in the control-transgressor condition, we did not include any information about a response from the target. Shortly after this conversation ended, in all conditions, the target went on to commit the relevant violation. After reading each vignette, subjects rated the target on five questions assessing their valence toward the target (e.g., how good a person the target was), each on a sliding scale from “not at all [trait]” to “very [trait]”.
The analysis followed the original study exactly.
To test our prediction, we conducted a one-way ANOVA investigating the effect of condition on positive evaluations of the targets across the vignettes.
More specifically, the scale ratings were averaged for each participant across all questions and vignettes to form a composite measure of target evaluation. A one-way ANOVA was then performed regressing this composite evaluation measure on condition (hypocrite, liar, or control transgressor).
The original authors also collected demographic data on age, gender, education, income, general trust, reasons for responses, and previous experience in similar studies. These were excluded from the replication. The supplementary materials and open-sourced dataset record these variables with minimal implementation detail (e.g., it is unclear whether the questions were presented before or after the vignette-response sections). Moreover, the authors report no significant interactions between condition and the demographic variables.
Data preparation following the analysis plan.
#### Load Relevant Libraries and Functions
library("readxl")
library(qualtRics)
library(Rmisc)
## Loading required package: lattice
## Loading required package: plyr
library(broom)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ dplyr 0.7.7
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::arrange() masks plyr::arrange()
## ✖ purrr::compact() masks plyr::compact()
## ✖ dplyr::count() masks plyr::count()
## ✖ dplyr::failwith() masks plyr::failwith()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::id() masks plyr::id()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::mutate() masks plyr::mutate()
## ✖ dplyr::rename() masks plyr::rename()
## ✖ dplyr::summarise() masks plyr::summarise()
## ✖ dplyr::summarize() masks plyr::summarize()
library(effsize)
library(sjstats)
##
## Attaching package: 'sjstats'
## The following object is masked from 'package:broom':
##
## bootstrap
#### Import data
orig_data <- read_excel("../original_paper/original_data.xls", sheet=3)
filename <- "../filtered_qualtrics_data_20181205_1910.csv"
raw_data <- readSurvey(filename)
### Data Preparation
data <- raw_data %>%
  mutate(subject = 1:nrow(raw_data)) %>%  # assign each respondent a subject ID
  select(
    # Keep only the target-evaluation questions plus condition;
    # drop items not used in the analysis
    subject, starts_with("Q"), condition, -c(Q5_1, Q65_1, Q71_1, Q77_1)
  ) %>%
  drop_na()  # exclude subjects who did not evaluate all target characters

summary_data <- data %>%
  gather("question", "response", -c(subject, condition)) %>%  # long format: one row per rating
  dplyr::group_by(subject, condition) %>%
  dplyr::summarise(overall = mean(response))  # composite evaluation for each subject
### define cohen's d
# Cohen's d computed from the descriptive-statistics table: difference in group
# means divided by the pooled standard deviation (averaging the two group variances)
cohen_d = function(desc_stats, cond1, cond2) {
  m1 = (desc_stats %>% filter(condition == cond1))$mean
  m2 = (desc_stats %>% filter(condition == cond2))$mean
  sd1 = (desc_stats %>% filter(condition == cond1))$sd
  sd2 = (desc_stats %>% filter(condition == cond2))$sd
  sd_pooled = sqrt((sd1^2 + sd2^2)/2)
  (m1 - m2)/sd_pooled
}
The analyses as specified in the analysis plan.
Composite analysis
desc_stats <- summary_data %>%
  gather("question", "response", -c(subject, condition)) %>%
  dplyr::group_by(condition) %>%
  dplyr::summarise(mean = mean(response, na.rm=TRUE), sd = sd(response, na.rm=TRUE), n = n())
desc_stats
## # A tibble: 3 x 4
## condition mean sd n
## <chr> <dbl> <dbl> <int>
## 1 control 46.7 16.0 96
## 2 hypocrite 34.6 20.6 135
## 3 liar 36.1 15.8 107
fit = lm(overall ~ condition, data=summary_data)
anova(fit)
## Analysis of Variance Table
##
## Response: overall
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 2 9034 4517.1 14.071 1.356e-06 ***
## Residuals 335 107543 321.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(sprintf("Eta squared: %f", eta_sq(fit)$etasq))
## [1] "Eta squared: 0.077495"
Pairwise comparison between hypocrites and liars.
t.test(overall ~ condition, data=summary_data %>% filter(condition != "control"), conf.level=.95)
##
## Welch Two Sample t-test
##
## data: overall by condition
## t = -0.64255, df = 239.81, p-value = 0.5211
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6.112711 3.105805
## sample estimates:
## mean in group hypocrite mean in group liar
## 34.60694 36.11040
fit = lm(overall ~ condition, data=summary_data %>% filter(condition != "control"))
anova(fit)
## Analysis of Variance Table
##
## Response: overall
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 1 135 134.92 0.389 0.5334
## Residuals 240 83233 346.80
print(sprintf("Cohen's d: %f", cohen_d(desc_stats, 'hypocrite', 'liar')))
## [1] "Cohen's d: -0.081928"
Pairwise comparison between liars and control-transgressors.
t.test(overall ~ condition, data=summary_data %>% filter(condition != "hypocrite"), conf.level=.95)
##
## Welch Two Sample t-test
##
## data: overall by condition
## t = 4.7098, df = 198.17, p-value = 4.656e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.126863 14.953123
## sample estimates:
## mean in group control mean in group liar
## 46.65039 36.11040
fit = lm(overall ~ condition, data=summary_data %>% filter(condition != "hypocrite"))
anova(fit)
## Analysis of Variance Table
##
## Response: overall
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 1 5621 5621.3 22.207 4.564e-06 ***
## Residuals 201 50879 253.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(sprintf("Cohen's d: %f", cohen_d(desc_stats, 'liar', 'control')))
## [1] "Cohen's d: -0.662285"
Graph of original data.
orig_data %>%
  mutate(condition = revalue(condition, c("controltrans" = "control", "tradhyp" = "hypocrite"))) %>%
  ggplot() +
  aes(x=condition, y=overall) +
  stat_summary(fun.y="mean", geom="bar") +
  stat_summary(fun.data="mean_cl_boot", geom="errorbar", width=0.15) +
  labs(y = "rating", title = "Original Results") +
  theme(plot.title = element_text(hjust = 0.5))
Graph of replication data.
summary_data %>%
  ggplot() +
  aes(x=condition, y=overall) +
  stat_summary(fun.y="mean", geom="bar") +
  stat_summary(fun.data="mean_cl_boot", geom="errorbar", width=0.15) +
  labs(y = "rating", title = "Replication Results") +
  theme(plot.title = element_text(hjust = 0.5))
In the main composite analysis, the original paper found a significant effect of condition, \(F(2, 448) = 26.48, p < .001, \eta_p^2 = .106\). The original pairwise comparisons found that hypocrites (M = 32.07, SD = 15.93) were evaluated as worse than liars (M = 36.15, SD = 15.37), mean difference = −4.08, 95% CI = [−7.63, −0.53], t(299) = −2.26, p = .025, d = −0.26, who were evaluated as worse than control transgressors (M = 44.77, SD = 14.89), mean difference = −8.62, 95% CI = [−12.04, −5.20], t(301) = −4.96, p < .001, d = −0.57.
In this replication study, the composite analysis found a significant effect of condition, \(F(2, 335) = 14.07, p < .001, \eta_p^2 = .077\). The replication pairwise comparisons found that hypocrites (M = 34.61, SD = 20.56) were not evaluated as significantly worse than liars (M = 36.11, SD = 15.83), mean difference = −1.50, 95% CI = [−6.11, 3.11], t(240) = −0.64, p = .52, d = −0.08, but liars were evaluated as worse than control transgressors (M = 46.65, SD = 16.0), mean difference = −10.54, 95% CI = [−14.95, −6.13], t(198) = −4.71, p < .001, d = −0.66.
While the composite ANOVA and the liar-control pairwise comparison were statistically significant, the hypocrite-liar comparison was not. This is likely because the replication sample was large enough to detect the composite and liar-control effects with 80% power but not the much smaller hypocrite-liar effect.
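To illustrate, a quick calculation (again assuming the pwr package; not part of the planned analysis) gives the per-group sample that would be needed to detect the hypocrite-liar effect actually observed in this replication (d ≈ 0.08) at 80% power, which runs into the thousands:
pwr::pwr.t.test(d = 0.08, sig.level = .05, power = .80)  # n is per group; far beyond the 338 participants collected here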
Overall, the study replicated as expected given the replication sample size; whether the hypocrite-liar effect would also replicate with a larger sample remains to be confirmed. Because neither the original nor the replication study included attention checks, the data are likely noisy, as some participants may have rushed through the experiment and responded carelessly. With an attention check, an appropriate exclusion criterion, and consequently a higher signal-to-noise ratio, both studies might have revealed larger effect sizes and required smaller samples to detect them at sufficient power. Intuitively, for example, a merely “medium” effect size for disliking liars relative to non-liars seems like an understatement.