Many social psychology studies are about interaction effects, and many of these have had unsuccesful replications. Interaction effects are found to be especially prone to fail replication in the replication project, and in this study.
In this document I explore whether interaction effects are less likely to replicate as a general rule, in a real, large data set. The data set is from the users’ answers to a large number of personal questions in the OKCupid data set.
We find that significant interaction effects explain very little of the variance, and that compared to significant main effects they are less significant, and are much less likely to replicate in a larger sample.
For the third variable we include either another ordinal variable, or gender. We find that interaction effects only replicate at 0.05 in ~9% of cases for ordinal variables, and ~22% for gender. Whereas main effects replicate in >95% of cases for either.
If we look at gender-specific effects (effects that are only significant for one gender), the replication mostly gives a non-gender-specific effect, and is very unlikely to find an effect only in that gender.
I generate 1,000,000 random regressions using the OKCupid data. An example of a regression would be:
y = “Do you like to listen to audiobooks?” a = “How was your childhood?”
b = gender
y = a + b + a * b
I look at main effects(a, b) and interaction effects (a * b).
I do this in a sample with the answers of 500 people. Then I look at the same regression in a larger sample of 5,000 people, and see whether the results that are significant in the n = 500 sample, are also significant in the larger sample.
For instance, one result could be that people who like audiobooks tend to have had a good childhood. This type of main effect result typically is replicated in the n = 5,000 sample.
Or another result could be that there is a significant interaction between gender and childhood quality in predicting whether the person likes audiobooks. This type of effect is typically not replicated in the n = 5,000 sample.
library(pacman)
p_load(tidyverse, rms, broom, magrittr, glue, purrr, pander, ggpubr)
options(contrasts=c("contr.treatment", "contr.treatment"))
PanderOpts(digits = 3, table.split.table = Inf, missing = "-", table.alignment.default = "left")
theme_set(theme_bw(base_size=13))
set.seed(123)
answers = read_rds("data/parsed_data.rds") %>% filter(gender != "Other")
questions = read_csv2("data/question_data.csv")
small_sample_n <- 250
large_sample_n <- 2500
Gender <- c("Male", "Female")
both_sample_n <- small_sample_n + large_sample_n
ordinal_vars <- questions %>% filter(Type == "O", N >= both_sample_n*2) %>% pull(Q)
tribble(~Gender, ~original, ~replication,
"Male", small_sample_n, large_sample_n,
"Female", small_sample_n, large_sample_n,
"Combined", small_sample_n*2, large_sample_n*2) %>%
pander(caption="Sample sizes")| Gender | original | replication |
|---|---|---|
| Male | 250 | 2500 |
| Female | 250 | 2500 |
| Combined | 500 | 5000 |
tribble(~"Number of questions with sufficiently high sample size", ordinal_vars %>% length)| Number of questions with sufficiently high sample size |
|---|
| 1622 |
analyze <- function(vars, effect, df){
v1 <- vars[1]; v2 <- vars[2]; v3 <- vars[3]
model <- if (effect == "interaction") glue("{v1} ~ {v2} * {v3}")
else glue("{v1} ~ {v2} + {v3}")
m <- rms::lrm(model %>% as.formula, data = df, penalty = .01, tol = 1e-22)
R2 <- m$stats[["R2"]]
rowname <- if (effect == "interaction") " All Interactions"
else "TOTAL"
P <- m %>% anova %>% tidy %>% filter(.rownames == rowname) %>% slice(1) %>% pull(P)
list(R2 = R2, P = P)
}
add_column <- function(g, samplesize, effect, vars, third_variable, df_female, df_male){
if (g != "Combined" & third_variable == "gender"){return}
df <- if (g == "Combined") df
else df %>% filter(gender == g)
rows <- if (samplesize == "small") seq(1,small_sample_n)
else seq(small_sample_n + 1, both_sample_n)
df <- bind_rows(
df_male %>% filter(row_number() %in% rows),
df_female %>% filter(row_number() %in% rows))
tryCatch({
res <- analyze(vars, effect, df)
tibble(!!glue("{g}_{samplesize}_{effect}_R2") := res$R2,
!!glue("{g}_{samplesize}_{effect}_P") := res$P)
}, error = function(e) NULL
)
}
add_row <- function(third_variable, df){
vars <- if (third_variable == "gender") c(sample(ordinal_vars, 2), "gender")
else sample(ordinal_vars, 3)
dfu <- df %>% select(vars, gender) %>% na.omit
df_female <- dfu %>% filter(gender == "Woman")
df_male <- dfu %>% filter(gender == "Man")
if (df_female %>% nrow > both_sample_n & df_male %>% nrow > both_sample_n){
crossing(g = c("Combined", "Man", "Woman"),
samplesize = c("large", "small"),
effect = c("interaction", "main")) %>%
pmap(add_column, vars, third_variable, df_female, df_male) %>%
bind_cols %>% mutate(Third_variable=third_variable)
}
}
add_trio <- function(df) pmap(list(c("ordinal", "gender")), add_row, df) %>% bind_rows
#rerun(1000000, add_trio(answers)) %>% bind_rows %>% write_tsv(glue("trios.tsv"))
df <- read_tsv("data/trios.tsv")
questions %>% filter(Q %in% sample(ordinal_vars, 5)) %>%
select(text, option_1, option_2, option_3, option_4)| text | option_1 | option_2 | option_3 | option_4 |
|---|---|---|---|---|
| Do you believe that because a person is overweight that it is because they are weak and without self control? | Yes / Usually | No | I’m Not Sure | - |
| How was your childhood? | Wonderful! | It was okay. | Not great, but I’m no worse for the wear. | Awful, and I have emotional issues as a result. |
| Do you enjoy creating mathematical problems to solve by yourself? | Yes. | No. | - | - |
| How many pillows do you sleep with at night? | None. | One. | Two. | Three or more. |
| Do you like to listen to audiobooks? | Yes. | No. | I’ve never tried them. | - |
prop <- df %>% group_by(Third_variable) %>%
summarise(
Interactions = sum(
Combined_small_interaction_P < 0.05 & Combined_large_interaction_P < 0.05, na.rm=T) /
sum(Combined_small_interaction_P < 0.05, na.rm=T),
Main = sum(
Combined_small_main_P < 0.05 & Combined_large_main_P < 0.05, na.rm=T) /
sum(Combined_small_main_P < 0.05, na.rm=T))
prop %>% gather(Effect, value, -Third_variable) %>%
ggplot(aes(x = Effect, y = value, fill=Third_variable)) +
geom_bar(stat="identity", position="dodge", width=0.5, col="black") +
ylab("Proportion replicated") +
scale_fill_manual(values=jcol)
| Third_variable | Interactions | Main |
|---|---|---|
| gender | 0.228 | 0.978 |
| ordinal | 0.0881 | 0.968 |
get_gender <- function(g1, df){
g2 <- if (g1 == "Man") "Woman" else "Man"
df %>%
filter(
Combined_small_main_P > 0.05,
!!sym(glue("{g1}_small_main_P")) < 0.05,
!!sym(glue("{g2}_small_main_P")) > 0.05) %>%
summarise(
gender = g1,
only_gender = n(),
repl_combined = sum(Combined_large_main_P < 0.05, na.rm=T),
repl_females = sum(Woman_large_main_P < 0.05, na.rm=T),
repl_males = sum(Man_large_main_P < 0.05, na.rm=T),
repl_males_only = sum(Man_large_main_P < 0.05 & Woman_large_main_P > 0.05 &
Combined_large_main_P > 0.05, na.rm=T),
repl_females_only = sum(Woman_large_main_P < 0.05 & Man_large_main_P > 0.05 &
Combined_large_main_P > 0.05, na.rm=T))
}
dfg <- pmap(list(c("Man", "Woman")), get_gender, df) %>% bind_rows
ymax_male <- dfg[1,2] %>% pull()
ymax_female <- dfg[2,2] %>% pull()
t3 <- dfg %>%
mutate(Combined = repl_combined / only_gender,
Females = repl_females / only_gender,
Males = repl_males / only_gender,
Females_only = repl_females_only / only_gender,
Males_only = repl_males_only / only_gender) %>%
select(Combined, Females, Males, Females_only, Males_only, gender) %>%
gather(var, value, -gender) %>%
spread(gender, value)
p1 <- t3 %>%
ggplot(aes(var, Woman, fill=var)) +
geom_bar(stat="identity", fill="turquoise4", color="black") +
ylab("Proportion") + xlab("Replicated in") +
ggtitle("Effects significant only in females") +
theme(axis.text.x = element_text(angle = 45, hjust=1),
plot.title = element_text(size = 13)) +
ylim(0, 1)
p2 <- t3 %>%
ggplot(aes(var, Man)) +
geom_bar(stat="identity", fill="turquoise4", color="black") +
ylab("Proportion") + xlab("Replicated in") +
ggtitle("Effects significant only in males") +
theme(axis.text.x = element_text(angle = 45, hjust=1),
plot.title = element_text(size = 14)) +
ylim(0, 1)
ggarrange(p1, p2, ncol=2)| gender | only_gender | repl_combined | repl_females | repl_males | repl_males_only | repl_females_only |
|---|---|---|---|---|---|---|
| Man | 6200 | 5090 | 4210 | 4410 | 270 | 120 |
| Woman | 6880 | 5850 | 5290 | 4740 | 160 | 230 |
make_table <- function(df, size, effect){
base <- glue("Combined_{size}_{effect}")
df %>% group_by(Third_variable) %>%
mutate(total_n = n()) %>%
filter(!!sym(glue("{base}_P")) < 0.05) %>%
mutate(Proportion_significant = n() / total_n) %>%
summarise(
Proportion_significant = mean(Proportion_significant),
Mean_P_for_significant = mean(!!sym(glue("{base}_P"))),
Mean_R2_for_significant = ifelse(effect == "main",
mean(!!sym(glue("{base}_R2"))),
mean(!!sym(glue("{base}_R2")) - !!sym(glue("Combined_{size}_main_R2"))))) %>%
pander(caption = glue("{size} sample - {effect} effects"))
}
make_table(df, "small", "main")| Third_variable | Proportion_significant | Mean_P_for_significant | Mean_R2_for_significant |
|---|---|---|---|
| gender | 0.523 | 0.00768 | 0.064 |
| ordinal | 0.376 | 0.0108 | 0.0537 |
make_table(df, "large", "main")| Third_variable | Proportion_significant | Mean_P_for_significant | Mean_R2_for_significant |
|---|---|---|---|
| gender | 0.91 | 0.00149 | 0.0333 |
| ordinal | 0.855 | 0.00253 | 0.0199 |
make_table(df, "small", "interaction")| Third_variable | Proportion_significant | Mean_P_for_significant | Mean_R2_for_significant |
|---|---|---|---|
| gender | 0.0502 | 0.0242 | 0.0192 |
| ordinal | 0.0334 | 0.0255 | 0.0221 |
make_table(df, "large", "interaction")| Third_variable | Proportion_significant | Mean_P_for_significant | Mean_R2_for_significant |
|---|---|---|---|
| gender | 0.138 | 0.018 | 0.00261 |
| ordinal | 0.0752 | 0.0223 | 0.0028 |
plot_p <- function(df, var, title){
ggplot(df, aes(x = !!sym(var))) +
geom_freqpoly(binwidth=0.005, size=1) +
xlim(c(0.0, 0.05)) +
ggtitle(title) + xlab("P-value") +
theme(plot.title = element_text(size = 13))
}
plot_comparison <- function(df1, df2, title){
fig <- ggarrange(plot_p(df1, "Combined_small_interaction_P", "Interactions"),
plot_p(df2, "Combined_small_main_P", "Main effects"),
ncol=2, nrow=1, common.legend=T)
annotate_figure(fig, top = text_grob(title, size = 14))
}
plot_comparison(df, df, "All effects")plot_comparison(
df %>% filter(Combined_large_interaction_P < 0.05),
df %>% filter(Combined_large_main_P < 0.05),
"Replicated effects")plot_comparison(
df %>% filter(Combined_large_interaction_P > 0.05),
df %>% filter(Combined_large_main_P > 0.05),
"Non-replicated effects")I arbitrarily define “true” effects as those with a p value below 0.001 in the large sample. What is the power to detect those in the small sample?
power <- df %>% group_by(Third_variable) %>%
summarise(
Interactions = sum(
Combined_small_interaction_P < 0.05 & Combined_large_interaction_P < 0.001, na.rm=T) /
sum(Combined_large_interaction_P < 0.001, na.rm=T),
Main = sum(
Combined_small_main_P < 0.05 & Combined_large_main_P < 0.001, na.rm=T) /
sum(Combined_large_main_P < 0.001, na.rm=T))
power %>% gather(Effect, value, -Third_variable) %>%
ggplot(aes(x = Effect, y = value, fill=Third_variable)) +
geom_bar(stat="identity", position="dodge", width=0.5, col="black") +
ylab("Power to identify in small sample") +
scale_fill_manual(values=jcol) +
ylim(0, 1)What if you pick a random effect, and stick by it. This could be viewed as an assumption of no publication bias.
prop <- df %>% group_by(Third_variable) %>%
summarise(
Interactions_not_replicated = sum(
Combined_small_interaction_P < 0.05 & Combined_small_interaction_P > 0.01 & Combined_large_interaction_P > 0.05, na.rm=T) /
sum(Combined_small_interaction_P < 1.1, na.rm=T),
Interactions_replicated = sum(
Combined_small_interaction_P < 0.05 & Combined_small_interaction_P > 0.01 & Combined_large_interaction_P < 0.05, na.rm=T) /
sum(Combined_small_interaction_P < 1.1, na.rm=T),
Main_not_replicated = sum(
Combined_small_main_P < 0.05 & Combined_small_main_P > 0.01 & Combined_large_main_P > 0.05, na.rm=T) /
sum(Combined_small_main_P < 1.1, na.rm=T),
Main_replicated = sum(
Combined_small_main_P < 0.05 & Combined_small_main_P > 0.01 & Combined_large_main_P < 0.05, na.rm=T) /
sum(Combined_small_main_P < 1.1, na.rm=T))
prop %>% gather(Effect, value, -Third_variable) %>%
ggplot(aes(x = Effect, y = value, fill=Third_variable)) +
geom_bar(stat="identity", position="dodge", width=0.5, col="black") +
ylab("Proportion") +
scale_fill_manual(values=jcol)df %>% filter(Third_variable == "ordinal", Combined_small_interaction_P < 0.05) %>% drop_na() %>% nrow()## [1] 5960