1. Introduction

Many social psychology studies are about interaction effects, and many of these have had unsuccesful replications. Interaction effects are found to be especially prone to fail replication in the replication project, and in this study.

In this document I explore whether interaction effects are less likely to replicate as a general rule, in a real, large data set. The data set is from the users’ answers to a large number of personal questions in the OKCupid data set.

We find that significant interaction effects explain very little of the variance, and that compared to significant main effects they are less significant, and are much less likely to replicate in a larger sample.

For the third variable we include either another ordinal variable, or gender. We find that interaction effects only replicate at 0.05 in ~9% of cases for ordinal variables, and ~22% for gender. Whereas main effects replicate in >95% of cases for either.

If we look at gender-specific effects (effects that are only significant for one gender), the replication mostly gives a non-gender-specific effect, and is very unlikely to find an effect only in that gender.

 

2. Analysis

Explanation

I generate 1,000,000 random regressions using the OKCupid data. An example of a regression would be:

y = “Do you like to listen to audiobooks?” a = “How was your childhood?”
b = gender

y = a + b + a * b

I look at main effects(a, b) and interaction effects (a * b).

I do this in a sample with the answers of 500 people. Then I look at the same regression in a larger sample of 5,000 people, and see whether the results that are significant in the n = 500 sample, are also significant in the larger sample.

For instance, one result could be that people who like audiobooks tend to have had a good childhood. This type of main effect result typically is replicated in the n = 5,000 sample.

Or another result could be that there is a significant interaction between gender and childhood quality in predicting whether the person likes audiobooks. This type of effect is typically not replicated in the n = 5,000 sample.


Imports and settings

library(pacman)
p_load(tidyverse, rms, broom, magrittr, glue, purrr, pander, ggpubr)
options(contrasts=c("contr.treatment", "contr.treatment"))
PanderOpts(digits = 3, table.split.table = Inf, missing = "-", table.alignment.default = "left")
theme_set(theme_bw(base_size=13))
set.seed(123)

 

Read data

answers = read_rds("data/parsed_data.rds") %>% filter(gender != "Other")
questions = read_csv2("data/question_data.csv")

 

Create a smaller original sample and larger replication sample

small_sample_n <- 250
large_sample_n <- 2500
Gender <- c("Male", "Female")
both_sample_n <- small_sample_n + large_sample_n
ordinal_vars <- questions %>% filter(Type == "O", N >= both_sample_n*2) %>% pull(Q)

tribble(~Gender, ~original, ~replication,
        "Male", small_sample_n, large_sample_n,
        "Female", small_sample_n, large_sample_n,
        "Combined", small_sample_n*2, large_sample_n*2) %>% 
  pander(caption="Sample sizes")
Sample sizes
Gender original replication
Male 250 2500
Female 250 2500
Combined 500 5000


tribble(~"Number of questions with sufficiently high sample size", ordinal_vars %>% length)
Number of questions with sufficiently high sample size
1622

 

Run analysis

analyze <- function(vars, effect, df){
  v1 <- vars[1]; v2 <- vars[2]; v3 <- vars[3]
  model <- if (effect == "interaction") glue("{v1} ~ {v2} * {v3}") 
           else glue("{v1} ~ {v2} + {v3}")
  m <- rms::lrm(model %>% as.formula, data = df, penalty = .01, tol = 1e-22)
  R2 <- m$stats[["R2"]]
  rowname <- if (effect == "interaction") " All Interactions" 
             else "TOTAL"
  P <- m %>% anova %>% tidy %>% filter(.rownames == rowname) %>% slice(1) %>% pull(P)
  
  list(R2 = R2, P = P)
}


add_column <- function(g, samplesize, effect, vars, third_variable, df_female, df_male){
  if (g != "Combined" & third_variable == "gender"){return}
  df <- if (g == "Combined") df 
        else df %>% filter(gender == g)
  rows <- if (samplesize == "small") seq(1,small_sample_n) 
          else seq(small_sample_n + 1, both_sample_n)
  df <- bind_rows(
    df_male %>% filter(row_number() %in% rows), 
    df_female %>% filter(row_number() %in% rows))
  tryCatch({
    res <- analyze(vars, effect, df)
    tibble(!!glue("{g}_{samplesize}_{effect}_R2") := res$R2, 
           !!glue("{g}_{samplesize}_{effect}_P") := res$P)
    }, error = function(e) NULL
  )
}

add_row <- function(third_variable, df){
  vars <- if (third_variable == "gender") c(sample(ordinal_vars, 2), "gender") 
          else sample(ordinal_vars, 3)  
  dfu <- df %>% select(vars, gender) %>% na.omit
  df_female <- dfu %>% filter(gender == "Woman")
  df_male <- dfu %>% filter(gender == "Man")
  if (df_female %>% nrow > both_sample_n & df_male %>% nrow > both_sample_n){
    crossing(g = c("Combined", "Man", "Woman"), 
             samplesize = c("large", "small"), 
             effect = c("interaction", "main")) %>% 
    pmap(add_column, vars, third_variable, df_female, df_male) %>% 
      bind_cols %>% mutate(Third_variable=third_variable)
  }
}

add_trio <- function(df) pmap(list(c("ordinal", "gender")), add_row, df) %>% bind_rows

#rerun(1000000, add_trio(answers)) %>% bind_rows %>% write_tsv(glue("trios.tsv"))

df <- read_tsv("data/trios.tsv")

 

3. Example questions

questions %>% filter(Q %in% sample(ordinal_vars, 5)) %>% 
  select(text, option_1, option_2, option_3, option_4)
text option_1 option_2 option_3 option_4
Do you believe that because a person is overweight that it is because they are weak and without self control? Yes / Usually No I’m Not Sure -
How was your childhood? Wonderful! It was okay. Not great, but I’m no worse for the wear. Awful, and I have emotional issues as a result.
Do you enjoy creating mathematical problems to solve by yourself? Yes. No. - -
How many pillows do you sleep with at night? None. One. Two. Three or more.
Do you like to listen to audiobooks? Yes. No. I’ve never tried them. -


4. Results

Proportion of effects that replicate in the larger sample

prop <- df %>% group_by(Third_variable) %>% 
  summarise(
    Interactions = sum(
      Combined_small_interaction_P < 0.05 & Combined_large_interaction_P < 0.05, na.rm=T) / 
      sum(Combined_small_interaction_P < 0.05, na.rm=T),
    Main = sum(
      Combined_small_main_P < 0.05 & Combined_large_main_P < 0.05, na.rm=T) / 
      sum(Combined_small_main_P < 0.05, na.rm=T))

prop %>% gather(Effect, value, -Third_variable) %>% 
  ggplot(aes(x = Effect, y = value, fill=Third_variable)) +
  geom_bar(stat="identity", position="dodge", width=0.5, col="black") +
  ylab("Proportion replicated") + 
  scale_fill_manual(values=jcol)


Third_variable Interactions Main
gender 0.228 0.978
ordinal 0.0881 0.968

 

Replication of gender-specific effects

get_gender <- function(g1, df){
  g2 <- if (g1 == "Man") "Woman" else "Man"
  df %>% 
    filter(
      Combined_small_main_P > 0.05, 
      !!sym(glue("{g1}_small_main_P")) < 0.05, 
      !!sym(glue("{g2}_small_main_P")) > 0.05) %>% 
    summarise(
      gender = g1,
      only_gender = n(),
      repl_combined = sum(Combined_large_main_P < 0.05, na.rm=T),
      repl_females = sum(Woman_large_main_P < 0.05, na.rm=T),
      repl_males = sum(Man_large_main_P < 0.05, na.rm=T),
      repl_males_only = sum(Man_large_main_P < 0.05 & Woman_large_main_P > 0.05 & 
                            Combined_large_main_P > 0.05, na.rm=T),
      repl_females_only = sum(Woman_large_main_P < 0.05 & Man_large_main_P > 0.05 & 
                              Combined_large_main_P > 0.05, na.rm=T))
}

dfg <- pmap(list(c("Man", "Woman")), get_gender, df) %>% bind_rows

ymax_male <- dfg[1,2] %>% pull()
ymax_female <- dfg[2,2] %>% pull()

t3 <- dfg %>%   
  mutate(Combined = repl_combined / only_gender,
         Females = repl_females / only_gender,
         Males = repl_males / only_gender,
         Females_only = repl_females_only / only_gender,
         Males_only = repl_males_only / only_gender) %>%
  select(Combined, Females, Males, Females_only, Males_only, gender) %>% 
  gather(var, value, -gender) %>% 
  spread(gender, value)

p1 <- t3 %>% 
  ggplot(aes(var, Woman, fill=var)) + 
    geom_bar(stat="identity", fill="turquoise4", color="black") + 
    ylab("Proportion") + xlab("Replicated in") +
    ggtitle("Effects significant only in females") + 
    theme(axis.text.x = element_text(angle = 45, hjust=1),
          plot.title = element_text(size = 13)) + 
    ylim(0, 1)

p2 <- t3 %>% 
  ggplot(aes(var, Man)) + 
    geom_bar(stat="identity", fill="turquoise4", color="black") + 
    ylab("Proportion") + xlab("Replicated in") + 
    ggtitle("Effects significant only in males") + 
    theme(axis.text.x = element_text(angle = 45, hjust=1),
          plot.title = element_text(size = 14)) + 
    ylim(0, 1)

ggarrange(p1, p2, ncol=2)


gender only_gender repl_combined repl_females repl_males repl_males_only repl_females_only
Man 6200 5090 4210 4410 270 120
Woman 6880 5850 5290 4740 160 230


Description of effects

make_table <- function(df, size, effect){
  base <- glue("Combined_{size}_{effect}")
  df %>% group_by(Third_variable) %>% 
    mutate(total_n = n()) %>% 
    filter(!!sym(glue("{base}_P")) < 0.05) %>% 
    mutate(Proportion_significant = n() / total_n) %>% 
    summarise(
      Proportion_significant = mean(Proportion_significant),
      Mean_P_for_significant = mean(!!sym(glue("{base}_P"))), 
      Mean_R2_for_significant = ifelse(effect == "main", 
        mean(!!sym(glue("{base}_R2"))),
        mean(!!sym(glue("{base}_R2")) - !!sym(glue("Combined_{size}_main_R2"))))) %>% 
    pander(caption = glue("{size} sample - {effect} effects"))
}

make_table(df, "small", "main")
small sample - main effects
Third_variable Proportion_significant Mean_P_for_significant Mean_R2_for_significant
gender 0.523 0.00768 0.064
ordinal 0.376 0.0108 0.0537
make_table(df, "large", "main")
large sample - main effects
Third_variable Proportion_significant Mean_P_for_significant Mean_R2_for_significant
gender 0.91 0.00149 0.0333
ordinal 0.855 0.00253 0.0199
make_table(df, "small", "interaction")
small sample - interaction effects
Third_variable Proportion_significant Mean_P_for_significant Mean_R2_for_significant
gender 0.0502 0.0242 0.0192
ordinal 0.0334 0.0255 0.0221
make_table(df, "large", "interaction")
large sample - interaction effects
Third_variable Proportion_significant Mean_P_for_significant Mean_R2_for_significant
gender 0.138 0.018 0.00261
ordinal 0.0752 0.0223 0.0028


P-curves

plot_p <- function(df, var, title){
  ggplot(df, aes(x = !!sym(var))) + 
    geom_freqpoly(binwidth=0.005, size=1) +
    xlim(c(0.0, 0.05)) +
    ggtitle(title) + xlab("P-value") + 
    theme(plot.title = element_text(size = 13)) 
}

plot_comparison <- function(df1, df2, title){
  fig <- ggarrange(plot_p(df1, "Combined_small_interaction_P", "Interactions"), 
                   plot_p(df2, "Combined_small_main_P", "Main effects"), 
                   ncol=2, nrow=1, common.legend=T)
  annotate_figure(fig, top = text_grob(title, size = 14))
}

plot_comparison(df, df, "All effects")

plot_comparison(
  df %>% filter(Combined_large_interaction_P < 0.05), 
  df %>% filter(Combined_large_main_P < 0.05), 
  "Replicated effects")

plot_comparison(
  df %>% filter(Combined_large_interaction_P > 0.05), 
  df %>% filter(Combined_large_main_P > 0.05), 
  "Non-replicated effects")


Power

I arbitrarily define “true” effects as those with a p value below 0.001 in the large sample. What is the power to detect those in the small sample?

power <- df %>% group_by(Third_variable) %>% 
  summarise(
    Interactions = sum(
      Combined_small_interaction_P < 0.05 & Combined_large_interaction_P < 0.001, na.rm=T) / 
      sum(Combined_large_interaction_P < 0.001, na.rm=T),
    Main = sum(
      Combined_small_main_P < 0.05 & Combined_large_main_P < 0.001, na.rm=T) / 
      sum(Combined_large_main_P < 0.001, na.rm=T))

power %>% gather(Effect, value, -Third_variable) %>% 
  ggplot(aes(x = Effect, y = value, fill=Third_variable)) +
  geom_bar(stat="identity", position="dodge", width=0.5, col="black") +
  ylab("Power to identify in small sample") + 
  scale_fill_manual(values=jcol) +
  ylim(0, 1)

Pre-registered study.

What if you pick a random effect, and stick by it. This could be viewed as an assumption of no publication bias.

prop <- df %>% group_by(Third_variable) %>% 
  summarise(
    Interactions_not_replicated = sum(
      Combined_small_interaction_P < 0.05 & Combined_small_interaction_P > 0.01 & Combined_large_interaction_P > 0.05, na.rm=T) / 
      sum(Combined_small_interaction_P < 1.1, na.rm=T),
    Interactions_replicated = sum(
      Combined_small_interaction_P < 0.05 & Combined_small_interaction_P > 0.01  & Combined_large_interaction_P < 0.05, na.rm=T) / 
      sum(Combined_small_interaction_P < 1.1, na.rm=T),
    Main_not_replicated = sum(
      Combined_small_main_P < 0.05 & Combined_small_main_P > 0.01 & Combined_large_main_P > 0.05, na.rm=T) / 
      sum(Combined_small_main_P < 1.1, na.rm=T),
    Main_replicated = sum(
      Combined_small_main_P < 0.05 & Combined_small_main_P > 0.01 & Combined_large_main_P < 0.05, na.rm=T) / 
      sum(Combined_small_main_P < 1.1, na.rm=T))

prop %>% gather(Effect, value, -Third_variable) %>% 
  ggplot(aes(x = Effect, y = value, fill=Third_variable)) +
  geom_bar(stat="identity", position="dodge", width=0.5, col="black") +
  ylab("Proportion") + 
  scale_fill_manual(values=jcol)

df %>% filter(Third_variable == "ordinal", Combined_small_interaction_P < 0.05) %>% drop_na() %>% nrow()
## [1] 5960