Init
We use the rms package because 1) it supports ordinal regression, and 2) it automatically calculates R2 (pseudo R2) for us.
options(digits = 3, contrasts=c("contr.treatment", "contr.treatment")) #setting demanded by rms
library(pacman)
p_load(kirkegaard, readr, rms, gtools)
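As a minimal illustration of the second point (simulated data only, not part of the analysis; object names here are made up), lrm stores the Nagelkerke pseudo R2 in its stats vector, which is the fit statistic extracted throughout this document:
#minimal illustration (simulated data): lrm fits a proportional odds model and reports pseudo R2
illustration_d = data.frame(y = sample(1:4, 300, replace = TRUE), x = rnorm(300))
illustration_fit = rms::lrm(y ~ x, data = illustration_d)
illustration_fit$stats["R2"] #Nagelkerke pseudo R2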
#ad hoc variant of psych::describe that returns a plain data frame
describe = function(x) {
  y = psych::describe(x)
  class(y) = "data.frame"
  y
}
#random model fit function; the dots absorb the iteration index passed in by map()
random_fitter = function(..., sex = FALSE) {
  #sample 3 vars
  vars = sample(ordinal_vars, 3)
  #if sex, use that as the third (moderator) variable
  if (sex) vars[3] = "sex"
  #make formulas
  main_effects_model = sprintf("%s ~ %s + %s", vars[1], vars[2], vars[3]) %>% as.formula()
  interaction_model = sprintf("%s ~ %s * %s", vars[1], vars[2], vars[3]) %>% as.formula()
  #try to fit, but we might fail for various odd reasons
  y = tryCatch({
    #fit models
    silence({main_effects_fit = rms::lrm(main_effects_model, data = d, penalty = .01, tol = 1e-22)})
    silence({interaction_fit = rms::lrm(interaction_model, data = d, penalty = .01, tol = 1e-22)})
    #list return
    list(
      main_effects_fit = main_effects_fit,
      interaction_fit = interaction_fit
    )
  },
  error = function(e) NULL
  )
  y
}
#get summary stats function
extract_summary_stats = function(l) {
  #extract the R2 from each fit and compute the absolute and relative gains
  tibble(
    main_effects_only = l$main_effects_fit$stats["R2"],
    interaction_included = l$interaction_fit$stats["R2"],
    delta_R2 = interaction_included - main_effects_only,
    delta_R2_pct = (interaction_included / main_effects_only) - 1
  )
}
We use the OKCupid dataset, which is both very large and very diverse.
d = read_rds("data/parsed_data.rds")
dict = read_csv2("data/question_data.csv")
## Using ',' as decimal and '.' as grouping mark. Use read_delim() for more control.
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_character(),
## text = col_character(),
## option_1 = col_character(),
## option_2 = col_character(),
## option_3 = col_character(),
## option_4 = col_character(),
## N = col_integer(),
## Type = col_character(),
## Order = col_character(),
## Keywords = col_character()
## )
Main analysis
First we try models with 3 random ordinal variables. Before this, we subset to the ordinal variables with a sample size of at least 5,000, which avoids most of the overfitting. The modeling strategy is:
- Pick 3 random ordinal variables.
- Make a model with the first variable as the outcome and the other two as predictors. Make a second model identical to the first, except that we add the interaction term between the two predictors.
- Fit the models, allowing for errors to occur, which we ignore.
- Save the R2 fit statistic from each model pair. Calculate the absolute and relative gain in R2 from adding the interaction. This value cannot be below 0 because the main effects model is nested in the interaction model, so the likelihood-based R2 cannot decrease when the extra term is added (see the sketch after this list).
- Summarize and plot the differences.
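The non-negativity of the R2 gain can be checked on simulated data, since the main effects model is nested in the interaction model (a toy sketch only, not part of the analysis, and fit without the small penalty used in the actual models below):
#toy check (simulated data): the nested main effects model cannot have a higher likelihood-based R2
toy = data.frame(
  y = sample(1:5, 1000, replace = TRUE),
  a = sample(1:5, 1000, replace = TRUE),
  b = sample(1:5, 1000, replace = TRUE)
)
toy_main = rms::lrm(y ~ a + b, data = toy)
toy_int = rms::lrm(y ~ a * b, data = toy)
toy_int$stats["R2"] >= toy_main$stats["R2"] #TRUE for nested maximum likelihood fits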
#seed
set.seed(1)
#subset of ordinals
ordinal_vars = dict %>% filter(Type == "O", N >= 5000) %>% arrange(-N) %>% .[["X1"]]
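To make the pipeline concrete, a single run chains the two helper functions defined above (an illustrative sketch; the analysis below instead uses the cached results of 10,000 such runs):
#illustrative single run of the pipeline (not part of the cached analysis)
one_run = random_fitter()
if (!is.null(one_run)) extract_summary_stats(one_run)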
#random trios for analysis
#load from disk if available
random_ordinal_trios = cache_object(
  expr = map(1:10000, random_fitter),
  filename = "data/random_ordinal_trios.rds"
)
## Cache found, reading object from disk
#failed fits
failed_fits = map_lgl(random_ordinal_trios, is.null)
random_ordinal_trios = random_ordinal_trios[!failed_fits]
random_ordinal_trios %>% length()
## [1] 100
#extract summary statistics
fit_stats_trios = map_df(random_ordinal_trios, extract_summary_stats)
#average stats for gains
fit_stats_trios %>% describe()
#filter to models that explain at least 2% with main effects
fit_stats_trios %>%
filter(main_effects_only >= .02) %>%
describe()
#filter to models that explain at least 5% with main effects
fit_stats_trios %>%
filter(main_effects_only >= .05) %>%
describe()
Note that only a small number of the many fitted models actually passed the thresholds of explaining at least 2 or 5% of the variance. For these, the gains from adding interaction terms were very small on average, with medians of 0% gain in R2.
Sex as moderator
#recode sex as binary
table(d$gender)
##
## Man Other Woman
## 40215 198 25952
d$sex = d$gender %>% plyr::mapvalues(from = "Other", to = NA)
#random trios for analysis
random_ordinal_trios_sex = cache_object(
  expr = map(1:10000, random_fitter, sex = TRUE),
  filename = "data/random_ordinal_trios_sex.rds"
)
## Cache found, reading object from disk
#failed fits
failed_fits = map_lgl(random_ordinal_trios_sex, is.null)
random_ordinal_trios_sex = random_ordinal_trios_sex[!failed_fits]
random_ordinal_trios_sex %>% length()
## [1] 100
#extract summary statistics
fit_stats_trios_sex = map_df(random_ordinal_trios_sex, extract_summary_stats)
#average stats for gains
fit_stats_trios_sex %>% describe()
#filter to models that explain at least 2% with main effects
fit_stats_trios_sex %>%
filter(main_effects_only >= .02) %>%
describe()
#filter to models that explain at least 5% with main effects
fit_stats_trios_sex %>%
filter(main_effects_only >= .05) %>%
describe()
The results are very similar to those above.
Conclusion
Generally speaking, interaction effects did not add much validity in the randomly chosen models. This suggests that the general effect size prior for interaction effects in social science is quite low. The poor replicability of interaction effects in the scientific literature across many fields testifies to the generally low prior for interaction effects, not just in the social sciences. In theory, human intuition can improve the prior by only testing for likely interactions, but meta-scientific research shows that most interaction testing is likely to be exploratory and hence subject to the very strong prior of near-zero effect size. This was also true for the sex interaction effect sizes; sex is probably the most commonly tested interaction in all of science.
The bottom line is that considerable skepticism is warranted when one is presented with a claim of a strong interaction effect. This suggests that one should require a much more stringent p value threshold for discovery claims, to avoid an excessive false positive rate and extreme effect size overestimation.