Abstract
REFBANK TODO
This is currently an outline / pilot analysis with posterior predictive checks.
Note: discuss both (a) that similar aggregation efforts (CHILDES, Wordbank, Peekbank, etc.) have been helpful within their own fields,
since aggregation makes data reuse easier and makes some analyses possible that otherwise wouldn’t have been,
and (b) that aggregation makes the data more useful to other fields: an important first step for CS/AI benchmarking and applications, or for task corpora in linguistics.
(format and processing)
dataset normalization / transcription / target coding / etc (where are the places where we had to make guesses)
In here could discuss that this makes it not human subjects data b/c it doesn’t have identifiers
criteria for inclusion / choices to record exclusions but not apply them
Processing measures: things like word count, SBERT, PoS, other stuff. Discuss that for measures that involve comparison across trials, we need our own light exclusions.
(+ how solicitation happened) this is where a table of included datasets & vague properties could be helpful!
how to access etc Shiny app / API / redivis versioning etc
Note: As a pilot to make sure we got the model specifications right (and that the models would run) we tested these on boyce2024 & hawkins2020.
All of these analyses are run only on “stage 1” data
There’s a question of what the correct functional form is for the words ~ rep_num relationship. We fit 4 options – the full 2x2 of log or raw words x log or raw rep num.
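For reference, the four candidate forms can be written out as below. This is a sketch of the fixed-effect part only, ignoring the error term and the random effects; the symbols $w$ (words), $r$ (rep_num), $\alpha$, and $\beta$ are notation introduced here, not names from the analysis code.

```latex
\begin{align*}
\text{log-log:} \quad & \log w = \alpha + \beta \log r
  && \Longleftrightarrow\; w = e^{\alpha}\, r^{\beta} \quad \text{(power law)} \\
\text{log-lin:} \quad & \log w = \alpha + \beta r
  && \Longleftrightarrow\; w = e^{\alpha}\, e^{\beta r} \quad \text{(exponential)} \\
\text{lin-log:} \quad & w = \alpha + \beta \log r
  && \text{(logarithmic)} \\
\text{lin-lin:} \quad & w = \alpha + \beta r
  && \text{(linear)}
\end{align*}
```

The model names below follow the same order: the first part names the scale of the DV and the second part names the scale of rep_num.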
p_beta <- prior_string("normal(0,.5)", class = "b")
p_sd <- prior_string("normal(0,.5)", class = "sd")
p_intercept_logscale <- prior_string("normal(2,.5)", class = "Intercept")
p_intercept_linear <- prior_string("normal(10,10)", class = "Intercept")
p_beta_linear <- prior_string("normal(0,5)", class = "b")
p_sd_linear <- prior_string("normal(0,5)", class = "sd")
log_dv_priors <- c(p_intercept_logscale, p_beta, p_sd)
linear_dv_priors <- c(p_intercept_linear, p_beta_linear, p_sd_linear)
red_mod_log_log <- brm(log_words ~ log_rep_num + (log_rep_num || dataset_id / condition_id),
  prior = log_dv_priors
)
red_mod_log_lin <- brm(log_words ~ rep_num + (rep_num || dataset_id / condition_id),
  prior = log_dv_priors
)
red_mod_lin_log <- brm(words ~ log_rep_num + (log_rep_num || dataset_id / condition_id),
  prior = linear_dv_priors
)
red_mod_lin_lin <- brm(words ~ rep_num + (rep_num || dataset_id / condition_id),
  prior = linear_dv_priors
)
Note: here and elsewhere, the panels and colors are just different conditions. We want to make sure that the random effects can capture condition differences, and we spread them across panels for viewability.
Note that for log words, the actual data are spiky because the number of words is discrete: log(words) can be 0 or log(2), etc., but nothing in between.
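A small base-R illustration of why the log-scale data look spiky. The word counts here are made-up integers, not values from the pilot datasets.

```r
# Word counts are integers, so log(words) can only take the values log(1), log(2), ...
words <- 1:6
log_words <- log(words)

# Gaps between adjacent possible values: largest at the low end, shrinking after that.
gaps <- diff(log_words)

print(round(log_words, 2))
print(round(gaps, 2))
```

The gaps are widest for the shortest utterances, which is where the spikes are most visible in the predictive-check plots.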
##         elpd_diff se_diff  elpd_loo se_elpd_loo p_loo se_p_loo    looic se_looic
## log_log       0.0     0.0  -91826.1       177.2  24.7      0.3 183652.1    354.3
## log_lin     -66.7    25.2  -91892.7       177.2  25.5      0.3 183785.4    354.4
## lin_log  -11092.3   232.8 -102918.3       274.5  40.1      1.2 205836.6    548.9
## lin_lin  -11415.3   233.2 -103241.3       271.8  37.2      1.1 206482.6    543.6
##         elpd_diff se_diff
## log_log       0.0     0.0
## log_lin     -66.7    25.2
Log words fits way better, and seems right. Whether the relationship with rep_num is log-linear (exponential) or log-log (a power law) is hard to tell.
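One way to read the comparison is to express each elpd difference in units of its standard error. The sketch below only re-uses the numbers printed in the loo output; the data frame and column names are made up for illustration.

```r
# Values copied from the loo comparison output (differences relative to log_log).
cmp <- data.frame(
  model = c("log_lin", "lin_log", "lin_lin"),
  elpd_diff = c(-66.7, -11092.3, -11415.3),
  se_diff = c(25.2, 232.8, 233.2)
)

# Difference expressed in standard-error units.
cmp$diff_in_se <- cmp$elpd_diff / cmp$se_diff
print(round(cmp$diff_in_se, 1))
```

The two raw-words models are worse by nearly 50 standard errors, while log_lin trails log_log by roughly 2.6, which is why the choice of DV scale is clear and the choice between the two log-words forms is much less so.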
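Two possible approaches to moderators: (1) extract per-condition slopes and regress them on the moderators, or (2) add the moderators as interactions with repetition number in the full model.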
p_beta_linear <- prior_string("normal(0,.2)", class = "b")
log_lin_pred_mod <- brm(
  slope ~ n_players +
    # option_size +
    # image_type +
    # partner_constancy +
    role_constancy +
    # population +
    # modality +
    feedback +
    backchannel,
  prior = c(p_beta_linear),
  data = log_lin_preds
)
(and same for log-log model)
For the full model, we’d have all the predictors, but we don’t have variation on many of them with just the two pilot datasets.
(and same for log-lin relationship) For the full model, we’d have all the predictors and all three models, but we don’t have variation on many of them with just the two pilot datasets.
p_intercept_logscale <- prior_string("normal(2,.5)", class = "Intercept")
p_intercept_linear <- prior_string("normal(10,10)", class = "Intercept")
p_beta_linear <- prior_string("normal(0,5)", class = "b")
p_sd_linear <- prior_string("normal(0,5)", class = "sd")
log_dv_priors <- c(p_intercept_logscale, p_beta, p_sd)
linear_dv_priors <- c(p_intercept_linear, p_beta_linear, p_sd_linear)
red_mod_log_log_participants <- brm(
  log_words ~ log_rep_num *
    # population*
    n_players + (log_rep_num || dataset_id / condition_id),
  prior = log_dv_priors,
)
# not run because no variation in pilot set
red_mod_log_log_images <- brm(
  log_words ~ log_rep_num * (option_size + image_type) +
    (log_rep_num || dataset_id / condition_id),
  prior = log_dv_priors,
)
red_mod_log_log_channel <- brm(
  log_words ~ log_rep_num * (role_constancy +
    # modality+
    feedback + backchannel) +
    (log_rep_num || dataset_id / condition_id),
  prior = log_dv_priors,
)
n-players, age-group (on stage 1 only, using n-players who are active at this point)
So, comparing for n-players
The pilot sample doesn’t have age-group variation.
thickness, modality
For role-constancy (yes):
For feedback (limited):
For backchannel (limited):
type of stims x n targets
no variation to compare in this set
Estimates from the slope model and the full model with the same functional form are quite consistent, so which one should we use?
We have PoS data for all monolingual corpora.
After much model wrangling, Alvin and I got multinomial models for PoS working.
The best-fitting functional form uses log(rep_num).
p_beta_pos <- prior_string("normal(0,1.5)", class = "b", dpar = c("muDET", "muFUNCTION", "muMODIFIER", "muNOUN", "muVERB"))
p_sd_pos <- prior_string("normal(0,1.5)", class = "sd", dpar = c("muDET", "muFUNCTION", "muMODIFIER", "muNOUN", "muVERB"))
p_intercept_pos <- prior_string("normal(0, 1.5)", class = "Intercept", dpar = c("muDET", "muFUNCTION", "muMODIFIER", "muNOUN", "muVERB"))
logistic_pos_priors <- c(p_beta_pos, p_sd_pos, p_intercept_pos)
per_describer_for_model <- read_rds(here("cached_model_files/data_for_mods/per_describer_for_model.rds")) |>
  mutate(
    condition_id = as.factor(condition_id),
    total = NOUN + VERB + MODIFIER + FUNCTION + DET + PRON,
    w = 1 / total
  ) |>
  filter(total != 0)
pos_mod_log <- brm(
  cbind(NOUN, VERB, MODIFIER, FUNCTION, DET, PRON) | trials(total) + weights(w) ~ log_rep_num +
    (log_rep_num || dataset_id / condition_id),
  family = multinomial(refcat = "PRON"),
  prior = logistic_pos_priors,
)
We now have multilingual embeddings for all corpora. We use similarity to the next rep (same game, same target) as a measure of within-game similarity, and similarity across games (same condition, same target, same rep) as a measure of cross-game dissimilarity.
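A toy sketch of the two comparisons, assuming similarity is cosine similarity between embedding vectors (the notes don’t state the metric). The 3-d vectors, `game_a`, `game_b`, and `cosine_sim` are all made up for illustration; real embeddings would have many more dimensions.

```r
# Cosine similarity between two embedding vectors (assumed metric).
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Made-up embeddings: one row per repetition, same target.
game_a <- rbind(rep1 = c(1, 0, 0), rep2 = c(1, 1, 0), rep3 = c(1, 1, 0.1))
# A second game in the same condition, describing the same target.
game_b <- rbind(rep1 = c(1, 0, 0.2), rep2 = c(0, 1, 1), rep3 = c(0, 0.2, 1))

# Within-game: similarity of each rep to the next rep in the same game.
to_next <- sapply(1:2, function(i) cosine_sim(game_a[i, ], game_a[i + 1, ]))

# Cross-game: similarity between the two games at the same rep.
cross_game <- sapply(1:3, function(i) cosine_sim(game_a[i, ], game_b[i, ]))

print(round(to_next, 3))
print(round(cross_game, 3))
```

In this toy case the within-game similarity rises across reps while the cross-game similarity falls, which is the pattern the two measures are meant to pick up.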
The fit shown here isn’t great and we are currently running with log_rep_num instead. If that looks better, we will switch to that.
# Priors now on logit scale (ordbeta uses logit link)
# logit(0.72) ≈ 0.94, so intercept around 1
p_beta_sim <- prior_string("normal(0, 0.5)", class = "b")
p_sd_sim <- prior_string("normal(0, 0.5)", class = "sd")
p_intercept_sim <- prior_string("normal(1, 1.5)", class = "Intercept")
sim_priors <- c(p_intercept_sim, p_beta_sim, p_sd_sim)
sims_for_model <- read_rds(here("cached_model_files/data_for_mods/sims_for_model.rds"))
to_next_mod <- ordbetareg(sim ~ rep_num + (rep_num || dataset_id / condition_id),
  manual_prior = sim_priors,
  file = here("cached_model_files/mods/to_next_mod.rds"),
  data = sims_for_model |> filter(sim_type == "to_next")
)
diverge_mod <- ordbetareg(sim ~ rep_num + (rep_num || dataset_id / condition_id),
  manual_prior = sim_priors,
  file = here("cached_model_files/mods/diverge_mod.rds"),
  data = sims_for_model |> filter(sim_type == "diverge")
)
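A quick base-R check of what the intercept prior implies on the similarity scale. This treats the intercept as a plain logit-scale mean and ignores the ordered-beta cutpoints, so it is only a rough guide; the variable names are made up for illustration.

```r
# The prior comment says logit(0.72) is about 0.94; qlogis() is the logit function.
target_sim <- 0.72
intercept_logit <- qlogis(target_sim)

# normal(1, 1.5) intercept prior: map mean +/- 2 sd back to the similarity scale.
prior_mean <- 1
prior_sd <- 1.5
implied_range <- plogis(prior_mean + c(-2, 2) * prior_sd)

print(round(intercept_logit, 2))
print(round(implied_range, 2))
```

Under this reading, the prior leaves most of the 0 to 1 similarity range plausible for the intercept rather than pinning it near 0.72.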
summary of what we did and results
Commentary that this data can be shared once it’s transcribed b/c it’s not human subjects data, and that data reuse is one of our favorite things (so, y’know, give us more datasets!). Could crib from the peekbank behavioral methods paper for how they framed this.
Useful for exploration and for future directions for new data collection (ex. multi-lingual, or cleanly comparable modality or whatever we want to point towards)