mb6-power-analysis

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(broom)
library(lme4)

Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack

library(broom.mixed)
theme_set(theme_bw())

Different interaction profiles

null_effect <- tribble(
  ~kid, ~adult, ~value,
  "TP", "TP", 5,
  "TP", "MO", 5,
  "MO", "TP", 3,
  "MO", "MO", 3
)

prefer_TP <- tribble(
  ~kid, ~adult, ~value,
  "TP", "TP", 5,
  "TP", "MO", 3,
  "MO", "TP", 3,
  "MO", "MO", 2
)

arousal <- tribble(
  ~kid, ~adult, ~value,
  "TP", "TP", 5,
  "TP", "MO", 3,
  "MO", "TP", 2,
  "MO", "MO", 2
)

arousal_2 <- tribble(
  ~kid, ~adult, ~value,
  "TP", "TP", 5,
  "TP", "MO", 3,
  "MO", "TP", 4,
  "MO", "MO", 4
)

match_1 <- tribble(
  ~kid, ~adult, ~value,
  "TP", "TP", 5,
  "TP", "MO", 3,
  "MO", "TP", 3,
  "MO", "MO", 5
)

match_2 <- tribble(
  ~kid, ~adult, ~value,
  "TP", "TP", 5,
  "TP", "MO", 4,
  "MO", "TP", 1,
  "MO", "MO", 3
)

mismatch <- tribble(
  ~kid, ~adult, ~value,
  "TP", "TP", 3,
  "TP", "MO", 5,
  "MO", "TP", 5,
  "MO", "MO", 3
)
null_effect |> ggplot(aes(x=adult, y=value, color=kid))+geom_point()+geom_line(aes(group=c(kid)))+
  coord_cartesian(ylim=c(0,6))

prefer_TP |> ggplot(aes(x=adult, y=value, color=kid))+geom_point()+geom_line(aes(group=c(kid)))+
  coord_cartesian(ylim=c(0,6))

arousal |> ggplot(aes(x=adult, y=value, color=kid))+geom_point()+geom_line(aes(group=c(kid)))+
  coord_cartesian(ylim=c(0,6))

arousal_2|> ggplot(aes(x=adult, y=value, color=kid))+geom_point()+geom_line(aes(group=c(kid)))+
  coord_cartesian(ylim=c(0,6))

match_1 |> ggplot(aes(x=adult, y=value, color=kid))+geom_point()+geom_line(aes(group=c(kid)))+
  coord_cartesian(ylim=c(0,6))

match_2 |> ggplot(aes(x=adult, y=value, color=kid))+geom_point()+geom_line(aes(group=c(kid)))+
  coord_cartesian(ylim=c(0,6))

mismatch |> ggplot(aes(x=adult, y=value, color=kid))+geom_point()+geom_line(aes(group=c(kid)))+
  coord_cartesian(ylim=c(0,6))

What does neg-binomial look like

For data distribution, we’re using a negative binomial distribution which should be good for count data without the forced mean-variance link that poisson has.

Examples

sample <- expand_grid(mean = c(3), dispersion = c(.5, 1, 5, 10, 100), d = c(.2, .3, .4)) |>
  mutate(dat = pmap(list(mean, dispersion, d), \(m, disp, d) {
    var <- m + m**2 / disp
    sd <- sqrt(var)
    delta <- .5 * d * sqrt(var)
    m_lower <- m - delta
    dat_1 <- rnbinom(10000, size = disp, mu = m_lower)
    m_upper <- m + delta
    dat_2 <- rnbinom(10000, size = disp, mu = m_upper)
    tibble(dat = dat_1, type = "lower") |>
      bind_rows(tibble(dat = dat_2, type = "upper")) |>
      mutate(sd = sd)
  })) |>
  unnest(dat)


sample_2 <- expand_grid(mean = c(1,3,9), dispersion = c(.5, 1, 10, 100)) |>
  mutate(dat = pmap(list(mean, dispersion), \(m, disp) {
    dat <- rnbinom(10000, size = disp, mu = m)
    tibble(dat = dat)
  })) |>
  unnest(dat)

here’s an example of what these distributions look like for different cohen’s d, assuming mean of 3 and trying different dispersion values.

sample |>
  ggplot(aes(x = dat, fill = type)) +
  geom_histogram(binwidth = 1, alpha = .5, position = position_identity()) +
  facet_grid(dispersion ~ d) +
  coord_cartesian(xlim = c(0, 20))

sample_2 |>
  ggplot(aes(x = dat)) +
  geom_histogram(binwidth = 1, alpha = .5, position = position_identity()) +
  facet_grid(str_c("mean=",mean)~str_c("dispersion=",dispersion)) +
  coord_cartesian(xlim = c(0, 20))+
  labs(x="Number of events", y="")+theme(axis.title.y=element_blank(), axis.ticks.y=element_blank(),
                                         axis.text.y=element_blank())

also, as a check, we can recover roughly the same cohen’s d (using the normal calculation method for pooling sd, but we’re at least ballpark okay). This does assume independence.

check <- sample |>
  group_by(type, dispersion, d, sd) |>
  summarize(m_calc = mean(dat), sd_calc = sd(dat)) |>
  pivot_wider(names_from = type, values_from = c(m_calc, sd_calc)) |>
  mutate(
    pooled_sd = sqrt((sd_calc_lower**2 + sd_calc_upper**2) / 2),
    d_calc = (m_calc_upper - m_calc_lower) / pooled_sd
  )

`summarise()` has grouped output by 'type', 'dispersion', 'd'. You can override
using the `.groups` argument.

check |> ggplot(aes(x = d, y = d_calc, color = as.character(dispersion))) +
  geom_point() +
  geom_abline()

Simulation

to start with, let’s just simulate one set of data and explore it.

Parameter selections: we need to choose values for these, for a first attempt these are some guesses.

(experiment size)

number of labs: 10
number of infants / lab / age: 16

(distributional)

mean: 3
dispersion: 1 (looks kinda reasonable from above)
max value: 15 (since we’re going to be pretty dispersed, we’re going to replace values above 15 with 15)

(fixed effects, all in units of SD)

interaction (match/mismatch) effect: .4
simple effect (A > B for child): .4
simple effect (A > B for adult): 0
age main effect: .2
age x interaction: .2
age x simple effect: .2

(random effects, all in units of SD)

sd of labs (intercept): .2
sd of labs (adult): .1
sd of labs (child): 0
sd of labs (adult x child): .2
sd of labs (age): 0
sd of labs (age x adult): 0
sd of labs (age x child): 0
sd of labs (age x adult x child): 0
sd of infants (intercept): .2

sample_neg_binom <- function(effect, mean, dispersion, max) {
  # assumes effect in units of sd
  # mean and dispersion as parameters for nbinom
  # max is the max value to return -- samples above this are replaced with max
  # returns a sampled value
  var <- mean + mean**2 / dispersion
  delta <- effect * sqrt(var)
  new_mean <- max(mean + delta, 0)
  test_val <- rnbinom(n = 1, size = dispersion, mu = new_mean)
  ifelse(test_val > max, max, test_val)
}

# set a bunch of params
mean_nbin <- 3
dispersion <- 1
max_nbin <- 15

child_adult_fx <- .4
adult_fx <- 0
child_fx <- .4
age_fx <- .2
age_adult_fx <- 0
age_child_fx <- .2
age_child_adult_fx <- .2

sd_lab_intercept <- .2
sd_lab_child_adult_fx <- .2
sd_lab_adult_fx <- .1
sd_lab_child_fx <- 0
sd_lab_age_fx <- 0
sd_lab_age_adult_fx <- 0
sd_lab_age_child_fx <- 0
sd_lab_age_child_adult_fx <- 0

sd_child_intercept <- .2

setup_fixed_fx_only <- expand_grid(
  site = 1:10, # 10 sites
  kid = 1:16, # 16 kids / age / site
  kid_age = c("younger", "older"), # 2 age bins
  adult_did = c("adult_A", "adult_B"), # two behaviors
  kid_did = c("kid_A", "kid_B"), # two behaviors
) |>
  mutate(kid = str_c("lab_", site, "_", kid_age, "_", kid)) |>
  mutate(
    adult_did_num = ifelse(adult_did == "adult_A", -.5, .5), # contrast code
    kid_did_num = ifelse(kid_did == "kid_A", -.5, .5),
    kid_age_num = ifelse(kid_age == "younger", -.5, .5)
  ) |>
  # use params above to set all the fixed effects
  mutate(
    main_effects = child_fx * kid_did_num +
      adult_fx * adult_did_num +
      child_adult_fx * kid_did_num * adult_did_num,
    age_effects = age_fx * kid_age_num +
      age_child_fx * kid_age_num * kid_did_num,
    age_adult_fx * kid_age_num * adult_did_num,
    age_child_adult_fx * kid_age_num * kid_did_num * adult_did_num,
  )

set.seed(1)
setup_mixed_fx <- setup_fixed_fx_only |>
  group_by(site) |>
  # for each site, sample values for that lab
  mutate(
    lab_intercept = rnorm(n = 1, mean = 0, sd = sd_lab_intercept),
    lab_child_adult_fx = rnorm(n = 1, mean = 0, sd = sd_lab_child_adult_fx),
    lab_adult_fx = rnorm(n = 1, mean = 0, sd = sd_lab_adult_fx),
    lab_child_fx = rnorm(n = 1, mean = 0, sd = sd_lab_child_fx),
    lab_age_fx = rnorm(n = 1, mean = 0, sd = sd_lab_age_fx),
    lab_age_adult_fx = rnorm(n = 1, mean = 0, sd = sd_lab_age_adult_fx),
    lab_age_child_fx = rnorm(n = 1, mean = 0, sd = sd_lab_age_child_fx),
    lab_age_child_adult_fx = rnorm(n = 1, mean = 0, sd = sd_lab_age_child_adult_fx)
  ) |>
  group_by(kid) |>
  # for each child, sample values for that child
  mutate(child_intercept = rnorm(n = 1, mean = 0, sd = sd_child_intercept)) |>
  ungroup() |>
  # add up the random effects
  mutate(
    random_effects = child_intercept +
      lab_intercept +
      lab_child_fx * kid_did_num +
      lab_adult_fx * adult_did_num +
      lab_child_adult_fx * kid_did_num * adult_did_num +
      lab_age_fx * kid_age_num +
      lab_age_child_fx * kid_age_num * kid_did_num,
    lab_age_adult_fx * kid_age_num * adult_did_num,
    lab_age_child_adult_fx * kid_age_num * kid_did_num * adult_did_num,
    overall_effect = main_effects + age_effects + random_effects
  )


set.seed(1)
sample_fixed_effects <- setup_fixed_fx_only |>
  rowwise() |>
  mutate(sample = sample_neg_binom(main_effects + age_effects, mean = mean_nbin, dispersion = dispersion, max = max_nbin))


set.seed(1)
sample_mixed_effects <- setup_mixed_fx |>
  rowwise() |>
  mutate(sample = sample_neg_binom(overall_effect, mean = mean_nbin, dispersion = dispersion, max = max_nbin))

viz sampling with fixed effects only

sample_fixed_effects |> ggplot(aes(x = sample)) +
  geom_histogram(binwidth = 1) +
  geom_vline(aes(xintercept = mean(sample)))

sample_fixed_effects |>
  group_by(adult_did, kid_did) |>
  mutate(mean = mean(sample)) |>
  ggplot(aes(x = sample)) +
  geom_histogram(binwidth = 1, position = "identity", alpha = .3) +
  geom_vline(aes(xintercept = mean)) +
  facet_grid(adult_did ~ kid_did)

sample_fixed_effects |>
  group_by(adult_did, kid_did, kid_age) |>
  mutate(mean = mean(sample)) |>
  ggplot(aes(x = sample, color = kid_age, fill = kid_age)) +
  geom_histogram(binwidth = 1, position = "identity", alpha = .3) +
  geom_vline(aes(xintercept = mean, color = kid_age)) +
  facet_grid(adult_did ~ kid_did)

sample_fixed_effects |>
  group_by(adult_did, kid_did, kid_age) |>
  mutate(mean = mean(sample)) |>
  ggplot(aes(x = sample, color = interaction(kid_did, adult_did), fill = interaction(kid_did, adult_did))) +
  geom_density(position = "identity", alpha = .3) +
  geom_vline(aes(xintercept = mean, color = interaction(kid_did, adult_did))) +
  facet_grid(~kid_age)

sample_fixed_effects |>
  mutate(match = str_sub(kid_did, -1, -1) == str_sub(adult_did, -1, -1)) |>
  group_by(kid_age, match) |>
  mutate(mean = mean(sample)) |>
  ggplot(aes(x = sample, color = match, fill = match)) +
  geom_histogram(binwidth = 1, position = "identity", alpha = .3) +
  geom_vline(aes(xintercept = mean, color = match)) +
  facet_grid(~kid_age)

sample_fixed_effects |> ggplot(aes(x = reorder(site, sample), y = sample, color = kid_age)) +
  geom_jitter(alpha = .1) +
  stat_summary()

No summary function supplied, defaulting to `mean_se()`

sample_fixed_effects |> ggplot(aes(x = reorder(site, sample), y = sample, color = interaction(adult_did, kid_did))) +
  geom_jitter(alpha = .1) +
  stat_summary()

No summary function supplied, defaulting to `mean_se()`

sample_fixed_effects |> ggplot(aes(x = reorder(kid, sample), y = sample)) +
  geom_jitter(alpha = .1) +
  stat_summary(geom = "point") +
  theme(axis.ticks.x = element_blank(), axis.text.x = element_blank(), panel.grid = element_blank())

No summary function supplied, defaulting to `mean_se()`

viz sampling with mixed effects

sample_mixed_effects |> ggplot(aes(x = sample)) +
  geom_histogram(binwidth = 1) +
  geom_vline(aes(xintercept = mean(sample)))

sample_mixed_effects |>
  group_by(adult_did, kid_did) |>
  mutate(mean = mean(sample)) |>
  ggplot(aes(x = sample)) +
  geom_histogram(binwidth = 1, position = "identity", alpha = .3) +
  geom_vline(aes(xintercept = mean)) +
  facet_grid(adult_did ~ kid_did)

sample_mixed_effects |>
  group_by(adult_did, kid_did, kid_age) |>
  mutate(mean = mean(sample)) |>
  ggplot(aes(x = sample, color = kid_age, fill = kid_age)) +
  geom_histogram(binwidth = 1, position = "identity", alpha = .3) +
  geom_vline(aes(xintercept = mean, color = kid_age)) +
  facet_grid(adult_did ~ kid_did)

sample_mixed_effects |>
  group_by(adult_did, kid_did, kid_age) |>
  mutate(mean = mean(sample)) |>
  ggplot(aes(x = sample, color = interaction(kid_did, adult_did), fill = interaction(kid_did, adult_did))) +
  geom_density(position = "identity", alpha = .3) +
  geom_vline(aes(xintercept = mean, color = interaction(kid_did, adult_did))) +
  facet_grid(~kid_age)

sample_mixed_effects |>
  mutate(match = str_sub(kid_did, -1, -1) == str_sub(adult_did, -1, -1)) |>
  group_by(kid_age, match) |>
  mutate(mean = mean(sample)) |>
  ggplot(aes(x = sample, color = match, fill = match)) +
  geom_histogram(binwidth = 1, position = "identity", alpha = .3) +
  geom_vline(aes(xintercept = mean, color = match)) +
  facet_grid(~kid_age)

(look at random effects)

sample_mixed_effects |> ggplot(aes(x = reorder(site, sample), y = sample, color = kid_age)) +
  geom_jitter(alpha = .1) +
  stat_summary()

No summary function supplied, defaulting to `mean_se()`

sample_mixed_effects |> ggplot(aes(x = reorder(site, sample), y = sample, color = interaction(adult_did, kid_did))) +
  geom_jitter(alpha = .1) +
  stat_summary()

No summary function supplied, defaulting to `mean_se()`

sample_mixed_effects |> ggplot(aes(x = reorder(kid, sample), y = sample)) +
  geom_jitter(alpha = .1) +
  stat_summary(geom = "point") +
  theme(axis.ticks.x = element_blank(), axis.text.x = element_blank(), panel.grid = element_blank())

No summary function supplied, defaulting to `mean_se()`

stats

note: some models not represented here are

mixed effects models for each behavior separately
time data for mouth open mixed effect model (which would need separate simulation)

fixed effect model

fix_fx_mod <- MASS::glm.nb(sample ~ kid_age_num * adult_did_num * kid_did_num, data = sample_mixed_effects)

summary(fix_fx_mod)


Call:
MASS::glm.nb(formula = sample ~ kid_age_num * adult_did_num * 
    kid_did_num, data = sample_mixed_effects, init.theta = 0.8869014171, 
    link = log)

Coefficients:
                                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)                            1.076864   0.034032  31.643  < 2e-16 ***
kid_age_num                            0.149845   0.068064   2.202   0.0277 *  
adult_did_num                         -0.131729   0.068064  -1.935   0.0529 .  
kid_did_num                            0.379775   0.068064   5.580 2.41e-08 ***
kid_age_num:adult_did_num             -0.003353   0.136129  -0.025   0.9803    
kid_age_num:kid_did_num                0.135490   0.136129   0.995   0.3196    
adult_did_num:kid_did_num              0.705511   0.136129   5.183 2.19e-07 ***
kid_age_num:adult_did_num:kid_did_num -0.189638   0.272257  -0.697   0.4861    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(0.8869) family taken to be 1)

    Null deviance: 1476.9  on 1279  degrees of freedom
Residual deviance: 1412.7  on 1272  degrees of freedom
AIC: 5736.1

Number of Fisher Scoring iterations: 1

              Theta:  0.8869 
          Std. Err.:  0.0514 

 2 x log-likelihood:  -5718.0620

(reminder: “true” values

child_adult_fx = .4
adult_fx = 0
child_fx = .4
age_fx = .2
age_adult_fx=0
age_child_fx=.2
age_child_adult_fx=.2)

mixed effect model

note that there are some convergence issues, so we may need to decide which are the most important random slopes!

# mix_fx_mod <- glmer.nb(sample~kid_age_num*adult_did_num*kid_did_num+
#                          (1|kid)+
#                          (kid_age_num*adult_did_num*kid_did_num|site), data=sample_mixed_effects)

# ^ failed to converge

mix_fx_mod <- glmer.nb(sample ~ kid_age_num * adult_did_num * kid_did_num +
  (1 | kid) +
  (adult_did_num | site), data = sample_mixed_effects)

boundary (singular) fit: see help('isSingular')

# get a singular fit, but at least it converged?
summary(mix_fx_mod)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: Negative Binomial(0.9154)  ( log )
Formula: sample ~ kid_age_num * adult_did_num * kid_did_num + (1 | kid) +  
    (adult_did_num | site)
   Data: sample_mixed_effects

      AIC       BIC    logLik -2*log(L)  df.resid 
     5732      5799     -2853      5706      1267 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.8811 -0.7708 -0.3228  0.3641  5.9501 

Random effects:
 Groups Name          Variance  Std.Dev. Corr
 kid    (Intercept)   0.0007666 0.02769      
 site   (Intercept)   0.0320334 0.17898      
        adult_did_num 0.0002080 0.01442  1.00
Number of obs: 1280, groups:  kid, 320; site, 10

Fixed effects:
                                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)                            1.05833    0.06991  15.138  < 2e-16 ***
kid_age_num                            0.15912    0.06765   2.352   0.0187 *  
adult_did_num                         -0.13419    0.06797  -1.974   0.0484 *  
kid_did_num                            0.39543    0.06811   5.806 6.41e-09 ***
kid_age_num:adult_did_num             -0.00336    0.13520  -0.025   0.9802    
kid_age_num:kid_did_num                0.12571    0.13539   0.929   0.3531    
adult_did_num:kid_did_num              0.70597    0.13568   5.203 1.96e-07 ***
kid_age_num:adult_did_num:kid_did_num -0.23434    0.27047  -0.866   0.3863    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) kd_g_n adlt__ kd_dd_ kd_g_nm:d__ kd_g_nm:k__ a__:__
kid_age_num -0.024                                                    
adult_dd_nm  0.072 -0.002                                             
kid_did_num -0.065  0.003 -0.050                                      
kd_g_nm:d__  0.007  0.023 -0.022  0.007                               
kd_g_nm:k__  0.010 -0.052  0.012 -0.025 -0.046                        
adlt_dd_:__ -0.047  0.012 -0.056  0.035  0.000      -0.007            
kd_g_:__:__ -0.005 -0.047 -0.001  0.000 -0.048       0.024      -0.016
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

library(emmeans)

Welcome to emmeans.
Caution: You lose important information if you filter this package's results.
See '? untidy'

EMM <- emmeans(mix_fx_mod, ~adult_did_num*kid_did_num)

NOTE: Results may be misleading due to involvement in interactions

pairs(EMM, by="kid_did_num")

kid_did_num = -0.5:
 contrast                               estimate     SE  df z.ratio p.value
 (adult_did_num-0.5) - adult_did_num0.5    0.487 0.0987 Inf   4.936  <.0001

kid_did_num =  0.5:
 contrast                               estimate     SE  df z.ratio p.value
 (adult_did_num-0.5) - adult_did_num0.5   -0.219 0.0933 Inf  -2.345  0.0190

Results are averaged over the levels of: kid_age_num 
Results are given on the log (not the response) scale.

this seems to get reversed sign, but consistent values with the individual version, so we could skip the individual versions and just use the emmeans to get the two halves out.

individual mixed effects models

A is coded negative, so a negative coefficient on adult_did_num is supportive of imitation

mix_fx_mod_A <- glmer.nb(sample ~ kid_age_num * adult_did_num +
  (1 | kid) + (adult_did_num | site), data = sample_mixed_effects |> filter(kid_did=="kid_A"))

Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0454932 (tol = 0.002, component 1)

boundary (singular) fit: see help('isSingular')

summary(mix_fx_mod_A)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: Negative Binomial(0.8315)  ( log )
Formula: sample ~ kid_age_num * adult_did_num + (1 | kid) + (adult_did_num |  
    site)
   Data: filter(sample_mixed_effects, kid_did == "kid_A")

      AIC       BIC    logLik -2*log(L)  df.resid 
   2647.5    2687.6   -1314.7    2629.5       631 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.8342 -0.7380 -0.3067  0.3483  4.8496 

Random effects:
 Groups Name          Variance Std.Dev. Corr
 kid    (Intercept)   0.086555 0.29420      
 site   (Intercept)   0.060286 0.24553      
        adult_did_num 0.003232 0.05685  1.00
Number of obs: 640, groups:  kid, 320; site, 10

Fixed effects:
                          Estimate Std. Error z value Pr(>|z|)    
(Intercept)                 0.7806     0.1061   7.356 1.90e-13 ***
kid_age_num                 0.1143     0.1092   1.046    0.295    
adult_did_num              -0.5087     0.1067  -4.769 1.85e-06 ***
kid_age_num:adult_did_num   0.1145     0.2077   0.552    0.581    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) kd_g_n adlt__
kid_age_num -0.034              
adult_dd_nm  0.177 -0.016       
kd_g_nm:d__  0.019  0.063 -0.029
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

B is coded positive, so a positive coefficient on adult_did_num is supportive of imitation

mix_fx_mod_B <- glmer.nb(sample ~ kid_age_num * adult_did_num +
  (1 | kid) + (adult_did_num | site), data = sample_mixed_effects |> filter(kid_did=="kid_B"))

boundary (singular) fit: see help('isSingular')

summary(mix_fx_mod_B)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: Negative Binomial(1.0624)  ( log )
Formula: sample ~ kid_age_num * adult_did_num + (1 | kid) + (adult_did_num |  
    site)
   Data: filter(sample_mixed_effects, kid_did == "kid_B")

      AIC       BIC    logLik -2*log(L)  df.resid 
   3085.9    3126.0   -1533.9    3067.9       631 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.9265 -0.6754 -0.2774  0.3657  3.8263 

Random effects:
 Groups Name          Variance  Std.Dev.  Corr
 kid    (Intercept)   1.595e-11 3.994e-06     
 site   (Intercept)   2.841e-03 5.330e-02     
        adult_did_num 4.276e-04 2.068e-02 1.00
Number of obs: 640, groups:  kid, 320; site, 10

Fixed effects:
                          Estimate Std. Error z value Pr(>|z|)    
(Intercept)                1.26430    0.04753  26.598   <2e-16 ***
kid_age_num                0.21866    0.08769   2.494   0.0126 *  
adult_did_num              0.21917    0.08820   2.485   0.0130 *  
kid_age_num:adult_did_num -0.10168    0.17563  -0.579   0.5626    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) kd_g_n adlt__
kid_age_num -0.030              
adult_dd_nm  0.014  0.008       
kd_g_nm:d__  0.019 -0.028 -0.021
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

chi-sq test

unclear what the age approach here is…

prep_chi_sq_data <- sample_mixed_effects |>
  group_by(kid, kid_age) |>
  mutate(condition = str_c(adult_did, "_", kid_did)) |>
  select(kid, kid_age, condition, sample) |>
  pivot_wider(names_from = condition, values_from = sample) |>
  mutate(
    A = case_when(
      adult_A_kid_A > adult_B_kid_A ~ "more_A",
      adult_A_kid_A == adult_B_kid_A ~ "equal_A",
      adult_A_kid_A < adult_B_kid_A ~ "less_A"
    ),
    B = case_when(
      adult_B_kid_B > adult_A_kid_B ~ "more_B",
      adult_B_kid_B == adult_A_kid_B ~ "equal_B",
      adult_B_kid_B < adult_A_kid_B ~ "less_B"
    )
  )

age as all together

chi_sq_data <- prep_chi_sq_data |>
  group_by(A, B) |>
  tally()

# overall goodness of fit test comparing to all equal
# note there are objections because 0's are not as likely
chisq.test(chi_sq_data$n, p = rep(1 / 9, 9))


    Chi-squared test for given probabilities

data:  chi_sq_data$n
X-squared = 121.79, df = 8, p-value < 2.2e-16

# both match versus both mismatch
match_mismatch <- chi_sq_data |> filter(A == "more_A" & B == "more_B" | A == "less_A" & B == "less_B")

# goodness of fit for match-match v mismatch-mismatch
chisq.test(match_mismatch$n, p = c(.5, .5))


    Chi-squared test for given probabilities

data:  match_mismatch$n
X-squared = 9.1513, df = 1, p-value = 0.002485

age as split and test each separately

chi_sq_data_age <- prep_chi_sq_data |>
  group_by(A, B, kid_age) |>
  tally()

# younger

# overall goodness of fit test comparing to all equal
# note there are objections because 0's are not as likely
chisq.test(chi_sq_data_age |> filter(kid_age == "younger") |> pull(n), p = rep(1 / 9, 9))


    Chi-squared test for given probabilities

data:  pull(filter(chi_sq_data_age, kid_age == "younger"), n)
X-squared = 65.45, df = 8, p-value = 3.932e-11

chisq.test(chi_sq_data_age |> filter(kid_age == "older") |> pull(n), p = rep(1 / 9, 9))


    Chi-squared test for given probabilities

data:  pull(filter(chi_sq_data_age, kid_age == "older"), n)
X-squared = 64.1, df = 8, p-value = 7.268e-11

# both match versus both mismatch
match_mismatch_age <- chi_sq_data_age |> filter(A == "more_A" & B == "more_B" | A == "less_A" & B == "less_B")

# goodness of fit for match-match v mismatch-mismatch
chisq.test(match_mismatch_age |> filter(kid_age == "younger") |> pull(n), p = c(.5, .5))


    Chi-squared test for given probabilities

data:  pull(filter(match_mismatch_age, kid_age == "younger"), n)
X-squared = 6, df = 1, p-value = 0.01431

chisq.test(match_mismatch_age |> filter(kid_age == "older") |> pull(n), p = c(.5, .5))


    Chi-squared test for given probabilities

data:  pull(filter(match_mismatch_age, kid_age == "older"), n)
X-squared = 3.4615, df = 1, p-value = 0.06281

wilcoxon matched-pairs signed-rank test

again, not clear what the intended age thing is, so try together & split

type_A <- sample_mixed_effects |>
  filter(kid_did == "kid_A") |>
  select(kid, kid_age, adult_did, sample) |>
  pivot_wider(names_from = adult_did, values_from = sample)

type_B <- sample_mixed_effects |>
  filter(kid_did == "kid_B") |>
  select(kid, kid_age, adult_did, sample) |>
  pivot_wider(names_from = adult_did, values_from = sample)

# within type-A, all ages
wilcox.test(type_A$adult_A, type_A$adult_B, paired = T)


    Wilcoxon signed rank test with continuity correction

data:  type_A$adult_A and type_A$adult_B
V = 22533, p-value = 2.115e-06
alternative hypothesis: true location shift is not equal to 0

# within type-A, younger only
wilcox.test(type_A |> filter(kid_age == "younger") |> pull(adult_A), type_A |> filter(kid_age == "younger") |> pull(adult_B), paired = T)


    Wilcoxon signed rank test with continuity correction

data:  pull(filter(type_A, kid_age == "younger"), adult_A) and pull(filter(type_A, kid_age == "younger"), adult_B)
V = 5399.5, p-value = 0.0006122
alternative hypothesis: true location shift is not equal to 0

# within type-A, older only
wilcox.test(type_A |> filter(kid_age == "older") |> pull(adult_A), type_A |> filter(kid_age == "older") |> pull(adult_B), paired = T)


    Wilcoxon signed rank test with continuity correction

data:  pull(filter(type_A, kid_age == "older"), adult_A) and pull(filter(type_A, kid_age == "older"), adult_B)
V = 5912, p-value = 0.001036
alternative hypothesis: true location shift is not equal to 0

# within type-B, all ages
wilcox.test(type_B$adult_A, type_A$adult_B, paired = T)


    Wilcoxon signed rank test with continuity correction

data:  type_B$adult_A and type_A$adult_B
V = 23894, p-value = 9.998e-09
alternative hypothesis: true location shift is not equal to 0

# within type-B, younger only
wilcox.test(type_B |> filter(kid_age == "younger") |> pull(adult_A), type_B |> filter(kid_age == "younger") |> pull(adult_B), paired = T)


    Wilcoxon signed rank test with continuity correction

data:  pull(filter(type_B, kid_age == "younger"), adult_A) and pull(filter(type_B, kid_age == "younger"), adult_B)
V = 4192.5, p-value = 0.1217
alternative hypothesis: true location shift is not equal to 0

# within type-B, older only
wilcox.test(type_B |> filter(kid_age == "older") |> pull(adult_A), type_A |> filter(kid_age == "older") |> pull(adult_B), paired = T)


    Wilcoxon signed rank test with continuity correction

data:  pull(filter(type_B, kid_age == "older"), adult_A) and pull(filter(type_A, kid_age == "older"), adult_B)
V = 6050, p-value = 4.56e-06
alternative hypothesis: true location shift is not equal to 0

what are priors for neg binomial

library(brms)

Loading required package: Rcpp

Loading 'brms' package (version 2.22.0). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').


Attaching package: 'brms'

The following object is masked from 'package:lme4':

    ngrps

The following object is masked from 'package:stats':

    ar

prior=c("normal(1,1)", class="Intercept",
        "normal(0,1)", class="b",
        "normal(0,1)", class="sd",
        "lkj(1)", class="cor",
        "inv_gamma(.4, .3)", class="shape")
mod <- brmsformula(sample ~ kid_age_num*adult_did_num*kid_did_num+(1|kid)+(kid_age_num*adult_did_num*kid_did_num|site), family=negbinomial(link="log"))

default_prior(mod, data=sample_mixed_effects)

                  prior     class                                  coef group
                 (flat)         b                                            
                 (flat)         b                         adult_did_num      
                 (flat)         b             adult_did_num:kid_did_num      
                 (flat)         b                           kid_age_num      
                 (flat)         b             kid_age_num:adult_did_num      
                 (flat)         b kid_age_num:adult_did_num:kid_did_num      
                 (flat)         b               kid_age_num:kid_did_num      
                 (flat)         b                           kid_did_num      
                 lkj(1)       cor                                            
                 lkj(1)       cor                                        site
 student_t(3, 0.7, 2.5) Intercept                                            
   student_t(3, 0, 2.5)        sd                                            
   student_t(3, 0, 2.5)        sd                                         kid
   student_t(3, 0, 2.5)        sd                             Intercept   kid
   student_t(3, 0, 2.5)        sd                                        site
   student_t(3, 0, 2.5)        sd                         adult_did_num  site
   student_t(3, 0, 2.5)        sd             adult_did_num:kid_did_num  site
   student_t(3, 0, 2.5)        sd                             Intercept  site
   student_t(3, 0, 2.5)        sd                           kid_age_num  site
   student_t(3, 0, 2.5)        sd             kid_age_num:adult_did_num  site
   student_t(3, 0, 2.5)        sd kid_age_num:adult_did_num:kid_did_num  site
   student_t(3, 0, 2.5)        sd               kid_age_num:kid_did_num  site
   student_t(3, 0, 2.5)        sd                           kid_did_num  site
    inv_gamma(0.4, 0.3)     shape                                            
 resp dpar nlpar lb ub       source
                            default
                       (vectorized)
                       (vectorized)
                       (vectorized)
                       (vectorized)
                       (vectorized)
                       (vectorized)
                       (vectorized)
                            default
                       (vectorized)
                            default
                  0         default
                  0    (vectorized)
                  0    (vectorized)
                  0    (vectorized)
                  0    (vectorized)
                  0    (vectorized)
                  0    (vectorized)
                  0    (vectorized)
                  0    (vectorized)
                  0    (vectorized)
                  0    (vectorized)
                  0    (vectorized)
                  0         default

unclear if Bayesian makes sense as a robustness effect, but need to choose priors if we do.

Future

Interpretation

mixed effects models

if I understand correctly, the predictions of imitation are in particular a cross-over (type) effect – i.e. that for each child behavior, it is higher in it’s own category than the other category. Notably, this will be a subset of the cases with a credible interaction effect in the mixed effect model.

So, we really need the two separate models / the emmeans for each half.

In our testing, may want to reparameterize how we feed in values! to distinguish / not have centered values

child does A>B:
effect of adult A>B on child A:
effect of adult A>B on child B:

(these three with age)

chisq

the 3x3 chisq is not very sensitive (because we don’t know how many 0s to expect)
the pairwise – harder to think about, might depend on variances? definitely will want to simulate

kids generally do lots of A-A, some B-A, equal A-B and B-B (A is more arousing to watch, kid_A indexes arousal, kid_B does not) –> 2/3 of kids do A-A > B-A; 1/2 of kids do A-B, half of kids do B-B (even distributions) A-A + B-B : 2/3 * 1/2 B-A + A-B : 1/3 * 1/2 will come out? as AA-BB >> BA-AB

I think this again detects (some?) interaction effects, but doesn’t distinguish interactions that would get different interpretations.

wilcoxon

I think this is fine? again depends on what we do with age and the imitation theory needs both kid_A and kid_B tests to have the predicted direction.

What to use for wider effect-recovery / power-analysis simulations

The above tried one sampling on one set of params. Once we have a sense of

what parameter ranges are relevant and
what outcomes are relevant (ex. significance on which tests/contrasts)

I can do power-analysis / aggregate across many simulations.

May want to check ex. ostenberg or davis meta-analysis to get ideas on # of actions/ unit time (mean & max plausible) and estimate of variance from different sources and estimates of simple effects (how common overall are MO v TP).

will definitely want to check for a variety of combos of main effects/types of interaction
will want to check against different levels of background variability
may want to power-analyse for different expt sizes

Parameters:

(experiment size)

number of labs: 10
number of infants / lab / age: 16

(distributional)

mean: 3
dispersion: 1 (looks kinda reasonable from above)
max value: 15 (since we’re going to be pretty dispersed, we’re going to replace values above 15 with 15)

(fixed effects, all in units of SD)

interaction (match/mismatch) effect: .4
simple effect (A > B for child): .4
simple effect (A > B for adult): 0
age main effect: .2
age x interaction: .2
age x simple effect: .2

(random effects, all in units of SD)

sd of labs (intercept): .2
sd of labs (adult): .1
sd of labs (child): 0
sd of labs (adult x child): .2
sd of labs (age): 0
sd of labs (age x adult): 0
sd of labs (age x child): 0
sd of labs (age x adult x child): 0
sd of infants (intercept): .2