Bayesian power analysis

Generate Data

Participants will be 120 children (after exclusions) with 60 infants from Stanford, CA, United States and 60 infants from the Chinese University of Hong Kong in Hong Kong, China. We will recruit infants between 12-24 months of age, stratified such that 30 children will be between 12;0 - 17;31 months of age and 30 children will be between 18;0 - 23;31 at each site. For our Bayesian power analysis, we will generate datasets corresponding to this design. We will estimate power to detect a moderate effect size of culture (Cohen’s d = 0.5).

n = 60 # participants per group (culture)
effect_size = .5 # assume a medium effect size

sim_d <- function(seed, n) {
  # define the means for standardized DVs
  mu_t <- effect_size
  mu_c <- 0
  set.seed(seed)
  
  young = runif(n, -1, 0)
  older = runif(n, 0, 1)

  d <- tibble(id = seq(1:(2*n)), 
            age = c(young[1:(n/2)], older[1:(n/2)], young[((n/2)+1):n], older[((n/2)+1):n]),
            group = rep(c("culture1", "culture2"), each = n)) %>% 
      mutate(culture = ifelse(group == "culture1", 0, 1),
          y = ifelse(group == "culture1", 
                        rnorm(n, mean = mu_c, sd = 1),
                        rnorm(n, mean = mu_t, sd = 1)))
  return(d)
}

d <- sim_d(123, 60)

# get default brms prior
get_prior(data = d, family = gaussian,
          y ~ 0 + intercept + age*culture)

##                 prior class        coef group resp dpar nlpar bound
## 1                         b                                        
## 2                         b         age                            
## 3                         b age:culture                            
## 4                         b     culture                            
## 5                         b   intercept                            
## 6 student_t(3, 0, 10) sigma

# random intercepts per subject or not?
#get_prior(data = d, family = gaussian,
#          y ~ 0 + intercept + age*culture + (1|id))

The main analysis will entail six separate regressions, analyzing the proportion of relational play, frequency of joint attention, and level of parental support separately for the individual vs. guided play phases. The independent variables variables are age and culture. Let’s first run one Bayesian regression on our first generated dataset and see how it looks.

fit <- brm(data = d,
        family = gaussian,
        y ~ 0 + intercept + age * culture,
        prior = c(prior(normal(0, 10), class = b),
                  prior(student_t(3, 0, 10), class = sigma)),
        seed = 1, silent=T)

## Compiling the C++ model

## Start sampling

tidy(fit, prob = .95)

##            term      estimate  std.error        lower        upper
## 1   b_intercept   -0.03889714 0.12604184   -0.2830970    0.2074505
## 2         b_age   -0.01879263 0.23470800   -0.4747172    0.4424618
## 3     b_culture    0.53419878 0.17957605    0.1920023    0.8949431
## 4 b_age:culture    0.37511153 0.31737778   -0.2469116    0.9906630
## 5         sigma    0.97806839 0.06439057    0.8652806    1.1164673
## 6          lp__ -182.48986615 1.62878138 -186.5515612 -180.3693312

plot(fit)

Now simulate more experiments.

Run Simulations

#s %>% 
#  mutate(treatment = map(fit, tidy, prob = .95)) %>% 
#  unnest(treatment) %>% 
#  filter(term == "b_culture") %>% 
#  ggplot(aes(x = seed, y = estimate, ymin = lower, ymax = upper)) +
#  geom_hline(yintercept = c(0, .5), color = "white") +
#  geom_pointrange(fatten = 1/2) +
#  labs(x = "seed (i.e., simulation index)",
#       y = expression(beta[culture]))

s %>% 
  unnest(tidy) %>% 
  ggplot(aes(x = seed, y = estimate, ymin = lower, ymax = upper)) +
  geom_pointrange(fatten = 1/2, alpha=.7) +
  geom_hline(yintercept = c(0, .5), color = "white") +
  labs(x = "seed (i.e., simulation index)",
       y = expression(beta[culture]))

Power

power = s %>% 
  #mutate(group = map(fit, tidy, prob = .95)) %>% 
  unnest(tidy) %>% 
  #filter(term == "b_culture") %>% 
  mutate(check = ifelse(lower > 0, 1, 0)) %>% 
  summarise(power = mean(check))

powerpct = unlist(power*100)

In 77.9% of our 1000 simulations, a group size of 60 per culture was sufficient to produce a 95% Bayesian credible interval that did not straddle 0.

Bayesian power analysis

George Kachergis

2019-09-28

Generate Data

Run Simulations

Power