Participants will be 40–60 adults, randomly assigned to one of two conditions.
For our Bayesian power analysis, we will simulate datasets corresponding to this design and estimate our power to detect either a small effect size (Cohen’s d = 0.2) or a medium effect size (Cohen’s d = 0.5).
library(tidyverse)
library(brms)
library(broom)

n <- 50  # participants per condition (change this, clear the knitr cache, and re-run)

sim_d <- function(seed, n, effect_size) {
  # means for the standardized DV: control at 0, treatment shifted by d
  mu_t <- effect_size
  mu_c <- 0

  set.seed(seed)

  tibble(id    = 1:(2 * n),
         group = rep(c("original", "rotated"), each = n)) %>%
    mutate(condition = ifelse(group == "original", 0, 1),
           # ifelse() recycles each rnorm(n) draw across the 2n rows,
           # so each group ends up with its own n iid draws
           y = ifelse(group == "original",
                      rnorm(n, mean = mu_c, sd = 1),
                      rnorm(n, mean = mu_t, sd = 1)))
}
sm_d <- sim_d(123, n, .2) # assume a small effect size (Cohen's d)
med_d <- sim_d(123, n, .5) # assume a medium effect size
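As a quick optional sanity check (not part of the analysis proper), the simulated group means should sit near 0 (control) and the assumed d (treatment), with SDs near 1:
# verify the simulated data match the generating parameters
sm_d %>%
  group_by(group) %>%
  summarise(mean = mean(y), sd = sd(y))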
# inspect the default brms priors for this model
get_prior(data = sm_d, family = gaussian,
          y ~ 0 + Intercept + condition)
##                 prior class      coef group resp dpar nlpar bound
## 1                         b
## 2                         b condition
## 3                         b Intercept
## 4 student_t(3, 0, 10) sigma
The independent variable is condition. (The `y ~ 0 + Intercept + condition` syntax makes the intercept an ordinary class-`b` coefficient, so our prior applies to it directly rather than to brms’s internally centered intercept.) Let’s first fit one Bayesian regression to a single simulated dataset for each effect size and see how they look.
fit_sm <- brm(data = sm_d,
              family = gaussian,
              y ~ 0 + Intercept + condition,
              prior = c(prior(normal(0, 10), class = b),
                        prior(student_t(3, 0, 10), class = sigma)),
              seed = 123, silent = TRUE)
## Compiling the C++ model
## Trying to compile a simple C file
## Start sampling
fit_med <- brm(data = med_d,
               family = gaussian,
               y ~ 0 + Intercept + condition,
               prior = c(prior(normal(0, 10), class = b),
                         prior(student_t(3, 0, 10), class = sigma)),
               seed = 123, silent = TRUE)
## Compiling the C++ model
## recompiling to avoid crashing R session
## Trying to compile a simple C file
## Start sampling
The regression on the data simulated with a small effect size:
tidy(fit_sm, prob = .89) # use 89% credible intervals instead of 95%: https://easystats.github.io/bayestestR/articles/credible_interval.html
##          term      estimate  std.error         lower        upper
## 1 b_Intercept    0.03252772 0.13140701   -0.18334691    0.2457552
## 2 b_condition    0.31520618 0.18573425    0.01218401    0.6168296
## 3       sigma    0.92863514 0.06656403    0.82847163    1.0371187
## 4        lp__ -142.74732915 1.24166549 -144.99730475 -141.4130950
The regression on the data simulated with a medium effect size:
tidy(fit_med, prob = .89)
##          term      estimate  std.error        lower        upper
## 1 b_Intercept    0.02775248 0.13097629   -0.1880046    0.2345905
## 2 b_condition    0.62171648 0.18578410    0.3289054    0.9226805
## 3       sigma    0.92858571 0.06773839    0.8269248    1.0426463
## 4        lp__ -142.76023362 1.23119854 -145.0506624 -141.4207492
plot(fit_sm)
plot(fit_med)
Now we’ll simulate many more experiments for each assumed effect size and count how often the 89% credible interval for the condition effect excludes zero (the Bayesian analogue of a significant result).
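The objects s_sm and s_med used below hold these repeated fits. Here is a minimal sketch of how they can be built, assuming 100 replications per effect size (matching the counts reported below), refitting the already-compiled models with brms’s update() to avoid recompilation, and keeping only the b_condition row of each tidy() summary so the interval checks further down work as written:
n_sim <- 100

s_sm <- tibble(seed = 1:n_sim) %>%
  mutate(d    = map(seed, sim_d, n = n, effect_size = .2),
         fit  = map2(d, seed, ~ update(fit_sm, newdata = .x, seed = .y)),
         # keep only the condition coefficient from each model summary
         tidy = map(fit, ~ tidy(.x, prob = .89) %>% filter(term == "b_condition")))

s_med <- tibble(seed = 1:n_sim) %>%
  mutate(d    = map(seed, sim_d, n = n, effect_size = .5),
         fit  = map2(d, seed, ~ update(fit_med, newdata = .x, seed = .y)),
         tidy = map(fit, ~ tidy(.x, prob = .89) %>% filter(term == "b_condition")))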
s_sm %>%
  unnest(tidy) %>%
  ggplot(aes(x = reorder(seed, lower), y = estimate, ymin = lower, ymax = upper)) +
  geom_pointrange(fatten = 1/2, alpha = .7) +
  # reference lines at zero and at the true effect size
  geom_hline(yintercept = c(0, .2), color = "white") +
  labs(x = "seed (i.e., simulation index)",
       y = expression(beta[condition]))

s_med %>%
  unnest(tidy) %>%
  ggplot(aes(x = reorder(seed, lower), y = estimate, ymin = lower, ymax = upper)) +
  geom_pointrange(fatten = 1/2, alpha = .7) +
  geom_hline(yintercept = c(0, .5), color = "white") +
  labs(x = "seed (i.e., simulation index)",
       y = expression(beta[condition]))
power_sm <- s_sm %>%
  unnest(tidy) %>%
  mutate(check = ifelse(lower > 0, 1, 0)) %>%
  summarise(power = mean(check))
powerpct_sm <- unlist(power_sm * 100)

power_med <- s_med %>%
  unnest(tidy) %>%
  mutate(check = ifelse(lower > 0, 1, 0)) %>%
  summarise(power = mean(check))
powerpct_med <- unlist(power_med * 100)
In 31% of our 100 simulations of a small effect size (d = .2), a group size of 50 per condition was sufficient to produce an 89% Bayesian credible interval that excluded 0.
In 76% of our 100 simulations of a medium effect size (d = .5), a group size of 50 per condition was sufficient to produce an 89% Bayesian credible interval that excluded 0.
(With n = 20 per condition, only 12/100 small-effect experiments and 40/100 medium-effect experiments were successful, so it looks like we definitely want at least n = 40 per condition.)
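To compare other group sizes, the whole pipeline can be wrapped in a function of n. The helper below is a hypothetical sketch (its name and structure are our own, not from the analysis above); note that each call refits n_sim models, so it is slow:
# hypothetical helper: estimate power for a given per-condition n and
# effect size by refitting an already-compiled brms model n_sim times
power_at <- function(n, effect_size, compiled_fit, n_sim = 100) {
  tibble(seed = 1:n_sim) %>%
    mutate(d    = map(seed, sim_d, n = n, effect_size = effect_size),
           fit  = map2(d, seed, ~ update(compiled_fit, newdata = .x, seed = .y)),
           tidy = map(fit, ~ tidy(.x, prob = .89) %>% filter(term == "b_condition"))) %>%
    unnest(tidy) %>%
    summarise(power = mean(lower > 0)) %>%
    pull(power)
}
# e.g. power_at(20, .5, fit_med) # should land near the 40/100 reported above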