For 90% power on detecting both the condition difference (skewed vs not skewed) and condition x age (4-8yo) interaction defined below, we would need a total sample size of approximately 600 children (300 per condition).
# set study parameters
age_min <- 4.00
age_max <- 8.99
slope_age_same <- 0.10
slope_age_pop <- 0.04
# function to return expected probabilities of responses
get_p_responses <- function(age, condition){
if(condition == "not skewed"){
# even starting points at 4yo
p_same <- 4/10 + (age - 4)*slope_age_same
# 26% (4/15) same at 4yo
# p_same <- 4/15 + (age - 4)*slope_age_same
# 40% same at 4yo
# p_same <- 1/3 + (age - 4)*slope_age_same
remaining <- 1 - p_same
p_pop <- remaining/2
p_sample <- remaining/2
}
if(condition == "skewed") {
# even starting points at 4yo
# p_pop <- 1/3 + (age - 4)*slope_age_pop
# remaining <- 1 - p_pop
# p_sample <- remaining/2 - (age - 4)*0.8*slope_age_pop
# p_same <- remaining/2 + (age - 4)*0.8*slope_age_pop
# 26% (4/15) same at 4yo
# p_pop <- 11/30 + (age - 4)*slope_age_pop
# remaining <- 1 - p_pop
# p_same <- remaining*(8/19) + (age - 4)*0.8*slope_age_pop
# p_sample <- remaining*(1-8/19) - (age - 4)*0.8*slope_age_pop
# 40% same at 4yo
p_pop <- 0.3 + (age - 4)*slope_age_pop
remaining <- 1 - p_pop
p_same <- remaining*(4/7) + (age - 4)*0.8*slope_age_pop
p_sample <- remaining*(3/7) - (age - 4)*0.8*slope_age_pop
}
return(c(p_pop, p_sample, p_same))
}
Hypothesized developmental starting point at 4yo is based on chance performance in both conditions.
Hypothesized developmental end point at 9yo is based on adult responses in Study 1b to an explicit comparison question with different wording in not skewed vs skewed conditions.
We can find what sample sizes yield what statistical power on the target effect(s) by simulating data under the target effect defined above.
We test a variety of possible total sample sizes. For each sample size, we run 500 simulations of that sample size.
For each simulation, we simulate data of the specified sample size by sampling randomly across the target age range (uniform), evenly assigning to condition (half the sample size in each condition), and sampling using the probabilities defined above for the target effect. Then we can fit the target statistical model (multinomial logistic regression predicting response from age, condition, and their interaction), and extract p-values for the target effects (condition main effect, age x condition interaction).
Collapsing across simulations of the same sample size, we can then calculate what proportion of simulations yielded a significant p-value for a particular effect, which is our power on that effect for that sample size.
# set simulation parameters
set.seed(42) # reproducibility
n_sim <- 1000 # per sample size
sample_sizes_to_test <- c(190, 200, 300, 400, 500, 600) # total sample sizes
alpha <- 0.05 # p-value criterion
# function to run a simulation
run_simulation <- function(sample_size_total){
# simulate data
sim_data <- tibble(
# randomly generate age within bounds
age = runif(sample_size_total, age_min, age_max) %>% round(2),
# randomly assign condition
condition = (c(rep("skewed", sample_size_total/2),
rep("not skewed", sample_size_total/2)))
) %>%
# simulate response for each row
mutate(
probabilities = map2(age, condition, get_p_responses),
response = map_chr(probabilities,
\(x) sample(
c("pop", "sample", "same"),
size = 1,
prob = x
)
))
# fit model
dv_comp_age_condition <-
multinom(response ~ age * condition, data = sim_data)
# get p-values
anova_results <- dv_comp_age_condition %>% Anova()
p_val_condition <- anova_results["condition", "Pr(>Chisq)"]
p_val_age_condition <- anova_results["age:condition", "Pr(>Chisq)"]
return(c(p_val_condition, p_val_age_condition))
}
# set up simulation grid with parameters
sim_grid <-
crossing(
alpha = alpha,
sample_size_total = sample_sizes_to_test,
simulation = rep(1:n_sim, length(sample_sizes_to_test)) # __ simulations per each possible sample size
)
# run simulations for each row
sim_results <- sim_grid %>%
rowwise() %>%
mutate(
p_values = list(run_simulation(sample_size_total)),
p_val_condition = p_values[1],
p_val_age_condition = p_values[2]
) %>%
ungroup() %>%
select(-p_values)
# calculate power
power <- sim_results %>%
group_by(sample_size_total) %>%
summarize(
power_condition = mean(p_val_condition < alpha),
power_age_condition = mean(p_val_age_condition < alpha)
)
# save results
# write_csv(power, "study-1a-children-power-analysis.csv")
| Power on target effects by total sample size | ||
| multinomial logistic regression predicting response from age, condition, and their interaction | ||
| total sample size | power: condition | power: age x condition interaction |
|---|---|---|
| 190 | 0.921 | 0.475 |
| 200 | 0.942 | 0.469 |
| 300 | 0.985 | 0.686 |
| 400 | 0.998 | 0.794 |
| 500 | 1.000 | 0.883 |
| 600 | 1.000 | 0.948 |
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.7.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gt_0.11.1 nnet_7.3-19 lmerTest_3.1-3 lme4_1.1-35.5
## [5] Matrix_1.7-1 car_3.1-3 carData_3.0-5 lubridate_1.9.3
## [9] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
## [13] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1
## [17] tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.9 generics_0.1.3 xml2_1.3.6
## [4] stringi_1.8.4 lattice_0.22-6 hms_1.1.3
## [7] digest_0.6.37 magrittr_2.0.3 evaluate_1.0.1
## [10] grid_4.4.2 timechange_0.3.0 fastmap_1.2.0
## [13] jsonlite_1.8.9 ggthemes_5.1.0 Formula_1.2-5
## [16] scales_1.3.0 textshaping_0.4.0 numDeriv_2016.8-1.1
## [19] jquerylib_0.1.4 abind_1.4-8 cli_3.6.3
## [22] rlang_1.1.4 munsell_0.5.1 splines_4.4.2
## [25] withr_3.0.2 cachem_1.1.0 yaml_2.3.10
## [28] tools_4.4.2 tzdb_0.4.0 nloptr_2.1.1
## [31] minqa_1.2.8 colorspace_2.1-1 boot_1.3-31
## [34] vctrs_0.6.5 R6_2.5.1 lifecycle_1.0.4
## [37] MASS_7.3-61 ragg_1.3.2 pkgconfig_2.0.3
## [40] pillar_1.10.0 bslib_0.8.0 gtable_0.3.5
## [43] Rcpp_1.0.13 glue_1.8.0 systemfonts_1.1.0
## [46] xfun_0.49 tidyselect_1.2.1 rstudioapi_0.17.1
## [49] knitr_1.49 farver_2.1.2 htmltools_0.5.8.1
## [52] nlme_3.1-166 labeling_0.4.3 rmarkdown_2.29
## [55] compiler_4.4.2