Introduction
This tutorial illustrates how to use the open-access R package
BayesSSD for Bayesian sample size determination (SSD; see
Lösener & Moerbeek, 2025 for a description of an earlier version of
this software). The starting point is the data of an existing empirical
study, which informs the SSD for our (fictional) longitudinal experiment
with three treatment conditions and expected attrition. Our goal is to
replicate a previous longitudinal experiment using Bayesian hypothesis
evaluation and to make an informed choice about the number of
individuals in the sample. Note that the model estimation is
maximum likelihood, while the hypothesis about the frequentist estimates
is evaluated in a Bayesian way. We would like to achieve a Bayesian
power level of at least \(\eta = .8\),
given that our design and expected attrition is identical to the
previous study.
The data come from the study “How Can We Increase Privacy Protection Behavior? A Longitudinal Experiment Testing Three Intervention Strategies” by Boerman, Strycharz and Smit (2023). The data are fully anonymous and openly available on the Open Science Framework (OSF). Note that for illustrative purposes, we only focus on a subset of treatment conditions and outcome variables that were investigated in the original study.
Each participant was measured three times (at baseline, two weeks
after, and two months after) in this longitudinal experiment. The
participants were randomly allocated to four strategy training
conditions: a) no strategy (control), b) addressing the
severity of the threat, c) acknowledging and combating privacy
fatigue, and d) a combination of b) and c). The dependent
variable was perceived severity of the threat of infringement of
privacy. Prior research suggests that in adult samples, these types
of interventions actually lead to a decrease in the perceived severity
(Strycharz et al., 2019 & 2021); an effect we aim to replicate.
Thus, our research hypothesis states that all three interventions (b, c
and d) lead to a steeper decline in perceived threat compared to the
control condition a). Furthermore, we hypothesize that the decline is
steepest in condition d), followed by condition c), in turn followed by
condition b). Hence, the research hypothesis can be parametrized as:
\(\mathscr{H}_1:\beta_a > \beta_b >
\beta_c > \beta_d\), where \(\beta\) refers to the coefficient of
interaction between time and the respective
condition in the multilevel regression model. The research
hypothesis will be evaluated against its complement \(\mathscr{H}_c\), which covers all parameter
orderings left uncovered by \(\mathscr{H}_1\). Note that the research
hypothesis is constructed purely for illustrative purposes and may not
reflect the author’s content-related expectations nor be of any
conceptual value.
The steps of this tutorial are as follows: First, we load and clean
the data. Next, we inspect some plots in order to determine that fitting
a multilevel model to this data is indeed sensible. After that, we fit
the multilevel model. Next, we extract the estimates of effects sizes
along with the (co-)variance components necessary to inform our sample
size determination (SSD) procedure. Then, we specify the pattern of
attrition we expect in our replication study based on the reported
attrition rates in Boerman et al. (2023). Finally, we execute the SSD
using the BayesSSD package.
Data preparation
Load data
First, we load the data. These are openly available on OSF and are fully anonymous. Note that to be able to execute the code below, the data must be stored in the same location as the working directory of the R session.
Subset data, convert from wide to long
Next, we have to convert the data frame into the long format in order
to be able to fit a multilevel regression model using the
lme4 package. We also convert some variable types to make
them more suitable for model fitting. To do this, we load the
dplyr and the tidyr package for more
convenient data restructuring.
We first create a data frame with only those variables relevant to
the analysis. Also, we will only be considering participants in
conditions a), b), c) and d) (coded as 0-3). We also need to rename the
values in the condition variable.
In the variable condition, we see that always the last
value for each participant in missing, so we need to fill those in. As
participants did not switch conditions, this will simply be equal to the
previous value for condition within that person.
# install packages if needed
if (!require("dplyr")) {install.packages("dplyr")}
if (!require("tidyr")) {install.packages("tidyr")}
# call their libraries
library(dplyr)
library(tidyr)
# select only relevant variables
relevant_vars <- c("severity_1_W1", "severity_2_W1", "severity_3_W1",
"severity_1_W2", "severity_2_W2", "severity_3_W2",
"severity_1_W3", "severity_2_W3", "severity_3_W3",
"conditie_W1", "conditie_W2")
dat_raw_subset <- dat_raw[,relevant_vars] # subset dataset
# convert from wide to long format
dat_raw_long <- dat_raw_subset %>%
mutate(id = row_number()) %>%
pivot_longer(
cols = -id,
names_to = c(".value", "wave"),
names_pattern = "(.*)_W(\\d+)"
)
# take mean to construct total severity variable
dat_raw_long <- dat_raw_long %>%
mutate(severity = rowMeans(select(., severity_1, severity_2, severity_3), na.rm = TRUE))
# rename variables for clarity
names(dat_raw_long)[names(dat_raw_long) == "conditie"] <- "condition"
# fill in missing condition values for last wave and only select individuals in conditions 0-3
dat_long <- dat_raw_long %>%
group_by(id) %>%
fill(condition, .direction = "updown") %>%
ungroup() %>%
filter(condition == 0 |
condition == 1 |
condition == 3 |
condition == 5)
# rename conditions to a), b), c), d)
dat_long$condition[dat_long$condition==0] <- "a"
dat_long$condition[dat_long$condition==1] <- "b"
dat_long$condition[dat_long$condition==3] <- "c"
dat_long$condition[dat_long$condition==5] <- "d"Add time variable
As the last step in the data cleaning process, we need to recode the
wave variable into a new time variable with
the actual time points (0, 2, 8). We inspect the first couple of rows of
the new data frame to make sure it is formatted correctly.
# if wave is one, recode to 0, if wave is 2, it stays the same, if wave is 3, recode to 8
dat <- dat_long %>%
mutate(time = ifelse(wave == 1, 0, ifelse(wave == 3, 8, wave)))
dat$time <- as.numeric(dat$time) # make time numeric
head(dat) # inspect first couple of rows## # A tibble: 6 × 8
## id wave severity_1 severity_2 severity_3 condition severity time
## <int> <chr> <int> <dbl> <int> <chr> <dbl> <dbl>
## 1 1 1 7 7 7 d 7 0
## 2 1 2 7 7 7 d 7 2
## 3 1 3 7 7 7 d 7 8
## 4 6 1 7 7 7 d 7 0
## 5 6 2 6 6.33 7 d 6.44 2
## 6 6 3 6 6.17 7 d 6.39 8
Now, our dataset is ready for model fitting. Before we fit the model, however, we first make some plots to show that the data indeed has a multilevel structure.
Plotting multilevel structure
We inspect the data via lineplots using the ggplot2
package. First, we illustrate the multilevel structure of the data on a
random subset of 25 participants, as showing all 1000 participant’s
lines would not be practical We thus plot the severity of
each of these 25 individuals over time separately in a facet plot. The
black solid line shows the raw values per point in time while the dotted
red line represents the linear trend of that person. Note that some
individuals have missing values due to dropping out.
# install if needed
if (!require("ggplot2")) {install.packages("ggplot2")}
# randomly select 25 individuals to showcase
set.seed(123) # for reproducibility
sample_ids <- sample(unique(dat$id), 25)
# filter the data to only include those IDs
dat_sample <- dat %>%
filter(id %in% sample_ids)
# create the faceted plot
ggplot2::ggplot(dat_sample, aes(x = time, y = severity)) +
geom_point() + # add points for the actual observations
geom_line() + # connect the points with lines
geom_smooth(method = "lm", se = FALSE, color = "red", linetype = "dashed") + # add individual LM fit
facet_wrap(~ id, ncol = 5) + # create a 4x4 grid of panels
labs(title = "Linear trends over time of a random sample of 25 individuals") +
theme_minimal()We can see that there is considerable variability in both the initial scores and the linear trend over time between individuals. Some individuals start with a higher score, others with a lower score. Some individuals show a positive trend, others a negative one, and yet others do not change over time at all with respect to the outcome. Hence, a multilevel regression model with random intercepts and slopes seems like an appropriate choice.
Multilevel regression model
In this step, we fit the multilevel regression model to the data
using the lme4 package. We do this in order to obtain
estimates necessary to run the SSD algorithm. Alternatively, if no data
are available, these estimates can be obtained by consulting experts in
the respective field of research.
Fit the model
First, we fit the multilevel regression model to the data. The
predictors are time, condition and their
interaction time:condition. Intercepts and slopes for time
are allowed to vary per person as indicated by the random part of the
regression equation(1 + time | id).
# install if needed
if (!require("lme4")) {install.packages("lme4")}
# call the library
library(lme4)
# model for severity with condition and time as well as their interaction as predictors
mlm <- lmer(severity ~ -1 + condition + time:condition + (1 + time | id), dat, REML = F)
summ_mlm <- summary(mlm)
# inspect model output
summ_mlm## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: severity ~ -1 + condition + time:condition + (1 + time | id)
## Data: dat
##
## AIC BIC logLik -2*log(L) df.resid
## 3200.4 3260.6 -1588.2 3176.4 1102
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.2596 -0.4207 0.0877 0.4384 4.0952
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 1.269526 1.12673
## time 0.002934 0.05417 -0.54
## Residual 0.447243 0.66876
## Number of obs: 1114, groups: id, 432
##
## Fixed effects:
## Estimate Std. Error t value
## conditiona 5.570233 0.120518 46.219
## conditionb 5.651525 0.119346 47.354
## conditionc 5.378745 0.121639 44.219
## conditiond 5.391863 0.120557 44.725
## conditiona:time 0.015541 0.015198 1.023
## conditionb:time 0.006049 0.014274 0.424
## conditionc:time -0.006859 0.015226 -0.450
## conditiond:time -0.021731 0.015617 -1.391
##
## Correlation of Fixed Effects:
## condtn cndtnb cndtnc cndtnd cndtn:t cndtnb: cndtnc:
## conditionb 0.000
## conditionc 0.000 0.000
## conditiond 0.000 0.000 0.000
## conditin:tm -0.424 0.000 0.000 0.000
## conditnb:tm 0.000 -0.436 0.000 0.000 0.000
## conditnc:tm 0.000 0.000 -0.426 0.000 0.000 0.000
## conditnd:tm 0.000 0.000 0.000 -0.419 0.000 0.000 0.000
From the model output, we gather that in the first condition a), the
perceived severity increases slightly over time, as indicated by the
estimate for conditiona:time which is equal to 0.016. We
also see that in the conditions b), c) and d), the change over time is
more negative compared to condition a), as indicated by the coefficients
for conditionb:time, conditionc:time and
conditiond:time, which are equal to 0.006, -0.007 and
-0.022, respectively.
Extract estimates
Next, we extract all the (co-)variance components needed for the SSD. Specifically, the intercept variance, the slope variance, their covariance, and the residual variance.
Calculate effect sizes
The effect size found in the previous study serves as an estimate for the effect size we expect to find. It is defined as the ratio between the regression coefficient and the square root of the slope variance \(\delta = \frac{\beta}{\sqrt{\sigma^2_{u1}}}\) (Raudenbush & Liu, 2001). This means that we can derive the effect sizes from the regression coefficients and the slope variance as follows:
# extract the coefficients of interaction between time and condition
beta_cond_a <- mlm@beta[5]
beta_cond_b <- mlm@beta[6]
beta_cond_c <- mlm@beta[7]
beta_cond_d <- mlm@beta[8]
# calculate effect sizes for each treatment condition according to Raudenbush & Liu (2001)
(delta_cond_a <- beta_cond_a / sqrt(var_slope))## [1] 0.2869102
## [1] 0.111666
## [1] -0.1266238
## [1] -0.4011699
We can see that the effect sizes for treatments a), b), c), and d) are equal to \(\delta_a=0.29\), \(\delta_b=0.11\), \(\delta_c=-0.13\), and \(\delta_d=-.40\), respectively. Positive (negative) effect sizes indicate an increase (decline) of perceived severity over time. For an illustrative overview of the influence of the multilevel model parameters on the growth curves, please visit the ShinyApp developed for this purpose.
Time points and attrition
The \(n=3\) timepoints are at baseline (\(t_1=0\)), two weeks later (\(t_2=2\)) and two months later (\(t_3=8\)), with \(t\) denoting the number of weeks since the start of the study. At each of the three measurement occasions, there are \(N_1=1000\), \(N_2=799\) and \(N_3=465\) individuals remaining in the sample (Boerman et al., 2024). This corresponds to a remaining proportion of .799 at \(t_2\) and of .465 at \(t_2\).
The plot below shows the proportion of dropout at each measurement occasion.
# install scales package if needed
if (!require("scales")) {install.packages("scales")}
# create data of timepoints and remaining individuals
timepoints <- c(0, 2, 8)
proportions <- c(1, 0.799, 0.465)
# create a data frame in long format for plotting
df <- data.frame(
x = rep(factor(timepoints), each = 2),
group = rep(c("remaining", "dropped out"), times = length(timepoints)),
value = c(rbind(proportions, 1 - proportions)))
# plot
ggplot(df, aes(x = x, y = value, fill = group)) +
geom_col(position = "stack") +
scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
labs(
x = "Time elapsed in weeks",
y = "Proportion of individuals remaining",
fill = "Status"
) +
theme_minimal()For an extensive overview of the available survival functions that
can be used for modelling attrition in the BayesSSD
package, please visit the ShinyApp
developed for this purpose.
Using the BayesSSD package
Package installation
Now that we obtained all the estimates necessary to perform the
Bayesian Sample Size Determination, we need to install the package
BayesSSD from GitHub using the
pak package.
Arguments to the SSD_longit function
The following table provides an overview of all the arguments to the
function SSD_longit and their default values.
| Argument | Type | Description | Default |
|---|---|---|---|
| eta | numeric | the desired power level | 0.8 |
| attrition | string | if FALSE, no attrition is simulated. Otherwise, input determines the survival function(s) used. | ‘weibull’ |
| params | list/vector | parameter values passed to the survival function. Single vector if the same attrition pattern applies to all treatment conditions. Otherwise list of vectors for each pattern. First entry is omega, second gamma and third (if applicable) kappa. | c(0.5, 1) |
| BFthres | numeric | The threshold a BF must exceed in order to be considered convincing evidence | 3 |
| PMPthres | numeric | The threshold a PMP must exceed in order to be considered convincing evidence | 0.9 |
| eff.sizes | numeric | delta, the expected standardized effect size(s) | c(0, 0.5, 0.8) |
| m | numeric | the number of datasets to be simulated in each iteration | 1000 |
| t.points | vector | the time points of the measurement occasions, equal and unequal spacing possible | c(0,1,2,3,4) |
| log | logical | use log-linear growth? | FALSE |
| var.u0 | numeric | the intercept variance | 0.03 |
| var.u1 | numeric | the slope variance | 0.1 |
| var.e | numeric | the residual variance | 0.02 |
| cov | numeric | the covariance between intercept and slope variance | 0 |
| sensitivity | logical | execute sensitivity analysis for different values (1, 2, 3) for fraction? | FALSE |
| hypothesis | string | Either a single hypothesis or a list of hypotheses with the research hypothesis of interest in first place. Treatment conditions are referred to as ‘a, b, c, …,’ | ‘a<b<c’ |
| method | string | The method of hypothesis evaluating: ‘bfc’ refers to the BF versus the complement hypothesis, ‘bf’ to the BF of the first versus the second hypothesis in ‘hypothesis’ and ‘pmp’ to the posterior model probability in the set of all hypotheses including the complement hypothesis. | ‘bfc’ |
| N.min | numeric | The minimal sample size to be considered | 30 |
| N.max | numeric | The maximal sample size to be considered | 1000 |
| group.sizes | vector | NULL for a balanced design. When group sizes differ, a vector with the proportions of the total sample size in each treatment condition with the first element corresponding to condition ‘a’, the second to ‘b’, etc. | NULL |
| tol | numeric | tolerance of deviation from desired power. Higher values may speed up performance. | 0.001 |
| seed | numeric | set a seed for reproducibility | NULL |
Performing Bayesian SSD
Now, we can run the simulation-based Bayesian SSD with the
BayesSSD command and pass all the ingredients to the
appropriate arguments. Remember that we strive for a power level of
\(\eta=.8\) and our research hypothesis
is \(\mathscr{H}_1:\beta_a>\beta_b>\beta_c>\beta_d\).
We are planning on evaluating this hypothesis against the complement
hypothesis \(\mathscr{H}_c\) which
covers all parameter orderings not included in our research hypothesis.
Because we are planning on running a exploratory rather than a
confirmatory study, we set the threshold for a convincing Bayes Factor
\(BF_{1c}\) to 3 in order not to miss
any small but relevant effects.
We search for the appropriate sample size in a range from \(N_{min}=100\) and \(N_{max}=1500\) because we know that the
original study used \(N=1000\)
participants. For reliable results, we set m=10000, meaning
that 10,000 datasets are simulated and evaluated in each iteration.
Increasing m increases the precision as well as the
computing time of the algorithm. We also set a seed for reproducibility,
as some parts of the simulation rely on stochastic processes. For more
information on the arguments of the SSD_longit function,
run ?SSD_longit or see the table above.
ssd_result <- SSD_longit(
eta = 0.8,
hypothesis = "a > b > c > d",
attrition = "nonparametric",
params = c(1, 0.799, 0.465),
m = 10000,
t.points = c(0,2,8),
var.u0 = var_intercept,
var.u1 = var_slope,
var.e = var_resid,
cov = cov_int_slope,
eff.sizes = c(delta_cond_a, delta_cond_b, delta_cond_c, delta_cond_d),
BFthres = 3,
method = "bfc",
sensitivity = F,
seed = 123,
group.sizes = NULL,
N.min = 100,
N.max = 1500
)## | | | 0% | |========= | 12% | |================== | 25% | |========================== | 38% | |=================================== | 50% | |======================================================================| 100%
##
## Final sample size
## =================
## Hypotheses:
## H1: a > b > c > d
## Hc: not H1
## Using a total of N = 1062 Participants
## Power = P(BF_1c > 3 | H1) = 0.805
The output informs us that the minimal sample size needed to achieve a power of \(\eta=.80\) when testing \(\mathscr{H}_1\) against its complement \(\mathscr{H}_c\) is \(\mathbf{N_{total}=1062}\), while taking the anticipated attrition pattern into account. So, for our fictitious replication study, we would need to sample 62 more participants than in the original study.
For more information on the SSD procedure, we can inspect the object that the function was assigned to.
## $evaluations
## N power prop_simplified
## 1 800 0.73 0
## 2 1150 0.82 0
## 3 975 0.78 0
## 4 1062 0.80 0
##
## $final
## $final$N
## [1] 1062
##
## $final$power
## [1] 0.805
##
## $final$threshold_BF
## [1] 3
##
## $final$threshold_PMP
## [1] 0.9
##
## $final$sensitivity
## [1] FALSE
##
##
## $hypotheses
## $hypotheses$hypothesis
## [1] "a > b > c > d"
##
## $hypotheses$comparison
## [1] "bfc"
##
##
## $iterations
## [1] 5
##
## $runtime
## [1] 34
This tells us the sample sizes that were evaluated during the binary search (800, 1150, 975, 1062) along with their resulting power, the proportion of models that had to be simplified due to high attrition (0%), the number of iterations (5) and the runtime of the SSD procedure in minutes (35). While this information is not essential for the result of the SSD, it can be useful for gaining more insight into the SSD algorithm.
References
Boerman, S. C., Strycharz, J., & Smit, E. G. (2024). How can we increase privacy protection behavior? A longitudinal experiment testing three intervention strategies. Communication research, 51(2), 115-145.
Lösener, U., & Moerbeek, M. (2025). Bayesian sample size determination for longitudinal intervention studies with linear and log-linear growth. Behavior Research Methods, 57(9), 239.
Raudenbush, S. W., & Liu, X. F. (2001). Effects of study duration, frequency of observation, and sample size on power in studies of group differences in polynomial change. Psychological methods, 6(4), 387.
Strycharz J., Smit E. G., Helberger N., van Noort G. (2021). No to cookies: Empowering impact of technical and legal knowledge on rejecting tracking cookies. Computers in Human Behavior, 120, 106750. https://doi.org/10.1016/j.chb.2021.106750
Strycharz J., Van Noort G., Smit E. G., Helberger N. (2019). Protective behavior against personalized ads: Motivation to turn personalization off. Cyberpsychology: Journal of Psychosocial Research on Cyberspace, 13(2), 1. https://doi.org/10.5817/CP2019-2-1
If you have questions or comments about this Markdown document or
about the BayesSSD Package, please contact me under u.c.losener@uu.nl.