To obtain a sensible estimate of the sample size required to achieve a given power, we will need to
The input needed can be summarised as
Also, we may want to consider missing data expectations - magnitude and mechanism.
The research questions concern
The previous year's intake at the two sites available for sampling was 300 (East) and 700 (West); this should constitute a reasonable ceiling for the maximum possible sample size.
Several continuous outcomes, possibly to analyse in a multivariate fashion (if feasible at all). Longitudinal study over three years, 12 time points in two patterns from 0 to 24 months for each cohort:
Two groups, experimental and control, and a post-intervention one year follow-up.
We have several predictors:
Also, there is a background variable consisting of 5 constructs (maybe too much as a predictor, maybe too important? Think about this).
No closed form solution to the problem, need to resort to simulations to construct power curves. Computationally intensive and requires a lot of prior knowledge/educated guesses on the distributions of both outcomes and predictors.
There are three possibilities to set up the simulation study (ordered from the most to the least desirable situation):
Currently, we are sort of in point 4. I have found literature on CRP, tumor necrosis factor alpha, interleukin-1b and -6, and obtained a wealth of numbers I could use, but most of the levels refer to populations that are very different from the target one: older individuals, usually in their 50s and with some kind of medical condition that warranted the analyses in the first place. The populations studied are also Asian, European, and even Aboriginal, but none are Black or mixed-race. Moreover, I found some literature suggesting that CRP production may be stimulated by the other three factors, leading to a somewhat more complex structure than anticipated.
As things are, and given that I don’t think we will be able to run a pilot before budgeting, I see three ways out of this impasse (again, in order of preference):
The third option, ‘go full conservative’, seems to be the only viable one. However, given budget constraints, it is desirable to find some indication that the expected maximum possible sample size (1000) could be reduced without losing too much power.
According to the study design, at the beginning of each of three years (2019, 2020, 2021), a sample is to be collected from the intake of the W4C centers (T, treatment), together with a sample from the population (C, control) that matches T on known covariates such as age, gender, race, setting (rural/urban), area (western/eastern) and background of adversity. The experimental surf therapy (ST) lasts one year, after which the T groups from 2019 and 2020 will be followed up for an additional year (red lines).
The case of support document highlights the lack of literature on the outcomes of interest for the population at hand (young, Black/mixed/white South African with a history of adversity). Moreover, I could find no explicit mention of how the scientific questions would translate into hypotheses on said outcomes. This makes a proper and fully reliable power calculation impossible to carry out at this time.
Given the structure of the design, however, it is reasonable to assume that once the 2019 T and C groups have been sampled, the baseline information will allow us to provide a more accurate estimate of the sample size required to achieve a prescribed power in testing the existence of a treatment effect. Moreover, the accuracy of this estimate may be greatly improved once the observations for at least a few time points have been collected, which would help refine our understanding of the relationships between the quantities involved. In practice, this translates to ‘we will likely be able to reduce the sample size needed to achieve the desired power with the second and third group, therefore it may be reasonable to deflate an initial, necessarily rough, estimate of the sample size’. The extent to which this may be reasonable, however, is entirely up to conjecture.
Until that point, the best I can do in the little time available is focus on a single outcome and use all the available information to obtain a sample size. I’ve decided to pick C-reactive protein (CRP), mostly because I’ve managed to find some literature describing its distribution in populations similar to the one under study and because it seemed reasonable to assume that surf therapy, being a combination of cognitive behavioural therapy and physical activity, could contribute to reducing the levels of CRP over time, which provides me with a working hypothesis to test.
The available literature suggests [12881452, 26033244, 15205215] that the distribution of concentration of CRP in saliva in the general population:
Values of CRP concentration (in mg/L) higher than 3 are typically considered high, and can indicate existence of systemic inflammation. I decided to describe the distribution of CRP concentration using a Log-Normal distribution, and made the following assumptions:
I use a linear mixed regression approach to model the natural logarithm of CRP concentration. The model includes an intercept, an individual-specific intercept (random effect), and an interaction term of time (in months) and a treatment variable (1 if in T, 0 otherwise).
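In symbols, the model just described can be sketched as follows. The notation is introduced here purely for illustration and is an assumption, not taken from the analysis itself: \(i\) indexes individuals, \(j\) time points, \(t_{ij}\) is time in months, and \(T_i\) is the treatment indicator.

\[
\log(\mathrm{CRP}_{ij}) = \beta_0 + b_i + \beta_1 \, t_{ij} \, T_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)
\]

Under this reading, \(\beta_0\) is the common intercept, \(b_i\) the individual-specific random intercept, and \(\beta_1\) the coefficient of the time-by-treatment interaction, so that \(\beta_1 = 0\) corresponds to no treatment effect.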
The hypothesis to be tested is that there exists an interaction between time and treatment. Specifically, we postulate that the differences between T and C can be described by the effect size of the treatment effect per time unit on the log-scale (i.e., the mean on the log-scale in the C group is constant, whereas it decreases linearly with time in the T group). For interpretability, the considered effect sizes are eventually presented in terms of percentage changes in the (geometric) mean of the original CRP concentration, per month, for those in the treatment group.
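To make the link between the two scales concrete, here is a minimal sketch of the conversion. It assumes that a per-month percentage change \(p\) in the geometric mean corresponds to a log-scale slope of \(\log(1 + p/100)\); this mapping, and the function names, are illustrative assumptions rather than the definitions used in the simulations.

```python
import math


def pct_to_log_slope(pct_per_month):
    """Log-scale slope per month for a given % change in the geometric mean.

    Assumption: a change of p% per month multiplies the geometric mean
    by (1 + p/100) each month, i.e. adds log(1 + p/100) on the log scale.
    """
    return math.log1p(pct_per_month / 100.0)


def cumulative_pct_change(pct_per_month, months):
    """Total % change in the geometric mean after `months` months."""
    return 100.0 * math.expm1(months * pct_to_log_slope(pct_per_month))


if __name__ == "__main__":
    # A -0.5% monthly change compounds to slightly less than -6% over a year.
    print(round(pct_to_log_slope(-0.5), 5))
    print(round(cumulative_pct_change(-0.5, 12), 2))
```

For small effects the compounded change is close to the simple product of the monthly change and the number of months (about \(-5.8\%\) versus \(-6\%\) for \(-0.5\%\) per month over 12 months), so the two readings are practically interchangeable at this scale.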
The plot below presents the power curves, for varying effect sizes, as a function of the sample size in each group (so the total sample size is twice that), at significance level \(\alpha=0.01\). At each point, \(99\%\) asymptotic confidence intervals are drawn around the Monte Carlo estimate (1000 runs). Interpretation: for a %Effect size of, e.g., \(-0.5\%\), the curve describes the power to detect a maximum difference of approximately \(12 \times (-0.5)\% = -6\%\) at the end of the 12 months. The red lines indicate the usual thresholds of \(80\%\) and \(90\%\) power, for better readability.
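As a rough illustration of how one point on such a curve and its interval can be obtained, the sketch below estimates power by Monte Carlo and attaches a Wald-type asymptotic confidence interval. It is not the analysis used for the plot: instead of fitting the linear mixed model, it uses a simpler two-stage stand-in (a least-squares slope per individual, which removes the random intercept, followed by a two-sample z-test on the slopes). The monthly measurement schedule, the standard deviations `sigma_b` and `sigma_e`, the intercept `mu`, and all function names are hypothetical placeholders.

```python
import math
import random
from statistics import NormalDist, mean, variance


def subject_slopes(n, beta, times, mu, sigma_b, sigma_e, rng):
    """Simulate log-CRP for n subjects and return each subject's OLS slope."""
    t_bar = mean(times)
    sxx = sum((t - t_bar) ** 2 for t in times)
    slopes = []
    for _ in range(n):
        b_i = rng.gauss(0.0, sigma_b)  # individual random intercept
        y = [mu + b_i + beta * t + rng.gauss(0.0, sigma_e) for t in times]
        slopes.append(sum((t - t_bar) * v for t, v in zip(times, y)) / sxx)
    return slopes


def trial_rejects(n, beta, times, mu, sigma_b, sigma_e, alpha, rng):
    """One simulated study: two-sided z-test on the T minus C mean slope."""
    s_t = subject_slopes(n, beta, times, mu, sigma_b, sigma_e, rng)
    s_c = subject_slopes(n, 0.0, times, mu, sigma_b, sigma_e, rng)
    se = math.sqrt(variance(s_t) / n + variance(s_c) / n)
    z = (mean(s_t) - mean(s_c)) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def estimate_power(n, pct_per_month, runs=1000, alpha=0.01, ci_level=0.99,
                   times=tuple(range(13)), mu=0.0, sigma_b=1.0, sigma_e=0.5,
                   seed=1):
    """Monte Carlo power estimate with a Wald confidence interval.

    All distributional values are hypothetical placeholders.
    """
    rng = random.Random(seed)
    beta = math.log1p(pct_per_month / 100.0)  # assumed % to log-slope mapping
    hits = sum(
        trial_rejects(n, beta, times, mu, sigma_b, sigma_e, alpha, rng)
        for _ in range(runs)
    )
    p_hat = hits / runs
    z = NormalDist().inv_cdf(0.5 + ci_level / 2.0)
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / runs)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)


if __name__ == "__main__":
    print(estimate_power(n=100, pct_per_month=-0.5, runs=200))
```

With 1000 runs, the half-width of a \(99\%\) Wald interval around an estimated power of \(0.8\) is about \(0.033\), which gives a sense of the Monte Carlo uncertainty at each point of the curves. Repeating the call over a grid of per-group sample sizes and effect sizes yields the kind of curves shown in the plot, although the numbers depend entirely on the placeholder values.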
The use of the simulation results presented in the plot obviously needs to consider the effect size we desire to detect, whose magnitude is currently unknown to me for CRP.
Very briefly:
Conservative solution: based on these results, we would need approximately 250 individuals per group (total 500) in the first wave to obtain a reasonable power (around \(80\%\)) in detecting an approximately \(5\%\) decrease in CRP g-mean levels at the end of the 12 months (this magnitude is absolutely arbitrary). If we decide to revise our power calculation at baseline, however, this could be a conservative approach, so that we could decide to sample somewhat fewer than those 500 (again, arbitrary effect size). How many fewer needs to be discussed based on the desired effect size.