This tutorial is available on RPubs
Estimating the sample size for a prospective study is one of the first things that researchers do prior to enrolling patients. Doing this exercise helps the researcher to determine if they could feasibly enroll the necessary number of patients to answer the research question they developed. If a study’s estimated sample size was over a million patients, it would be a very daunting project to undertake. Knowing this information at the beginning of a study design helps to calibrate and adjust expectations to more reasonable levels.
In this tutorial, you will learn how to use R to estimate sample size and perform power analysis. Alternatively, you can download and use G*Power to perform these exercises.
If you are performing a prospective study where you need to enroll patients, estimating the optimal number of patients is necessary for your Statistical Analysis Plan.
If you are performing a retrospective study, you can determine how much power you have to detect a difference if one existed. This is not commonly done unless you have a non-significant result and want to determine whether you have a high potential for type II error due to a small sample.
Before enrolling patients, you want to make sure that you estimate the optimal number of patients you’ll need for your study. Enrolling more patients than necessary would be inefficient, costly, and potentially unethical.
This is a common sample size calculation when you have two groups and the outcome is dichotomous (e.g., Yes / No).
Let’s assume that you were asked to estimate the number of patients that would be needed for an upcoming randomized controlled trial. The study’s primary aim is to estimate the efficacy of Treatment A versus Treatment B. The main end point is a dichotomous variable: “Response” or “No response.”
The null hypothesis is:
\(H_{0}\): There is no difference in the proportion of responders between Treatment A and Treatment B
The alternative hypothesis is:
\(H_{a}\): There is a difference in the proportion of responders between Treatment A and Treatment B
To determine the sample size needed to detect a meaningful difference in the proportion of responders between Treatment A and Treatment B, we need the following information:
The alpha level is the threshold at which you determine whether a result is statistically significant. We normally use an alpha level of 0.05 (type I error). This is the convention for most biomedical sciences, but it can vary depending on the situation (e.g., multiple comparisons).
Effect size is the standardized difference one would expect to see between treatment groups. This is an estimate and something that is usually based on past studies. Oftentimes, researchers have to make an educated guess about the effect size.
Power (1 - \(\beta\)) is the probability of correctly rejecting the null hypothesis when the actual effect in the population is equal to or greater than the specified effect size. (Recall that type II error is \(\beta\).) In other words, if you conclude that there is no difference in the sample, adequate power gives you confidence that there is no true difference in the population of at least the specified effect size.
We’ll set the two-tailed alpha level to 0.05.
We can estimate the effect size \(h\) using the following equation:
\(h = \varphi_{1} - \varphi_{2}\),
where \(\varphi_{i} = 2 \arcsin(\sqrt{p_{i}})\) and \(p_{i}\) denotes the proportion in treatment \(i\) that were classified as “responders” (see the short R sketch below).
We generally set the power to 80%.
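As a quick illustration, you can compute \(h\) directly in R with the asin() function. The proportions below are the hypothetical example values used later in this tutorial (a 60% response rate for Treatment A and 50% for Treatment B):
### Hand calculation of the effect size h using the arcsine transformation
p1 <- 0.60    ### assumed response rate for Treatment A (introduced later in this tutorial)
p2 <- 0.50    ### assumed response rate for Treatment B
h <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
h
### Approximately 0.201, matching the h value that pwr reports below.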
With these pieces of information, we can estimate the sample size for a study with two proportions as the outcome using the following formula:
With R, you will not need to use this formula. However, if you decide to estimate the sample size by hand, you need to make sure to use the correct values: \(Z_{\alpha/2}\) = 1.96 and \(Z_{(1-\beta)}\) = 0.84.
\(\begin{aligned} n_{i} = \frac{(Z_{\alpha/2} + Z_{1 - \beta})^2 \left[ p_{1}(1 - p_{1}) + p_{2}(1 - p_{2}) \right]}{(p_{1} - p_{2})^2} \end{aligned}\)
where \(n_{i}\) is the sample size for one group.
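If you want to verify the formula by hand, here is a minimal sketch using the same example proportions (60% vs. 50%), \(Z_{\alpha/2}\) = 1.96, and \(Z_{(1-\beta)}\) = 0.84. Note that this normal-approximation formula gives a slightly smaller estimate than the arcsine-based calculation that pwr performs below.
### Hand calculation of n per group using the normal-approximation formula
z_alpha <- 1.96    ### two-tailed alpha of 0.05
z_beta <- 0.84     ### power of 80%
p1 <- 0.60
p2 <- 0.50
n_i <- (z_alpha + z_beta)^2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2)^2
n_i
### Approximately 384, which we would round up to 385 patients per group.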
Fortunately, we don’t have to do all this work. We can estimate the sample size using the R package pwr to do the heavy lifting. Make sure you have the package installed by typing install.packages("pwr"). Once installed, load the package by typing library("pwr").
Use the pwr.2p.test() function to find out how many patients are needed. We’re going to need a couple of pieces of information. We already explained that you need the alpha level (alpha = 0.05), the effect size \(h\), and the power level (power = 80%).
We also need the \(p_{i}\) for Treatment A and Treatment B. This is something that we have to guess. We make this guess by looking at past studies and determining the typical response rate. Sometimes you can’t find this in the literature, so you have to make an educated guess. I usually start with 50% because this is a nice neutral point; it’s like a coin flip (50-50). I’ll assign Treatment B as the reference (e.g., control) with a 50% response rate. Now, I’ll “guess” that Treatment A is slightly better with a response rate of 60%. Therefore, I have:
\(p_{1} = 0.60\)
\(p_{2} = 0.50\)
pwr.2p.test makes estimating \(h\) easy. All you need to do is enter values for \(p_{1}\) and \(p_{2}\) into the ES.h(p1, p2) function.
With these pieces of information, I can estimate the sample size required to detect a difference in “response” rate between Treatment A and Treatment B that is 10% (\(p_{1}\) - \(p_{2}\)) with an alpha of 5% and power of 80%.
### alpha = sig.level option and is equal to 0.05
### power = 0.80
### p1 = 0.60
### p2 = 0.50
power1 <- pwr.2p.test(h = ES.h(p1 = 0.60, p2 = 0.50), sig.level = 0.05, power = 0.80)
power1
##
## Difference of proportion power calculation for binomial distribution (arcsine transformation)
##
## h = 0.2013579
## n = 387.1677
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: same sample sizes
The effect size h is 0.201, and the sample size n is 387.2, which we round up to 388. This n is for one group only; we assume a 1:1 allocation ratio. Hence, based on the parameters of our study, we need approximately 388 patients in Treatment A and 388 patients in Treatment B to detect a difference in response rate of 10% or greater with an alpha of 0.05 and power of 80%.
We can also use the plot feature to see how the power level changes with varying sample sizes. As the sample size goes up, power increases. As the sample size goes down, power decreases. This is important to understand. As we increase our sample size, we reduce the uncertainty around the estimates. By reducing this uncertainty, we gain greater precision in our estimates, which results in greater confidence in our ability to avoid making a type II error.
### We can plot the power relative to different levels of the sample size.
plot(power1)
Let’s change the \(p_{i}\) and see how the power level changes; we are fixing our sample size at 388 for each group with an alpha of 0.05. We create a sequence of values by varying the proportion of “responders” for Treatment A. We will change these from 50% to 100% in intervals of 5%.
p1 <- seq(0.5, 1.0, 0.05)
power1 <- pwr.2p.test(h = ES.h(p1 = p1, p2 = 0.50),
n = 388,
sig.level = 0.05)
powerchange <- data.frame(p1, power = power1$power * 100)
plot(powerchange$p1,
powerchange$power,
type = "b",
xlab = "Proportion of Responders in Treatment A",
ylab = "Power (%)")
We can also write a function for this:
Learning to write functions can make programming much easier. In this example, I wrote a function called iteration() that will allow me to input parameters to generate the power level iterated at different values of p1.
iteration <- function(p_i, P_i, i_i, p2, n, alpha) {
  ### p_i, P_i, i_i: start, end, and increment for the sequence of p1 values
  ### p2: response rate in the reference group; n: sample size per group; alpha: significance level
  p1 <- seq(p_i, P_i, i_i)
  power1 <- pwr.2p.test(h = ES.h(p1 = p1, p2 = p2),
                        n = n,
                        sig.level = alpha)
  powerchange <- data.frame(p1, power = power1$power * 100)
  powerchange
}
iteration(0.5, 1.0, 0.05, 0.50, 388, 0.05)
## p1 power
## 1 0.50 5.00000
## 2 0.55 28.65038
## 3 0.60 80.08415
## 4 0.65 98.88117
## 5 0.70 99.99190
## 6 0.75 100.00000
## 7 0.80 100.00000
## 8 0.85 100.00000
## 9 0.90 100.00000
## 10 0.95 100.00000
## 11 1.00 100.00000
This is another common sample size estimation for two groups when the outcome is a continuous variable. Now let’s estimate the sample size for a study where we are comparing the averages between two groups. Let’s suppose that we are working on a randomized controlled trial that seeks to evaluate the difference in the average change in hemoglobin A1c (HbA1c) from baseline between Treatment A and Treatment B.
The null hypothesis is:
\(H_{0}\): There is no difference in the average change in HbA1c from baseline between Treatment A and Treatment B
The alternative hypothesis is:
\(H_{a}\): There is a difference in the average change in HbA1c from baseline between Treatment A and Treatment B
To determine the sample size needed to detect a meaningful difference in the average HbA1c change from baseline between Treatment A and Treatment B, we can use the following formula:
With R, you will not need to use this formula. However, if you decide to estimate the sample size by hand, you need to make sure to use the correct values: \(Z_{\alpha/2}\) = 1.96 and \(Z_{(1-\beta)}\) = 0.84.
\(\begin{aligned} n_{i} = \frac{2 \sigma^2 * (Z_{\alpha/2} + Z_{1 - \beta})^2}{({\mu_{1} - \mu_{2}})^2} \end{aligned}\)
where \(n_{i}\) is the sample size for one of the groups, \(\sigma\) is the (pooled) standard deviation of the change, and \(\mu_{1}\) and \(\mu_{2}\) are the average changes in HbA1c from baseline for Treatment A and Treatment B, respectively.
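If you prefer to work this formula by hand, here is a minimal sketch. The means and standard deviation are the hypothetical HbA1c values worked out later in this section (\(\mu_{1}\) = 1.5, \(\mu_{2}\) = 1.0, \(\sigma_{pooled}\) ≈ 0.276). Because the formula relies on the normal approximation, it returns a slightly smaller n than the t-distribution-based answer that pwr gives below.
### Hand calculation of n per group for a difference in means (normal approximation)
z_alpha <- 1.96     ### two-tailed alpha of 0.05
z_beta <- 0.84      ### power of 80%
mu1 <- 1.5          ### assumed mean change in HbA1c for Treatment A (see below)
mu2 <- 1.0          ### assumed mean change in HbA1c for Treatment B
sigma <- 0.276      ### assumed pooled standard deviation (calculated later in this section)
n_i <- (2 * sigma^2 * (z_alpha + z_beta)^2) / (mu1 - mu2)^2
n_i
### Approximately 4.8, which we would round up to 5 patients per group.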
We also need the following information:
We’ll set the two-tailed alpha level to 0.05.
There are other formulas that provide standardized measures, such as Hedges’ \(g\). However, it is not always clear what a standardized effect size means. The interpretation of effect size varies, but common convention recommends the following:
\(d\) = 0.2 (small effect)
\(d\) = 0.5 (medium effect)
\(d\) = 0.8 (large effect)
The effect size, also known as Cohen’s \(d\), is estimated using the following equation:
\(\begin{aligned} d = \frac{\mu_{1} - \mu_{2}}{\sigma_{pooled}} \end{aligned}\)
The pooled standard deviation is estimated using the following formula:
Sometimes it is not easy figuring out the pooled standard deviation (\(\sigma_{pooled}\)). So, you will have to make an educated guess.
\(\begin{aligned} \sigma_{pooled} = \sqrt{\frac{sd_{1}^2 + sd_{2}^2}{2}} \end{aligned}\)
Once again, the R pwr package can make this task easy for us. However, we’ll need to estimate the Cohen’s \(d\).
Since we haven’t started the study, we have to make some assumptions about each treatment strategy’s change in HbA1c. You can do this by reviewing the literature to get a general sense of the average change in HbA1c and then making an educated guess. For our example, let’s assume that the expected average change in HbA1c from baseline for Treatment A is 1.5% with a standard deviation of 0.25%. Additionally, let’s assume that the expected average change in HbA1c from baseline for Treatment B is 1.0% with a standard deviation of 0.30%.
First, we’ll calculate the pooled standard deviation (\(\sigma_{pooled}\)):
sd1 <- 0.25
sd2 <- 0.30
sd_pooled <- sqrt((sd1^2 +sd2^2) / 2)
sd_pooled
## [1] 0.276134
Once we have the \(\sigma_{pooled}\), we can estimate the Cohen’s \(d\):
mu1 <- 1.5
mu2 <- 1.0
d <- (mu1 - mu2) / sd_pooled
d
## [1] 1.810715
The Cohen’s \(d\) is 1.81, which is considered a large effect size.
Recall that the average change in HbA1c from baseline for Treatment A was 1.5% and 1.0% for Treatment B.
Now, we can take advantage of the pwr package and estimate the sample size needed to detect a difference of 0.5% (1.5% - 1.0%) in the average HbA1c change from baseline between Treatment A and Treatment B with 80% power and a significance threshold of 5%.
We will use the pwr.t.test() function to estimate the sample size for a study where there are two groups and the outcome is a continuous variable.
### d = Cohen's d
### power = 0.80
### alpha = 0.05
n_i <- pwr.t.test(d = d, power = 0.80, sig.level = 0.05)
n_i
##
## Two-sample t test power calculation
##
## n = 5.921286
## d = 1.810715
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
Recall that the average change in HbA1c from baseline for Treatment A was 1.5% and 1.0% for Treatment B. Hence, the difference is 0.5%. Based on our study parameters, we need 6 patients in each group to detect a difference of 0.5% or greater with 80% power and a significance threshold of 5%.
We can plot how the power will change as the sample size changes. As the sample size increases, power increases. This should make sense. Like our previous example with two proportions, as we increase our sample size, we reduce the uncertainty around the estimates. By reducing this uncertainty, we gain greater precision in our estimates, which results in greater confidence in our ability to avoid making a type II error.
### We can plot the power relative to different levels of the sample size.
n <- seq(1, 10, 1)
nchange <- pwr.t.test(d = d, n = n, sig.level = 0.05)
## Warning in qt(sig.level/tside, nu, lower = FALSE): NaNs produced
nchange.df <- data.frame(n, power = nchange$power * 100)
nchange.df
## n power
## 1 1 NaN
## 2 2 19.03307
## 3 3 39.61785
## 4 4 57.33850
## 5 5 70.87945
## 6 6 80.64997
## 7 7 87.42531
## 8 8 91.98145
## 9 9 94.96979
## 10 10 96.88938
plot(nchange.df$n,
nchange.df$power,
type = "b",
xlab = "Sample size, n",
ylab = "Power (%)")
We don’t use plot(n_i) because it doesn’t do a good job of plotting the power when the sample size is small.
As the sample size increases, our power increases. This makes sense because we have more patients to detect differences that may be smaller. But we fixed our effect size (Cohen’s \(d\)), so as we increase the sample size, our power to detect that difference ultimately increases.
Let’s change the \(\mu_{i}\) and see how the power level changes; we are fixing our sample size at 6 for each group with an alpha of 0.05.
Estimating the power is easy. You type the same command, but the only difference is that you enter a value for the sample size for one group (n) and leave power out of the options (so that it defaults to NULL). Here is an example: pwr.t.test(d = d, n = 6, sig.level = 0.05)
We create a sequence of values by varying the average change in HbA1c from baseline for Treatment A. We will change these from 0% to 2% in intervals of 0.1%.
mu1 <- seq(0.0, 2.0, 0.1)
d <- (mu1 - mu2) / sd_pooled
power1 <- pwr.t.test(d = d, n = 6, sig.level = 0.05)
powerchange <- data.frame(d, power = power1$power * 100)
powerchange
## d power
## 1 -3.621430 99.986261
## 2 -3.259287 99.898978
## 3 -2.897144 99.437136
## 4 -2.535001 97.615372
## 5 -2.172858 92.271971
## 6 -1.810715 80.649971
## 7 -1.448572 61.960800
## 8 -1.086429 39.817661
## 9 -0.724286 20.610368
## 10 -0.362143 8.785855
## 11 0.000000 5.000000
## 12 0.362143 8.785855
## 13 0.724286 20.610368
## 14 1.086429 39.817661
## 15 1.448572 61.960800
## 16 1.810715 80.649971
## 17 2.172858 92.271971
## 18 2.535001 97.615372
## 19 2.897144 99.437136
## 20 3.259287 99.898978
## 21 3.621430 99.986261
plot(powerchange$d,
powerchange$power,
type = "b",
xlab = "Cohen's d",
ylab = "Power (%)")
This figure shows how the power changes with Cohen’s \(d\). It has a symmetrical pattern because of the negative and positive ranges associated with Cohen’s \(d\). But the story is the same. As the effect size increases (negative and positive signs do not matter; we only care about the absolute values), the power increases. This makes sense because we only have enough power to detect large differences with the current sample size (which is fixed in this case). If the differences are small, then we do not have enough power with the current sample size of 6.
So far, we discussed how to perform sample size estimations for “between-groups” comparisons. However, many studies investigate “within-group” changes. These are paired data, which means that one observation is dependent on another observation. A common study design where paired data are collected is the longitudinal study. These types of studies involve repeated measures. For example, a pretest-posttest study design will measure a data point for a patient at baseline and then repeat that measurement at another point in time. In the figure, Patient A has two measurements at \(t_{0}\) and \(t_{f}\). Since these measurements were made in the same person, the change is “within” the patient. Alternatively, we can think of this as a “repeated” measure since the patient had the measurement performed twice.
Figure caption: Pretest-Posttest (repeated measures) framework
When you have a study where you are performing a paired t test, you can use the same pwr.t.test() function from the pwr package that we used for the two-sample test.
Let’s assume that we want to conduct a prospective study to measure the weight change of a cohort of patients who started a diet. You want to enroll enough patients to detect a 5 lb reduction in weight 3 weeks after the diet starts. Let’s assume that at baseline the expected average weight for the cohort is 130 lbs with a standard deviation of 11 lbs. After 3 weeks of dieting, the expected average weight is 125 lbs with a standard deviation of 12 lbs.
We can estimate the effect size (Cohen’s \(d_{z}\)) for a paired t test.
\(\rho\) can range between 0 and 1 where 0 means no correlation and 1 means perfect correlation. For sample size estimations, we can start at 0.5 if we’re uncertain of the actual correlation.
\(\begin{aligned} d_{z} = \frac{ | \mu_{z} |}{\sigma_{z}} = \frac{| \mu_{x} - \mu_{y} | }{ \sqrt{\sigma_{x}^{2} + \sigma_{y}^{2} - 2 \rho_{x, y} \sigma_{x} \sigma_{y}}} \end{aligned}\)
\(x\) denotes “before” (or baseline)
\(y\) denotes “after”
\(d_{z}\) denotes the Cohen’s \(d\) for paired analysis
\(\rho\) denotes the correlation between the measures before and after the diet. (For simplicity, I use 0.50 if I don’t have prior information about this correlation.)
### Parameters for paired analysis or a pretest-posttest study design
mu_x <- 130 ### Average weight before the diet (baseline)
mu_y <- 125 ### Average weight after the diet
sd_x <- 11 ### Standard deviation before the diet
sd_y <- 12 ### Standard deviation after the diet
rho <- 0.5 ### Correlation between measures before and after the diet
sd_z <- sqrt(sd_x^2 + sd_y^2 - 2*rho*sd_x*sd_y)
d_z <- abs(mu_x - mu_y) / sd_z
d_z
## [1] 0.433555
The Cohen’s \(d_{z}\) is 0.433. We can input this into the pwr.t.test() function.
n.paired <- pwr.t.test(d = d_z, power = 0.80, sig.level = 0.05, type = "paired")
n.paired
##
## Paired t test power calculation
##
## n = 43.71557
## d = 0.433555
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number of *pairs*
We need 44 patients, each with two measurements (before and after they implement their diet), to detect a difference of 5 lbs or greater with 80% power and a significance level of 0.05.
We can plot how the power will change as the sample size changes for the paired t test analysis. As the sample size increases, power increases. This should make sense. Like our previous examples, as we increase our sample size, we reduce the uncertainty around the estimates. By reducing this uncertainty, we gain greater precision in our estimates, which results in greater confidence in our ability to avoid making a type II error.
### We can plot the power relative to different levels of the sample size for paired analysis.
n_z <- seq(1, 80, 5)
n_z.change <- pwr.t.test(d = d_z, n = n_z, sig.level = 0.05, type = "paired")
## Warning in qt(sig.level/tside, nu, lower = FALSE): NaNs produced
n_z.change.df <- data.frame(n_z, power = n_z.change$power * 100)
n_z.change.df
## n_z power
## 1 1 NaN
## 2 6 14.03624
## 3 11 25.58334
## 4 16 36.84309
## 5 21 47.26307
## 6 26 56.56985
## 7 31 64.66154
## 8 36 71.54769
## 9 41 77.30572
## 10 46 82.04980
## 11 51 85.90929
## 12 56 89.01478
## 13 61 91.48950
## 14 66 93.44465
## 15 71 94.97744
## 16 76 96.17076
plot(n_z.change.df$n_z,
n_z.change.df$power,
type = "b",
xlab = "Sample size, n",
ylab = "Power (%)")
We increase the sample size from 1 to 80 at 5-unit intervals.
As the sample size increases, we gain more power to detect a difference of 5 lbs (before versus after the diet) at a significance level of 0.05.
Let’s see how power changes when we change the effect size. Instead of fixing the average weight after the diet at 125 lbs, let’s vary it and see how the power changes.
### Vary the mu_y from 50 lbs to 130 lbs in intervals of 5 lbs.
mu_y <- seq(50, 130, 5)
d_z <- abs(mu_x - mu_y) / sd_z
n_z.change <- pwr.t.test(d = d_z, n = 44, sig.level = 0.05)
n_z.change.df <- data.frame(d_z, power = n_z.change$power * 100)
n_z.change.df
## d_z power
## 1 6.936880 100.00000
## 2 6.503325 100.00000
## 3 6.069770 100.00000
## 4 5.636215 100.00000
## 5 5.202660 100.00000
## 6 4.769105 100.00000
## 7 4.335550 100.00000
## 8 3.901995 100.00000
## 9 3.468440 100.00000
## 10 3.034885 100.00000
## 11 2.601330 100.00000
## 12 2.167775 100.00000
## 13 1.734220 100.00000
## 14 1.300665 99.99766
## 15 0.867110 98.03639
## 16 0.433555 52.03146
## 17 0.000000 5.00000
plot(n_z.change.df$d_z,
n_z.change.df$power,
type = "b",
xlab = "Cohen's d_z",
ylab = "Power (%)",
xlim = c(0, 2))
We vary the average weight \(\mu_{y}\) between 50 lbs and 130 lbs in intervals of 5 lbs.
When we increase the effect size (Cohen’s \(d_{z}\)), our power goes up; recall that the sample size is fixed at 44 and significance level is 0.05. But when the effect size gets smaller (or when the average weight loss shrinks), we lose power to detect a difference because our sample size is too small. We’ll need to increase our sample size to have a reasonable power to detect small differences.
We can also see how power changes when we vary \(\rho\). If we set \(\rho\) = 0, then the Cohen’s \(d_{z}\) = 0.307. If we set \(\rho\) = 1, then the Cohen’s \(d_{z}\) = 5.
mu_x <- 130 ### Average weight before the diet (baseline)
mu_y <- 125 ### Average weight after the diet
sd_x <- 11 ### Standard deviation before the diet
sd_y <- 12 ### Standard deviation after the diet
sd_z_1 <- sqrt(sd_x^2 + sd_y^2 - 2*1*sd_x*sd_y)
sd_z_0 <- sqrt(sd_x^2 + sd_y^2 - 2*0*sd_x*sd_y)
d_z_1 <- abs(mu_x - mu_y) / sd_z_1
d_z_0 <- abs(mu_x - mu_y) / sd_z_0
d_z_1
## [1] 5
d_z_0
## [1] 0.3071476
So, a higher \(\rho\) results in a larger \(d_{z}\), and a smaller \(\rho\) results in a smaller \(d_{z}\).
Let’s see how power changes when we vary \(\rho\) from 0 to 1 in intervals of 0.1 units.
rho <- seq(0.0, 1.0, 0.1)
sd_z <- sqrt(sd_x^2 + sd_y^2 - 2*rho*sd_x*sd_y)
d_z <- abs(mu_x - mu_y) / sd_z
rho.change <- pwr.t.test(d = d_z, n = 44, sig.level = 0.05)
rho.change.df <- data.frame(d_z, power = rho.change$power * 100)
rho.change.df
## d_z power
## 1 0.3071476 29.65480
## 2 0.3236941 32.35129
## 3 0.3432395 35.66281
## 4 0.3668151 39.80733
## 5 0.3960280 45.10546
## 6 0.4335550 52.03146
## 7 0.4842743 61.25990
## 8 0.5583195 73.54735
## 9 0.6816774 88.52181
## 10 0.9552009 99.32401
## 11 5.0000000 100.00000
plot(rho.change.df$d_z,
rho.change.df$power,
type = "b",
xlab = "Cohen's d_z",
ylab = "Power (%)",
xlim = c(0, 1.5))
We vary \(\rho\) from 0 to 1 at intervals of 0.1 unit.
As \(\rho\) increases, our power increases. This makes sense because we are nearing “perfect” correlation, which requires a smaller sample to detect a difference if one exists. As the correlation becomes less “perfect,” our power drops, suggesting that we need to increase our sample size to make up for the weaker correlation.
It is common to perform power analysis on a study where the sample sizes between groups are different. The pwr.t2n.test() function is a useful tool to estimate the power given the two sample sizes of the study.
Suppose you have a retrospective study where the patients were prescribed Treatment A and Treatment B. There were 130 patients in Treatment A (\(n_{A}\) = 130) and 120 patients in Treatment B (\(n_{B}\) = 120). The average change in HbA1c was 1.5% with a standard deviation of 1.25% in Treatment A, and the average change in HbA1c was 1.4% with a standard deviation of 1.01% in Treatment B.
First, we’ll calculate the pooled standard deviation (\(\sigma_{pooled}\)):
sd1 <- 1.25
sd2 <- 1.01
sd_pooled <- sqrt((sd1^2 +sd2^2) / 2)
sd_pooled
## [1] 1.136354
Once we have the \(\sigma_{pooled}\), we can estimate the Cohen’s \(d\):
mu1 <- 1.5
mu2 <- 1.4
d <- (mu1 - mu2) / sd_pooled
d
## [1] 0.08800076
Now, we can estimate the power with the different sample sizes across the groups (\(n_{A}\) = 130, \(n_{B}\) = 120).
n1 <- 130
n2 <- 120
power.diff_n <- pwr.t2n.test(d = d, n1 = n1, n2 = n2, sig.level = 0.05)
power.diff_n
##
## t test power calculation
##
## n1 = 130
## n2 = 120
## d = 0.08800076
## sig.level = 0.05
## power = 0.1064836
## alternative = two.sided
Since the average HbA1c change from baseline is 1.5% for Treatment A and 1.4% for Treatment B, the average difference in the HbA1c change from baseline is 0.1%. This is a difference-of-differences calculation: the difference between the groups (between-group difference) of each group’s change from baseline (within-group difference).
You only have 11% power to detect a difference of 0.10% or greater in the HbA1c change from baseline. This means that you are underpowered to detect a difference of 0.10% or greater in the HbA1c change from baseline with \(n_{A}\) = 130, \(n_{B}\) = 120, and a significance level of 0.05. When studies are underpowered, there is a high potential for type II error. The only way to address this problem is to enroll more patients or expand the sample by relaxing inclusion criteria. But this may increase the threats to the study’s internal validity.
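As a follow-up sketch (not part of the original retrospective analysis), you can turn the question around and ask pwr.t.test() how many patients per group would be required to reach 80% power for such a small effect size:
### How many patients per group would be needed for 80% power at d = 0.088?
pwr.t.test(d = d, power = 0.80, sig.level = 0.05)
### The required n comes out to roughly 2,000 patients per group.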
Sample size estimations and power analysis are very useful tools to determine how many patients you need in your study and how confident you are that you didn’t make a type II error. Depending on the type of study, you will need to use different functions from the pwr package. I highly encourage you to explore the other functions of the pwr package to see if those fit the study design you have planned.
I generated this tutorial using examples from an excellent website.
The R CRAN network was a great resource for learning how to use the pwr package. I recommend that interested students who want to expand their knowledge of sample size estimations and power analysis with other statistical inferences (e.g., paired t test, one-way ANOVA) explore the pwr vignette at this link.
I used the Tufte R Markdown package to create the style for this
tutorial. The authors of the package are JJ Allaire and Yihui Xie, and
their work can be found at this link or on their GitHub page.
You can email me with feedback at: internal.validity.blog@gmail.com.
This is a work in progress, and I expect to update this in the future.