Statistical Power for Randomized Experiments: Examples of Calculations and Pitfalls

E. C. Hedberg, Ph.D.
Assistant Professor, Sanford School of Social and Family Dynamics
Senior Fellow, Morrison Institute for Public Policy
Faculty Affiliate, Southwest Interdisciplinary Research Center
Arizona State University

Jan 20, 2016

The what, why, and when, of power analyses

What is power?

The power of a statistical test is the probability that it will yield statistically significant results

-Cohen (1988)

Why do power analyses?

Reasons for your funders:

Interventions (and their evaluations) cost money.
When the sample size is too small, and test is not significant, the answer is inconclusive and resources are wasted.

Reasons for your discipline:

Under powered studies cede the proportion of published results to random chance findings.

Today, many results from experiments are probably false, and low power contributes to this.

The Economist. “Why most published scientific research is probably false.” (2015); based on John Ioannidis (2005)

For example, less than half of replicated psychology results were significant.

Open Science Collaboration. “Estimating the reproducibility of psychological science.” (2015)

When to do power analyses

Before you start your study.

Power analyses guide sample design and study expectations before data are collected.

Once the data are collected, power analyses do not change the results. A power analysis does not negate a significant result.

At best, power analyses are a post mortem for insignificant results.

A more detailed what and how of power analyses

Types of error

Type I error (\(\alpha\)): the chance of falsely rejecting the null hypothesis; convention = 0.05.
Type II error (\(\beta\)): the chance of falsely accepting the null hypothesis; convention = 0.20.
Power is the chance of correctly rejecting the null hypothesis when alternative is true (\(1-\beta\)).

Hypothesis testing

We wish to infer the results of a study to a population.
We assume a particular worldview, the null hypothesis, and evaluate the likelihood that our data originate from that worldview.
Fisher (1920): p-values are the likelihood of observing the result or one more extreme given the null hypothesis is true.
Neyman & Pearson (1933): Hypothesis testing procedure that sets acceptable Type I and Type II error rates and compares the test statistic with a priori values.
Power analysis is based on hypothesis testing and the selected error rates.

Jean Biau, et al. “P Value and the Theory of Hypothesis Testing.”, (2010)

Type I error based on null distribution

Type II error based on alternative distribution with z score of 1

The bigger the test statistic, the larger the power

The expected test statistic is the mean of the alternative distribution, which determines power.

Balancing three things

Power analyses include three tasks:

Finding the power for an effect size with a sample size.
Finding the sample size necessary to get a level of power for an effect size.
Finding the lowest effect size possible with a given sample size and power.

Doing any of these tasks requires mastery of the test statistic in question.

Two group tests

In this presentation, we will focus on the simple two-group comparison, but the same principles apply to other tests (e.g., multilevel two group tests).

Simple independent t-test or dummy-variable regression (same thing).
Adding a covariate (such as a pretest) in a dummy-variable regression or ANCOVA framework (same thing).

Example Analysis: Breakfast intervention to promote weight loss

Dhurandhar, Emily J., et al. “The effectiveness of breakfast recommendations on weight loss: a randomized controlled trial.” The American Journal of Clinical Nutrition 100.2 (2014): 507-513.

I selected only two of three conditions and randomly sampled 25 observations from each. Site variation was ignored.

Dependent variable (\(y\)): BMI at end of study.
Independent variable (\(treat\)): Assignment to eating breakfast (1) vs. general nutritional guidelines (0).
Covariate (\(x\)): BMI at start of study.

Reading in the data

bmi <- data.frame(
  read.csv(file = "bmi.csv")) #create dataframe
bmi[c(5,10,35,40),] #list rows 5, 10, 35, and 40

##    treat        x        y
## 5      0 27.78202 27.02582
## 10     0 37.47405 37.61246
## 35     1 37.38256 38.64678
## 40     1 38.32031 40.85937

Summary statistics: mean and standard deviation by group

attach(bmi)
aggregate(cbind(x, y) ~ treat, FUN = mean)

##   treat        x        y
## 1     0 32.03280 31.91977
## 2     1 32.71625 33.13091

aggregate(cbind(x, y) ~ treat, FUN = sd)

##   treat        x        y
## 1     0 5.495015 5.762522
## 2     1 4.541026 5.289409

Data generating process

The outcome for person \(i\) in group \(j\) is a function of the overall mean, \(\mu\), plus the effect of the group, \(\tau_j\), plus the within-group residual, \(\epsilon\)

\[ y_{ij} = \mu + \tau_j + \epsilon_{ij} \] \[ \tau_j = \mu_j - \mu \] \[ \epsilon_{ij} = y_{ij} - \mu_j \] \[ \epsilon_{ij} \sim N\left[0, \sigma^2\right] \]

where \(\sigma^2\) is the population variance of the outcome.

Post-intervention BMI = Average Post-intervention BMI + Effect of Intervention or Control + Random Error

The test where \(\sigma^2\) is known

If \(\bar{y}_1\) is the mean for the intervention group and \(\bar{y}_0\) is the mean of the control group, and each group has \(n\) observations (i.e., balanced data), then

\[\bar{y}_1 - \bar{y}_0 \sim N\left[\mu_1-\mu_0,\sigma^2\frac{2}{n}\right]\]

Thus, to get the p-value, you calculate the z-score of the observed difference in the means as it relates to the null hypothesis and the variance of the mean difference.

\[ z = \frac{(\bar{y}_1 - \bar{y}_0)-\mbox{Null hypothesis}}{\sqrt{\sigma^2\frac{2}{n}}} \]

If \(z\) is greater in magnitude than the critical value, you reject the null.

The test statistic for samples (where \(\sigma^2\) is unknown)

We usually want to generalize to a known population or theoretical super population.
That means we are working with samples and that \(\sigma^2\) is unknown and must be estimated.
The estimate of the population variance is the pooled variance, which for balanced data is \[ \hat{\sigma}^2 = \frac{\hat{\sigma^2_1}+\hat{\sigma^2_0}}{2}\] and for unbalanced data it is the more familiar \[ \hat{\sigma}^2 = \frac{(n_1-1)\hat{\sigma^2_1}+(n_0-1)\hat{\sigma^2_0}}{n_1+n_0-2}\]
The sample variance is not a good estimate of the population variance when treatments have been introduced, which is why we use the pooled variance.

The test statistic for samples (where \(\sigma^2\) is unknown)

Thus, the sampling variance of the estimated difference in means is (when data are balanced) \[ \hat{V}\left\{\bar{y}_1-\bar{y}_0\right\} = \hat{\sigma}^2\frac{2}{n}\] making the test statistic \(t\) \[ t = \frac{(\bar{y}_1-\bar{y}_0) - \mbox{Null hypothesis}}{\sqrt{\hat{\sigma}^2\frac{2}{n}}}\] distributed with \(2n-2\) degrees of freedom. Note that the computation of the pooled variance, \(\hat{\sigma}^2\) assumes that each treatment group has the same variance in reality (thus the observed differences are by chance).

Testing homogenity of variance

aggregate(y ~ treat, FUN = sd)

##   treat        y
## 1     0 5.762522
## 2     1 5.289409

bartlett.test(y ~ treat)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  y by treat
## Bartlett's K-squared = 0.17233, df = 1, p-value = 0.678

Performing the \(t\)-test with regression

printCoefmat(coefficients(summary(lm(y ~ treat))))

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  31.9198     1.1062 28.8552   <2e-16
## treat         1.2111     1.5644  0.7742   0.4426

Power of that test

The alternative distribution has a mean of the \(t\)-test, thus most of the distribution is before the critical value. The power is very low.

All about the test

So, the expected test is what determines power. The alternative distribution is governed by its mean, the noncentrality parameter, \(\lambda\) \[\mbox{Expected test} = \lambda \]

The goal is to break down the expected test into scale-free parameters that don’t require knowledge about the unit of measurement. We can figure out the anatomy of the test by looking at the “Statistics 101” regression formula.

The regression model is \[ y_{ij} = \gamma_0 + \gamma_1\times\mbox{treat}_{ij} + \epsilon_{ij} \] and \(\gamma_1\) is defined as \[ \gamma_1 = \frac{\sum_j\sum_i\left(y_{ij}-\bar{y}\right)\times\left(\mbox{treat}_{ij}-\bar{\mbox{treat}}\right)}{\sum_j\sum_i\left(\mbox{treat}_{ij}-\bar{\mbox{treat}}\right)^2}\]

The slope

In the balanced case, the denominator of \(\gamma_1\) is the total sample size divided by four, or half the sample, \(n\), divided by 2, \[ \sum_j\sum_i\left(\mbox{treat}_{ij}-\bar{\mbox{treat}}\right)^2 = \sum_j\sum_i 0.25 =\frac{N}{4}= \frac{n}{2} \] The numerator of \(\gamma_1\) simplifies to \[\sum_j\sum_i\left(y_{ij}-\bar{y}\right)\times\left(\mbox{treat}_{ij}-\bar{\mbox{treat}}\right) = \frac{1}{2}\sum_i\left(y_{i1}-\bar{y}\right)-\frac{1}{2}\sum_i\left(y_{i0}-\bar{y}\right)\] Thus, we can express the slope as \[ \hat{\gamma}_1 = \frac{\frac{1}{2}\sum_i\left(y_{i1}-\bar{y}\right)-\frac{1}{2}\sum_i\left(y_{i0}-\bar{y}\right)}{n/2}=\frac{\sum_i\left(y_{i1}-\bar{y}\right)-\sum_i\left(y_{i0}-\bar{y}\right)}{n} \] which is equivalent to \[ \hat{\gamma}_1 = \frac{\sum_iy_{i1}}{n} - \frac{\sum_iy_{i0}}{n} = \bar{y}_1-\bar{y}_0 \]

The variance of the slope

The estimated sampling variance of the slope, \(\gamma_1\), in a bi-variate regression is often presented as \[ \hat{\mbox{V}}\left\{\hat{\gamma}_1\right\}=\frac{\hat{\sigma}_\epsilon^2}{\sum_j\sum_i\left(\mbox{treat}_{ij}-\bar{\mbox{treat}}\right)^2} \] where the mean squared error of a bi-variate regression (MSE or \(\sigma_\epsilon^2\)) is estimated as \[ \hat{\sigma}_\epsilon^2 = \frac{\sum_j\sum_i\hat{\epsilon}_{ij}^2}{N-2} = \frac{\left(n_1-1\right)\sigma_1^2+\left(n_0-1\right)\sigma_0^2}{n_1+n_0-2} \] which means that for bi-variate models with a dichotomous predictor, the estimate of the MSE is the same as the estimate of the population variance, \(\sigma^2\), meaning that that the estimated sampling variance of \(\gamma_1\) is \[ \hat{\mbox{V}}\left\{\hat{\gamma}_1\right\} = \hat{\sigma}^2\frac{2}{n}. \]

Let’s rearrange the test into scale free parameters

Thus, the slope is the mean difference is \[ \hat{\gamma}_1 = \bar{y}_1-\bar{y}_0 \] and the standard error is \[ \sqrt{\hat{\sigma}^2\frac{2}{n}} \] so the test is \[ \frac{\left(\bar{y}_1-\bar{y}_0\right)-\mbox{Null hypothesis}}{\sqrt{\hat{\sigma}^2\frac{2}{n}}}\] making the expected test \[ \lambda = \frac{\mu_1-\mu_0}{\sqrt{\sigma^2\frac{2}{n}}} = \frac{\mu_1-\mu_0}{\sigma}\sqrt{\frac{n}{2}} = \delta\sqrt{\frac{n}{2}} \] \[\mbox{Effect size} = \delta = \frac{\mu_1-\mu_0}{\sigma} \]

Effect sizes

In order to make \(\lambda\) scale free, we replaced \(\frac{\mu_1-\mu_0}{\sigma}\) with the effect size \(\delta\).

Effect sizes represent the correlation between treatment conditions and an outcome. Typically, effect sizes are either in the form of the correlation coefficient \(r\) or Cohen’s \(d\) (a standardized difference between means). In the balanced case, the translations between the two are

\[ \delta = \frac{\mu_1-\mu_0}{\sigma} \] \[ \delta = \frac{2\rho}{\sqrt{1-\rho^2}} \] \[ \rho = \sqrt{\frac{\delta^2}{\delta^2+4}} \]

These formulas are useful for reading previous literature to inform expectations.

Cohen, “A Power Primer.” (1992)

Ferguson, “An Effect Size Primer: A Guide for Clinicians and Researchers.” (2009)

Simulation example

sim <- function(delta, n) {
  y.0 <- rnorm(n, mean = 0) #cases in control
  y.1 <- rnorm(n, mean = delta) #cases in treatment
  test <- t.test(y.1,y.0, var.equal = TRUE) #the test
  return(t = test$statistic) #return the result
}
sim(delta = .5, n = 50) #one simulation

##        t 
## 1.427165

Simulation example

delta <- .5 #effect size
n <- 50 #size per group
delta*sqrt(n/2) #expected test

## [1] 2.5

results <- 
  replicate(5000,sim(delta,n)) #5000 simulations
mean(results) #mean of the simulation

## [1] 2.510451

What proportion of the results are after the critical value?

alpha <- .05
df <- 2*n - 2
ct <- qt(alpha/2, df, lower.tail = FALSE)
sim.power <- length(results[results >= ct])/length(results)
sim.power

## [1] 0.6944

What proportion of the results are after the critical value?

length(results[results >= ct])/length(results)

## [1] 0.6944

The noncentrality parameter

\[ \lambda = \delta\sqrt{\frac{n}{2}} \]

With the non-centrality parameter, we can use an expected effect size, \(\delta\), and an expected sample size, \(n\), to estimate what our test-statistic is likely to be, \(\lambda\).

We then use \(\lambda\) calculate the expected Type II error given a level of Type I error, and thus, power.

The shaded area is the Type II error. It is the area of the alternative distribution between the two critical values. How do we calculate the area of the non-central distribution between the two critical values?

Area between critical values

If we assume some Type I error (\(\alpha\), usually 0.05 with two tails) and a sample size (which produces degrees of freedom), Type II error is

\[\beta = \underbrace{H\left[t_{\frac{\alpha}{2}}^{critical},df,\lambda\right]}_\text{Area before right critial value}-\underbrace{H\left[-t_{\frac{\alpha}{2}}^{critical},df,\lambda\right]}_\text{Area before left critical value} \]

where \(H\left[a,b,c\right]\) is the cumulative function of the non-central \(t\) distribution at point \(a\) with \(b\) degrees of freedom and non-centrality parameter \(c\).

The power, then, is the complement of this area

\[\mbox{Power} = 1-\beta.\]

Power given effect size and sample size

In R, we can use the pt function to get the cumulative non-central \(t\) distribution. Suppose we thought that we would have 50 cases in each group with an effect size of 0.5 (just like the simulation), power would then be

n <- 50 #cases per group
df <- 2*n-2 #degrees of freedom
delta <- .5 #effect size
alpha <- .05 #type i error
ct <- qt(alpha/2,df, lower.tail = F) #critial t
lambda <- delta*sqrt(n/2) #noncentrality parameter
beta <- pt(ct,df,lambda)-pt(-ct,df,lambda) #type ii error
power <- 1-beta #power
power

## [1] 0.6968934

sim.power

## [1] 0.6944

Intuition about power

Power increases both with sample size and effect size.

What is the effect size in the BMI data?

mean1 <- mean(y[treat==1]) #mean treatment
mean0 <- mean(y[treat==0]) #mean control
var1 <- var(y[treat==1]) #variance treatment
var0 <- var(y[treat==0]) #variance control
varp <- (var1+var0)/2 #pooled variance
bmi.delta <- (mean1-mean0)/sqrt(varp) #effect size
bmi.delta

## [1] 0.2189726

What was the power for the BMI test?

n <- 25 #cases per group
df <- 2*n-2 #degrees of freedom
delta <- bmi.delta #effect size
alpha <- .05 #type i error
ct <- qt(alpha/2,df, lower.tail = FALSE) #critical
lambda <- delta*sqrt(n/2) #expected test statistic
lambda

## [1] 0.774185

beta <- pt(ct,df,lambda)-pt(-ct,df,lambda) #type ii error
power <- 1-beta #power
power

## [1] 0.1181177

Sample size given power and effect size

This is difficult because there is no way to solve \[ \mbox{Power} = 1- H\left[t_{\frac{\alpha}{2}}^{critical},df,\lambda\right]+H\left[-t_{\frac{\alpha}{2}}^{critical},df,\lambda\right] \] for the sample size because \(t_{\frac{\alpha}{2}}^{critical}\) keeps changing with the sample size.

However, there is a trick. When \(\sigma^2\) is known, \[ Z_{1-\beta}-Z_{\alpha/2} = \lambda \] Where \(Z_{\alpha/2}\) is the quantile value of the normal distribution at half of \(\alpha\) Thus, if \(\alpha = 0.05\), \(Z_{0.025} = -1.96\). Leading to a quantity \(M\) that is approximate to the expected test \[ M_z = Z_{\mbox{Power}}-Z_{\alpha/2} \]

Sample size given power and effect size

This means \[ M_z \approx \lambda = \delta\sqrt{\frac{n}{2}} \] which we can rearrange to find \[ n_z = \frac{2M_z^2}{\delta^2} \] now, the approximate sample size is a function of \(1-\beta\) (power), \(\alpha\) (Type I error), and the effect size (\(\delta\)).

Next, we use \(n_z\) for the degrees of freedom to calculate a sample-based factor, \(M_t\). Finally, we use the same formula \((n_t = \frac{2M_t^2}{\delta^2})\) to find the appropriate sample size.

Sample size given power and effect size

On the computer, we can do this easily

alpha <- .05 #set alpha
power <- .8 #set power
delta <- .5 #set effect size
M_z <- qnorm(power)-qnorm(alpha/2) #normal factor
n_z <- (2*M_z^2)/delta^2 #approx sample size

next, we use the approximate sample size to get a better estimate with \(M_t\)

M_t <- qt(power,2*n_z-2)-
  qt(alpha/2,2*n_z-2) #sample factor
n_t <- (2*M_t^2)/delta^2 #sample size
n_t

## [1] 63.79463

How many cases would the BMI study need?

alpha <- .05 #set alpha
power <- .8 #set power
delta <- bmi.delta #set effect size
M_z <- qnorm(power)-qnorm(alpha/2) #normal factor
n_z <- (2*M_z^2)/delta^2 #approx sample size
M_t <- qt(power,2*n_z-2)-
  qt(alpha/2,2*n_z-2) #sample factor
n_t <- (2*M_t^2)/delta^2 #sample size
n_t

## [1] 328.3649

328 in each group

Lowest effect size possible with a given sample size and power

We call this the minimum detectable effect size (MDES).

This is approximated by \[ \delta_{m.d.} = M_t\sqrt{\frac{2}{n}} \]

n <- 25
df <- 2*n-2
alpha <- .05
power <- 0.8
M_t <- qt(power,df)-qt(alpha/2,df)
mdes <- M_t*sqrt(2/n)
mdes

## [1] 0.808876

Intuition about MDES

Larger samples are more sensitive.

Adding covariates

Adding covariates (like a pretest) to an analysis of an experiment can (possibly) increase power by reducing the variance of the treatment effect.

However, if there is any correlation between the treatment condition and the covariate, it could make matters worse.

Example BMI analysis with pretest covariate

Compare effect of treat with and without the covariate

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  31.9198     1.1062 28.8552   <2e-16
## treat         1.2111     1.5644  0.7742   0.4426

##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.375131   1.149875 -2.0656  0.04441
## x            1.070618   0.035073 30.5256  < 2e-16
## treat        0.479431   0.347263  1.3806  0.17393

aggregate(x ~ treat, FUN = mean)

##   treat        x
## 1     0 32.03280
## 2     1 32.71625

Adding covariates

New data generating process \[ y_{ij}=\mu+\tau_j^{'}+\phi\left(x_{ij}-\bar{x}\right)+\epsilon_{ij} \] where \(\tau_j^{'}\) is the adjusted treatment effect and \(\phi\) is the within-group slope for \(x\), the covariate, predicting the outcome, \(y\). \[ \tau_j^{'} = \tau_j - \phi\left(\bar{x}_1-\bar{x}_0\right) \] This means that the adjusted treatment effect, \(\tau_j^{'}\), is related to the unadjusted treatment effect through \(\phi\) and the difference between the means of the covariate by treatment and control groups.

When there is no correlation between treatment and the covariate \(x\), the means of \(x\) for each group are the same. In such cases, \(\tau_j^{'} = \tau_j\).

Regression slope for treatment group

The regression model is

\[ y_{ij} = \gamma_0 + \gamma_1\times \mbox{treat}_{ij} + \gamma_2\times (x_{ij}-\bar{x}) + \epsilon_{ij} \]

The colinearity between treatment group and the covariance makes the test more complex. The regression slope for the treatment effect is \[ \gamma_1 = \left(\bar{y}_1-\bar{y}_0\right) - \gamma_2\left(\bar{x}_1-\bar{x}_0\right) \] and the variance of the effect is \[ \hat{V}\left\{\hat{\gamma_1} \right\} = \hat{\sigma}^2\left(1-\hat{\rho}_{yx,w}^2\right)\frac{2n-2}{2n-3}\frac{2}{n}\frac{1}{1-\hat{\rho}_{tx}^2} \]

Porter & Raudenbush. “Analysis of covariance: Its model and use in psychological research.” (1987)

Noncentrality parameter

The final test statistic has many components \[ \lambda = \underbrace{\frac{\left(\mu_1-\mu_0\right) - \gamma_2\left(\bar{x}_1-\bar{x}_0\right)}{\sigma}}_\text{Adjusted Effect size}\underbrace{\sqrt{\frac{1}{\left(1-\rho_{yx,w}^2\right)}}}_\text{Covariate effect}\underbrace{\sqrt{\frac{n}{2}\frac{2n-3}{2n-2}}}_\text{Sample size}\underbrace{\sqrt{1-\rho_{tx}^2}}_\text{Multicolinearity} \] where the effectiveness of the covariate in explaining variation in the outcome is \(\rho_{yx,w}^2\) and the multicolinearity that can be expected between the group indicator and the covariate is \(\rho_{tx}\)

If we can plan on zero correlation between the covariate and treatment group, this expression simplifies to \[ \lambda = \underbrace{\frac{\left(\mu_1-\mu_0\right)}{\sigma}}_\text{Effect size}\underbrace{\sqrt{\frac{1}{\left(1-\rho_{yx,w}^2\right)}}}_\text{Covariate effect}\underbrace{\sqrt{\frac{n}{2}\frac{2n-3}{2n-2}}}_\text{Sample size} \]

Further simplifications

The square root of \(\frac{2n-3}{2n-2}\) is always less than 1, but never dips below about 0.95 for samples of \(n = 6\) or greater.

curve(sqrt((2*x-3)/(2*x-2)), from = 4, to = 10)

Further simplifications

Thus, unless the sample is extremely small, the effect of \(\frac{2n-3}{2n-2}\) on the non-centrality parameter is minimal. Knowing this, the non-centrality parameter be approximated with the simplified formula \[\lambda \approx \frac{\left(\mu_1-\mu_0\right) - \gamma_2\left(\bar{x}_1-\bar{x}_0\right)}{\sigma}\sqrt{\frac{n}{2}}\sqrt{\frac{1}{\left(1-\rho_{yx,w}^2\right)}}\sqrt{1-\rho_{tx}^2}\]

If the covariate is uncorrelated, it can only increase power.
In the case of a correlation between treatment and the covariate, the benefit can be outweighed by the smaller effect of treatment and the inflation of the variance

Finding power with a covariate

n <- 25 #sample size 
df <- 2*n-2-1 #degrees of freedom
delta_a <- 0.5 #adjusted effect size
r_yx.w <- 0.5 #within group correlation
r_tx <- 0.2 #multicolinearity
alpha <- .05 #type i error
ct <- qt(alpha/2,df, 
         lower.tail = FALSE) #critical t
lambda <- delta_a*sqrt(n/2)*
  sqrt(1/(1-r_yx.w^2))*
  sqrt(1-r_tx^2) #noncentrality parameter
beta <- pt(ct,df,ncp)-pt(-ct,df,ncp) #type ii error
power <- 1-beta #power
power

## [1] 0.1947309

Finding the sample size

\[ n \approx \frac{2M_t^2}{\delta_a^2}\frac{1-\rho_{yx,w}^2}{1-\rho_{tx}^2} \]

alpha <- 0.05 #type i error
power <- 0.8 #power
delta_a <- 0.5 #adjusted effect size
r_yx.w <- 0.5 #within group correlation
r_tx <- 0.2 #multicolinearity
M_z <- qnorm(power)-qnorm(alpha/2) #normal factor
n_z <- ((2*M_z^2)/delta_a^2)*
  ((1-r_yx.w^2)/(1-r_tx^2)) #approx. sample size
M_t <- qt(power,2*n_z-2-1)-
  qt(alpha/2,2*n_z-2-1) #sample factor
n_t <- ((2*M_t^2)/delta_a^2)*
  ((1-r_yx.w^2)/(1-r_tx^2)) #sample size
n_t

## [1] 50.07819

Statistical Power for Randomized Experiments: Examples of Calculations and Pitfalls

The what, why, and when, of power analyses

What is power?

Why do power analyses?

When to do power analyses

A more detailed what and how of power analyses

Types of error

Hypothesis testing

Type I error based on null distribution

Type II error based on alternative distribution with z score of 1

The bigger the test statistic, the larger the power

Balancing three things

Two group tests

Example Analysis: Breakfast intervention to promote weight loss

Reading in the data

Summary statistics: mean and standard deviation by group

Data generating process

The test where \(\sigma^2\) is known

The test statistic for samples (where \(\sigma^2\) is unknown)

The test statistic for samples (where \(\sigma^2\) is unknown)

Testing homogenity of variance

Performing the \(t\)-test with regression

Power of that test

All about the test

The slope

The variance of the slope

Let’s rearrange the test into scale free parameters

Effect sizes

Simulation example

Simulation example

What proportion of the results are after the critical value?

What proportion of the results are after the critical value?

What proportion of the results are after the critical value?

What proportion of the results are after the critical value?

The noncentrality parameter

Area between critical values

Power given effect size and sample size

Intuition about power

What is the effect size in the BMI data?

What was the power for the BMI test?

Sample size given power and effect size

Sample size given power and effect size

Sample size given power and effect size

How many cases would the BMI study need?

Lowest effect size possible with a given sample size and power

Intuition about MDES

Adding covariates

Example BMI analysis with pretest covariate

Adding covariates

Regression slope for treatment group

Noncentrality parameter

Further simplifications

Further simplifications

Finding power with a covariate

Finding the sample size

Other topics: Multilevel models

Other topics: Generalized outcomes

Thank you!