Introduction

In this simulation study we will investigate the power of the significance of regression test for simple linear regression.

\[ H_0: \beta_{1} = 0 \ \text{vs} \ H_1: \beta_{1} \neq 0 \]

Recall that we defined the significance level, \(\alpha\), to be the probability of a Type I error.

\[ \alpha = P[\text{Reject } H_0 \mid H_0 \text{ True}] = P[\text{Type I Error}] \]

Similarly, the probability of a Type II error is often denoted using \(\beta\); however, this should not be confused with a regression parameter.

\[ \beta = P[\text{Fail to Reject } H_0 \mid H_1 \text{ True}] = P[\text{Type II Error}] \]

Power is the probability of rejecting the null hypothesis when the null is not true, that is, the alternative is true and \(\beta_{1}\) is non-zero.

\[ \text{Power} = 1 - \beta = P[\text{Reject } H_0 \mid H_1 \text{ True}] \]

Essentially, power is the probability that a signal of a particular strength will be detected. Many things affect the power of a test. In this case, some of those are:

  • sample size, \(n\)
  • signal strength, \(\beta_1\)
  • noise level, \(\sigma\)
  • significance level, \(\alpha\)

We’ll investigate the first three here in this study.

Methods

1. Overall design

To investigate the effects of sample size \(n\), signal strength \(\beta_1\), and noise level \(\sigma\) on power, we will simulate from the model

\[ Y_i = \beta_0 + \beta_1 x_i + \epsilon_i \]

where \(\epsilon_i \sim N(0, \sigma^2)\).

For simplicity, we will let \(\beta_0 = 0\), thus \(\beta_1\) is essentially controlling the amount of “signal.” We will then consider different signals, noises, and sample sizes:

  • \(\beta_1 \in (-2, -1.9, -1.8, \ldots, -0.1, 0, 0.1, \ldots, 1.9, 2)\)
  • \(\sigma \in (1, 2, 4)\)
  • \(n \in (10, 20, 30)\)

We will hold the significance level constant at \(\alpha = 0.05\).

The following code will be used to generate the predictor values x for a given sample size n.

x_values = seq(0, 5, length = n)

For each combination of \(\beta_1\), \(\sigma\), and sample size, we will simulate from the true model at least \(1000\) times, and each time perform the significance of regression test. It is possible to derive an expression for power mathematically, but this is often difficult, so instead we rely on simulation. To estimate the power with these simulations, at a given \(\alpha\), we will use

\[ \hat{\text{Power}} = \hat{P}[\text{Reject } H_0 \mid H_1 \text{ True}] = \frac{\text{# Tests Rejected}}{\text{# Simulations}} \]
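
As a minimal sketch of this estimator (with illustrative values for a single parameter combination; the full grid is run in the sections below), the computation looks like:

# minimal sketch: estimated power for one (n, beta_1, sigma) combination
n      = 10
beta_1 = 0.8
sigma  = 1
alpha  = 0.05
x_values = seq(0, 5, length = n)

p_values = replicate(1000, {
  y = beta_1 * x_values + rnorm(n, mean = 0, sd = sigma)  # beta_0 = 0
  summary(lm(y ~ x_values))$coefficients["x_values", "Pr(>|t|)"]
})

mean(p_values < alpha)  # proportion of rejections = estimated power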

2. (Hyper)parameter setup

First, we will set up a seed and all the hyperparameters for this simulation using the following code.

# load libraries
library(scales)
# set up a seed
set.seed(19871223)

# set up parameters
alpha = 0.05
betas = seq(-2, 2, 0.1)
sigmas = c(1, 2, 4)
sample_sizes = c(10, 20, 30)
B = 1000

3. Result container setup

We will next perform the simulation. To store the simulation results, I first generate some empty lists: one for the estimated \(\hat{\beta_1}\) values, one for the standard errors of \(\hat{\beta_1}\), one for the t values, and one for the p values. Each list has 9 elements, one per combination of \(\sigma\) and sample size, and each element is a 41 × 1000 matrix. Each row of a matrix holds the values from the 1000 simulations for one specific \(\beta_1\) value, and there are 41 rows in total, one per \(\beta_1\).

I also generate an empty list for the power values. It likewise has 9 elements, but each element is a vector holding the 41 power values. For each round of 1000 simulations at a specific parameter set (\(\beta_1\), \(\sigma\), and sample size), I calculate the power from the p values, namely as the proportion of p values below the pre-defined \(\alpha\) threshold. Strictly speaking, this is power only for the 40 non-zero \(\beta_1\) values; at \(\beta_1 = 0\) the same proportion estimates the Type I error rate instead, but this does not affect our results or analysis.

# first set up some variables to store the result
study3_t_values_list = list() 
study3_beta1_se_list = list() 
study3_beta1_hat_list = list()
study3_p_values_list = list()
study3_powers = list()
list_idx = 0

4. Simulation function

We define a function to perform one single simulation. This function takes x_values, the sample size, \(\sigma\), and \(\beta_1\) as arguments. It first generates the noise component, then generates the response data, and fits a linear regression model. It returns the row of the coefficient table corresponding to \(\hat{\beta_1}\). We pass in x_values because it depends only on the sample size and therefore does not change across parameter sets that share the same sample size.

study3_sim_t = function(x_values, sample_size, sigma, beta_1) {
    # generate the noise, then the response (beta_0 = 0 by design)
    epsilon = rnorm(sample_size, mean = 0, sd = sigma)
    y = beta_1 * x_values + epsilon
    # fit SLR and return the x_values row of the coefficient table:
    # (Estimate, Std. Error, t value, Pr(>|t|))
    model = lm(y ~ x_values)
    summary(model)$coefficients["x_values", ]
}
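
For reference, a single (illustrative) call returns the x_values row of the coefficient table as a named vector, holding, in order, the estimate, its standard error, the t value, and the p value; this ordering is what the indexing sim_results[1,] through sim_results[4,] below relies on.

study3_sim_t(x_values = seq(0, 5, length = 10), sample_size = 10, sigma = 1, beta_1 = 0.8)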

5. Simulation and result extraction

Now we will finally perform the simulation. This is done using 3 nested loops. The outer loop iterates over the sample sizes and generates the x_values for each. The middle loop iterates over the \(\sigma\) values and further sets up the data containers: because all of our lists have 9 elements, here we increment the list index, create the empty matrix or vector for each list element, and give that element a readable name built from the specific sample size and \(\sigma\) values. The inner loop iterates over the \(\beta_1\) values; for each one, it performs the 1000 simulations, collects the results, and fills them into our list elements. This is also where we calculate the power value from each set of 1000 p values.

for (i in 1:length(sample_sizes)){
  sample_size = sample_sizes[i]
  
  # set up x values
  x_values = seq(0, 5, length = sample_size)
  
  for (j in 1:length(sigmas)){
    sigma = sigmas[j]
    
    # set up list element
    list_idx = list_idx + 1
    study3_t_values_list[[list_idx]] = matrix(nrow = length(betas), ncol = B)
    study3_beta1_se_list[[list_idx]] = matrix(nrow = length(betas), ncol = B)
    study3_beta1_hat_list[[list_idx]] = matrix(nrow = length(betas), ncol = B)
    study3_p_values_list[[list_idx]] = matrix(nrow = length(betas), ncol = B)
    study3_powers[[list_idx]] = rep(0, length(betas))
    list_element_name = paste("size", sample_size, "sigma", sigma, sep = '_')
    names(study3_t_values_list)[list_idx] = list_element_name
    names(study3_beta1_se_list)[list_idx] = list_element_name
    names(study3_beta1_hat_list)[list_idx] = list_element_name
    names(study3_p_values_list)[list_idx] = list_element_name
    names(study3_powers)[list_idx] = list_element_name
    
    for (k in 1: length(betas)){
      beta_1 = betas[k]
      
      # perform simulation and obtain results
      sim_results = replicate(B, {study3_sim_t(x_values, sample_size, sigma, beta_1)})
      
      # store simulation results
      study3_t_values_list[[list_idx]][k,] = sim_results[3,]
      study3_beta1_se_list[[list_idx]][k,] = sim_results[2,]
      study3_beta1_hat_list[[list_idx]][k,] = sim_results[1,]
      study3_p_values_list[[list_idx]][k,] = sim_results[4,]
      study3_powers[[list_idx]][k] = mean(sim_results[4,] < alpha)
      #study3_powers[[list_idx]][k] = mean(2 * pt(abs(sim_results[3]), sample_size - 2, lower.tail = FALSE) < alpha)
    }
  }
}
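
Once the loops finish, a couple of quick sanity checks (illustrative only) confirm that the containers have the expected shape:

names(study3_powers)                       # 9 named size/sigma combinations
dim(study3_t_values_list$size_10_sigma_1)  # 41 x 1000
length(study3_powers$size_10_sigma_1)      # 41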

Results

1. Power values of different (hyper)parameters

First of all, let’s take a look at the power values. As shown in Figure 1, we have a panel of 3 plots showing power curves at different \(\sigma\) values (\(\sigma\) = 1, 2, 4 from left to right). In each panel, the x-axis shows the \(\beta_1\) values and the y-axis the corresponding power. Each panel contains 3 curves, colored red, blue, and green for sample sizes of 10, 20, and 30, respectively.

# how is power influenced by the parameters?
par(mfcol=c(1,3), oma = c(0, 0, 3, 9), mai=c(0.6,0.6,0.2,0), xpd = TRUE)

# sigma = 1

plot(betas, study3_powers$size_10_sigma_1, type = "l", main = expression(sigma ~ '= 1'),
     col = "red", xlab = expression(beta[1]), ylab = "Power", lty = 2, lwd = 1.5)
lines(betas, study3_powers$size_20_sigma_1, col = "blue", type = "l", lty = 2, lwd = 1.5)
lines(betas, study3_powers$size_30_sigma_1, col = "darkgreen", type = "l", lty = 2, lwd = 1.5)

# sigma = 2
plot(betas, study3_powers$size_10_sigma_2, type = "l", main = expression(sigma ~ '= 2'),
     col = "red", xlab = expression(beta[1]), lty = 2, lwd = 1.5, ylab = "")
lines(betas, study3_powers$size_20_sigma_2, col = "blue", type = "l", lty = 2, lwd = 1.5)
lines(betas, study3_powers$size_30_sigma_2, col = "darkgreen", type = "l", lty = 2, lwd = 1.5)

# sigma = 4
plot(betas, study3_powers$size_10_sigma_4, type = "l", main = expression(sigma ~ '= 4'),
     col = "red", xlab = expression(beta[1]), lty = 2, lwd = 1.5, ylab = "", ylim = c(0, 1))
lines(betas, study3_powers$size_20_sigma_4, col = "blue", type = "l", lty = 2, lwd = 1.5)
lines(betas, study3_powers$size_30_sigma_4, col = "darkgreen", type = "l", lty = 2, lwd = 1.5)

legend("right", inset = c(-0.4,0), legend=c('N = 10','N = 20', 'N = 30'), col=c("red", "blue", "darkgreen"), lty = 2, bty = 'n', xpd = NA)
mtext('Figure 1 - Statistical Power Influenced by Different Modeling Parameters', line = 1, outer = TRUE)

2. Simulation robustness

Although the power curves are fairly smooth in general, there are some jagged parts. It would be helpful to check whether the simulation has reached a stable enough state, so let’s look at how the empirical mean of the values obtained from the simulations behaves as the number of simulations grows.

par(mfrow=c(2,2), oma = c(0, 0, 3, 0), mai=c(0.8,0.8,0.4,0))

size = 10
sigma = 1
beta_1 = 0.8
idx_beta_1 = which(abs(betas - beta_1) < 0.01)


beta_1_hats_10_1_0.8 = study3_beta1_hat_list$size_10_sigma_1[idx_beta_1,]
plot(cumsum(beta_1_hats_10_1_0.8) / (1:length(beta_1_hats_10_1_0.8)), type = "l", ylim = c(beta_1-0.1, beta_1+0.1),
     main = expression(sigma ~ '= 1, N = 10'),
     xlab = "",
     ylab = expression("Empirical Mean of " ~ hat(beta)[1]),
     col  = "dodgerblue")
abline(h = beta_1, col = "darkorange", lwd = 2)

beta_1_hats_10_4_0.8 = study3_beta1_hat_list$size_10_sigma_4[idx_beta_1,]
plot(cumsum(beta_1_hats_10_4_0.8) / (1:length(beta_1_hats_10_4_0.8)), type = "l", ylim = c(beta_1-0.1, beta_1+0.1),
     main = expression(sigma ~ '= 4, N = 10'),
     xlab = "",
     ylab = "",
     col  = "dodgerblue")
abline(h = beta_1, col = "darkorange", lwd = 2)

beta_1_hats_30_1_0.8 = study3_beta1_hat_list$size_30_sigma_1[idx_beta_1,]
plot(cumsum(beta_1_hats_30_1_0.8) / (1:length(beta_1_hats_30_1_0.8)), type = "l", ylim = c(beta_1-0.1, beta_1+0.1),
     main = expression(sigma ~ '= 1, N = 30'),
     xlab = "Number of Simulations",
     ylab = expression("Empirical Mean of " ~ hat(beta)[1]),
     col  = "dodgerblue")
abline(h = beta_1, col = "darkorange", lwd = 2)

beta_1_hats_30_4_0.8 = study3_beta1_hat_list$size_30_sigma_4[idx_beta_1,]
plot(cumsum(beta_1_hats_30_4_0.8) / (1:length(beta_1_hats_30_4_0.8)), type = "l", ylim = c(beta_1-0.1, beta_1+0.1),
     main = expression(sigma ~ '= 4, N = 30'),
     xlab = "Number of Simulations",
     ylab = "",
     col  = "dodgerblue")
abline(h = beta_1, col = "darkorange", lwd = 2)

mtext(expression("Figure 2 - Empirical Mean of " ~ hat(beta)[1] ~ "as Simulation Number Grows"), line = 1, outer = TRUE)

3. Understanding power

To further explore the power values and how (and why) they are influenced by the parameters, we will also visualize other data that we obtained from the simulation. We will look at histograms as well as theoretical distributions, so we first set up some helper functions for plotting distributions.

# a function to plot a t distribution and optionally shade the region between
# `lower` and `upper` (with the defaults, lower == upper, so nothing is shaded)
plot_t_dist = function(df, ncp = 0, lower = x_lim_left, upper = x_lim_left, x_lim_left = -10, x_lim_right = 10, add = FALSE, col = 'red', transparency = 0.6){
  if(lower == -Inf | lower < x_lim_left){
    lower = x_lim_left
  }
  if(upper == Inf | upper > x_lim_right){
    upper = x_lim_right
  }
  
  curve(dt(x, df, ncp = ncp), from = x_lim_left, to = x_lim_right, type = "l", xlab = "t", ylab = "f(t)", add = add, col = col, lwd = 2)
  prob = seq(from = lower, to = upper, by = 0.01)
  polygon(c(lower, prob, upper),
          c(0, dt(prob, df, ncp = ncp), 0),
          col = alpha(col, transparency), border = NA)
  abline(h = 0)
}

# a function to plot a normal distribution and optionally shade the region
# between `lower` and `upper`; the defaults rely on R's lazy evaluation of
# default arguments, since x_lim_left is only assigned inside the body
plot_norm_dist = function(mu, sigma, lower = x_lim_left, upper = x_lim_left, add = FALSE, col = 'red', transparency = 0.6){
  x_lim_left = mu - 4*sigma
  x_lim_right = mu + 4*sigma
  
  if(lower == -Inf | lower < x_lim_left){
    lower = x_lim_left
  }
  if(upper == Inf | upper > x_lim_right){
    upper = x_lim_right
  }
  
  curve(dnorm(x, mean = mu, sd = sigma), from = x_lim_left, to = x_lim_right, type = "l", xlab = "t", ylab = "f(x)", add = add, col = col, lwd = 2)
  prob = seq(from = lower, to = upper, by = 0.01)
  polygon(c(lower, prob, upper),
          c(0, dnorm(prob, mu, sigma), 0),
          col = alpha(col, transparency), border = NA)
  abline(h = 0)
}
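
As a quick illustration of these helpers (an example call, not used in the figures below): with the defaults, lower and upper coincide and no region is shaded, while passing explicit bounds shades a tail.

# illustrative call: draw a t density with 8 df and shade its upper 2.5% tail
plot_t_dist(df = 8, lower = qt(0.975, df = 8), upper = Inf)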

The hypothesis test that we are interested in here concerns \(\beta_1\), so we first take a look at the estimated \(\hat{\beta_1}\) values from the simulations. For simplicity, let’s set the sample size to 10 and \(\sigma\) to 1. Under the null hypothesis, \(\beta_1 = 0\), so we first set \(\beta_1\) to 0 and plot the \(\hat{\beta_1}\) values from the simulations. We expect the distribution to be normal, with mean equal to the true \(\beta_1\) and standard deviation \(\frac{\sigma}{\sqrt{S_{xx}}}\), so we can overlay the theoretical normal distribution on the histogram. We can also calculate the margin (critical value × SD) for \(\alpha = 0.05\), shown by the red vertical lines; under the null hypothesis, any \(\hat{\beta_1}\) value beyond these lines would be unlikely to be observed.

Similarly, we can plot the histogram and distribution of the \(\hat{\beta_1}\) values under the alternative hypothesis. Let’s say the true \(\beta_1 = 0.8\). We plot the histogram and the theoretical normal distribution of these \(\hat{\beta_1}\) values in blue.

# look at distributions of beta_1_hat
par(oma = c(0, 0, 3, 0))

size = 10
sigma = 1
beta_1 = 0
idx_beta_1 = which(abs(betas - beta_1) < 0.01)
x_values = seq(0, 5, length = size)
S_xx = sum((x_values - mean(x_values))^2)
sd_beta_1 = sigma/sqrt(S_xx)

alpha = 0.05
margin = abs(qnorm(alpha/2, mean = beta_1, sd = sd_beta_1))

hist(study3_beta1_hat_list$size_10_sigma_1[idx_beta_1,], probability = TRUE, breaks = 30, xlab = expression(hat(beta[1])), main = "", border = "pink", xlim = c(-1, 2), ylim = c(0, 2.5))
plot_norm_dist(mu = beta_1, sigma = sd_beta_1, add = TRUE)
abline(v = margin, col = "red")
abline(v = -margin, col = "red")


# what's the distribution of beta_1_hat when the alternative model is true?
# let's just fix sample size to be 10 and sigma to be 1 here
size = 10
sigma = 1
# let's say the true beta_1 is 0.8
beta_1 = 0.8
idx_beta_1 = which(abs(betas - beta_1) < 0.01)

hist(study3_beta1_hat_list$size_10_sigma_1[idx_beta_1,], probability = TRUE, breaks = 30, xlab = expression(hat(beta[1])), main = "", border = "lightskyblue", add = TRUE)
plot_norm_dist(mu = beta_1, sigma = sd_beta_1, add = TRUE, col = "dodgerblue")
legend("topright", legend = c(expression(beta[1]~" = 0"), expression(beta[1]~" = 0.8")), text.col=c("red", "dodgerblue"), bty = 'n', cex = 1.5)
mtext(expression('Figure 3 - Comparing Distributions of the ' ~hat(beta[1]) ~' Values under the Null and Alternative Hypotheses'), line = 1, outer = TRUE)

Note that the actual histograms match the theoretical normal distribution curves quite well. Any \(\hat{\beta_1}\) value from the \(\beta_1 = 0.8\) model (the blue ones) falling to the right of the red line would very likely be considered “significantly different” from the values under the null hypothesis (the red ones), so we would reject the null hypothesis. We know those are correct decisions because we know the true \(\beta_1 \ne 0\). So the proportion of such values should be close to the power.

I hope this builds some intuition for power. However, there is a problem here, which is why I said “very likely” and “close to the power”: these proportions aren’t exactly the power. The reason is that we compared the \(\hat{\beta_1}\) values against the red line, which is calculated from the true SD[\(\hat{\beta_1}\)]. In reality, we would not have the true SD[\(\hat{\beta_1}\)] available, so each time we fit a model, we estimate it with SE[\(\hat{\beta_1}\)]. We then calculate a t-value for each simulation by dividing \(\hat{\beta_1}\) by this estimated SE[\(\hat{\beta_1}\)], compare the t-value with a critical value, obtain a p-value, and make a decision by comparing that p-value with our pre-set \(\alpha\) threshold. Note that SE[\(\hat{\beta_1}\)] varies from simulation to simulation. To demonstrate this, I plot the histogram of the SE[\(\hat{\beta_1}\)] values (Figure 4), again sticking to sample size = 10 and \(\sigma\) = 1. We can see that the SE[\(\hat{\beta_1}\)] values, whether from the \(\beta_1 = 0\) model (red), the \(\beta_1 = 0.4\) model (green), or the \(\beta_1 = 0.8\) model (blue), center at the true SD[\(\hat{\beta_1}\)] value (shown by the magenta vertical line) and appear to follow a roughly symmetric distribution of no obvious standard form.

# look at distributions of SE[beta_1_hat]
par(oma = c(0, 0, 3, 0))

size = 10
sigma = 1
beta_1 = 0
idx_beta_1 = which(abs(betas - beta_1) < 0.01)
x_values = seq(0, 5, length = size)
S_xx = sum((x_values - mean(x_values))^2)
sd_beta_1 = sigma/sqrt(S_xx)

alpha = 0.05
crit = abs(qnorm(alpha/2, mean = beta_1, sd = sd_beta_1))

hist(study3_beta1_se_list$size_10_sigma_1[idx_beta_1,], probability = TRUE, breaks = 30, xlab = expression("SE["~ hat(beta[1])~ "]"),  col = alpha("pink", 0.3), border = "pink", main = "", xlim = c(0, 0.4))

beta_1 = 0.8
idx_beta_1 = which(abs(betas - beta_1) < 0.01)

hist(study3_beta1_se_list$size_10_sigma_1[idx_beta_1,], probability = TRUE, breaks = 30, col = alpha("lightskyblue", 0.3), border = "lightskyblue", xlim = c(0, 0.4), add = TRUE)

beta_1 = 0.4
idx_beta_1 = which(abs(betas - beta_1) < 0.01)

hist(study3_beta1_se_list$size_10_sigma_1[idx_beta_1,], probability = TRUE, breaks = 30, col = alpha("lightgreen", 0.3), border = "green3", xlim = c(0, 0.4), add = TRUE)
abline(v = sd_beta_1, col = "magenta")

legend("topright", legend = c(expression(beta[1]~" = 0"), expression(beta[1]~" = 0.4"), expression(beta[1]~" = 0.8")), text.col=c("red",  "green3", "dodgerblue"), bty = 'n', cex = 1.5)
mtext(expression('Figure 4 - Distributions of the SE[' ~hat(beta[1]) ~'] Values'), line = 1, outer = TRUE)

As mentioned above, to make a statistical decision for each simulation when testing \(H_0: \beta_1 = 0\), we calculate a t-value, t = \(\frac{\hat{\beta_1}}{SE[\hat{\beta_1}]}\), rather than \(\frac{\hat{\beta_1}}{SD[\hat{\beta_1}]}\). Each t-value uses the specific \(SE[\hat{\beta_1}]\) estimated from that simulated data set. The question is how, and how much, the variation in \(SE[\hat{\beta_1}]\) affects the t-values, and thus the statistical decision and the power of the analysis. It is useful to look at the distribution of the \(1/SE[\hat{\beta_1}]\) values, or, for reasons that will become clear later, the \(\beta_1/SE[\hat{\beta_1}]\) values. Below I plot the distribution of these values for the \(\beta_1 = 0.4\) model (in green) and the \(\beta_1 = 0.8\) model (in blue). The vertical lines indicate the corresponding \(\beta_1/SD[\hat{\beta_1}]\) values.

# look at distributions of beta_1/SE[beta_1_hat]
par(oma = c(0, 0, 3, 0))

size = 10
sigma = 1
beta_1 = 0.8
idx_beta_1 = which(abs(betas - beta_1) < 0.01)
x_values = seq(0, 5, length = size)
S_xx = sum((x_values - mean(x_values))^2)
sd_beta_1 = sigma/sqrt(S_xx)

hist(beta_1/study3_beta1_se_list$size_10_sigma_1[idx_beta_1,], probability = TRUE, breaks = 30, xlab = expression(beta[1] ~"/ SE["~ hat(beta[1])~ "]"), main = "", col = alpha("lightskyblue", 0.3), border = "lightskyblue", xlim = c(-0.5, 12), ylim = c(0, 0.9))
abline(v = beta_1/sd_beta_1, col = "dodgerblue")

beta_1 = 0.4
idx_beta_1 = which(abs(betas - beta_1) < 0.01)

hist(beta_1/study3_beta1_se_list$size_10_sigma_1[idx_beta_1,], probability = TRUE, breaks = 30, col = alpha("lightgreen", 0.3), border = "green3", add = TRUE)
abline(v = beta_1/sd_beta_1, col = "green")

legend("topright", legend = c(expression(beta[1]~" = 0.4"), expression(beta[1]~" = 0.8")), text.col=c("green3", "dodgerblue"), bty = 'n', cex = 1.5)
mtext(expression('Figure 5 - Distributions of the '~beta[1] ~'/ SE[' ~hat(beta[1]) ~'] Values'), line = 1, outer = TRUE)

Now we can see that the ratios vary a lot. In addition, these distributions look right-skewed. This is not surprising, because the ratio is proportional to the reciprocal of \(SE[\hat{\beta_1}]\). Given this observation, it is more appropriate to look directly at the t-values from each simulation and compare them with the t-values under the null hypothesis.

Under the null hypothesis (\(\beta_1 = 0\)), by definition, the t-value t = \(\frac{\hat{\beta_1}}{SE[\hat{\beta_1}]}\) follows a t distribution with df = N − 2 degrees of freedom. Let’s again set the sample size to 10 and \(\sigma\) to 1. We display the t-values obtained from the simulations in a histogram below (Figure 6, left panel) for \(\beta_1 = 0\). It does have a t-distribution shape; in fact, we can overlay the theoretical t-distribution curve on it and they match very well. Again, the red vertical lines mark the critical values of the distribution for \(\alpha = 0.05\), so any t-value beyond these lines (shaded area) will be considered significant, meaning that we would reject the null hypothesis for those t-values.

Now, for a known non-zero \(\beta_1\), again say \(\beta_1 = 0.8\) (an alternative hypothesis), we can also plot the t-values. As shown in the right panel of Figure 6, they are not centered at 0 and look a little right-skewed. In fact, we can decompose the t-values under \(H_1\):

\[ t = \frac{\hat{\beta_1}}{SE[\hat{\beta_1}]} = \frac{\hat{\beta_1} - \beta_1}{SE[\hat{\beta_1}]} + \frac{\beta_1}{SE[\hat{\beta_1}]} \]

The first component follows a regular (central) t-distribution, because it is essentially the t-value we would get if the null hypothesis were that \(\beta_1\) equals its true value. What about the second component? It should look familiar: it is the \(\frac{\beta_1}{SE[\hat{\beta_1}]}\) we just plotted above. So we can imagine that adding these two components gives some right-skewed, t-distribution-like shape. It turns out the result is still a t-distribution, but a noncentral one, with noncentrality parameter \(\frac{\beta_1}{SD[\hat{\beta_1}]}\), which is where the vertical lines are located in the figure above. Knowing the theoretical distribution, we can overlay its curve on the histogram, and they match very well (Figure 6, right panel).
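
Knowing the theoretical alternative distribution also means we can compute the power analytically here and cross-check the simulation estimate. A minimal sketch, using the same running example (n = 10, \(\sigma\) = 1, \(\beta_1\) = 0.8):

# analytic cross-check (sketch): power from the noncentral t-distribution
size   = 10
sigma  = 1
beta_1 = 0.8
x_values = seq(0, 5, length = size)
sd_beta_1 = sigma / sqrt(sum((x_values - mean(x_values))^2))
ncp  = beta_1 / sd_beta_1
crit = qt(1 - 0.05 / 2, df = size - 2)

# P[t < -crit] + P[t > crit] when t is noncentral t with df = n - 2
pt(-crit, df = size - 2, ncp = ncp) +
  pt(crit, df = size - 2, ncp = ncp, lower.tail = FALSE)

This value should land close to the corresponding simulation estimate, study3_powers$size_10_sigma_1[idx_beta_1].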

# let's use our real data
par(mfcol=c(1,2), oma = c(0, 0, 3, 0))

# left
# what's the distribution of t values when the null model is true?
# let's just fix sample size to be 10 and sigma to be 1 here
size = 10
sigma = 1
beta_1 = 0
idx_beta_1 = which(abs(betas - beta_1) < 0.01)

alpha = 0.05
crit = abs(qt(alpha/2, df = size - 2))

# show hist, curve, and type I error
par(mai = c(0.8, 0.8, 0.4, 0))
hist(study3_t_values_list$size_10_sigma_1[idx_beta_1,], probability = TRUE, breaks = 30, xlab = "t values", main = expression(beta[1] ~"= 0"), border = "pink", xlim = c(-9, 9))
plot_t_dist(df = size - 2, col = "red", add = TRUE, lower = -Inf, upper = -crit)
plot_t_dist(df = size - 2, col = "red", add = TRUE, lower = crit, upper = Inf)
abline(v = crit, col = "red")
abline(v = -crit, col = "red")

# right
# what's the distribution of t values when the alternative model is true?
# let's just fix sample size to be 10 and sigma to be 1 here
size = 10
sigma = 1
# let's say the true beta_1 is 0.8
beta_1 = 0.8
idx_beta_1 = which(abs(betas - beta_1) < 0.01)
x_values = seq(0, 5, length = size)
S_xx = sum((x_values - mean(x_values))^2)
sd_beta_1 = sigma/sqrt(S_xx)
ncp = beta_1/sd_beta_1

# show hist, curve, and type I error
par(mai = c(0.8, 0.8, 0.4, 0))
hist(study3_t_values_list$size_10_sigma_1[idx_beta_1,], probability = TRUE, breaks = 50, xlab = "t values", main = expression("                   "~beta[1] ~"= 0.8"), border = "lightskyblue", xlim = c(-9, 9), ylim = c(0, 0.4), ylab = "")
plot_t_dist(df = size - 2, ncp = ncp, col = "dodgerblue", add = TRUE)

mtext('Figure 6 - Comparing Distributions of t-Values under the Null and Alternative Hypotheses', line = 1, outer = TRUE)

Next, we can overlay the t-value distributions under the null and alternative hypotheses. As shown below in Figure 7 (top-left panel), the red shaded area represents the \(P < \alpha\) portion of the null-hypothesis t distribution, which is the Type I error. Applying the same threshold to the \(H_1\) t-values, the portion to the left of the red line (the blue shaded area) would incorrectly be labeled “fail to reject \(H_0\)”, so that is the Type II error \(\beta\). On the other hand, as shown in the bottom-left panel, the portion to the right of the red line (the blue shaded area) represents the power (1 − \(\beta\)).

# Now let's plot the null hypothesis at the same time and figure out what the power is

size = 10
sigma = 1
beta_1 = 0.8
# index of beta_1 in the betas vector
idx_beta_1 = which(abs(betas - beta_1) < 0.01)

alpha = 0.05
crit = abs(qt(alpha/2, df = size - 2))

par(oma = c(0, 0, 3, 0))
layout(matrix(c(1, 2, 3, 3), nrow = 2, ncol = 2))

# type I error and type II error
par(mai = c(0.6, 0.8, 0.2, 0))
hist(study3_t_values_list$size_10_sigma_1[idx_beta_1,], probability = TRUE, breaks = 50, xlab = "", main = "", border = "lightskyblue", xlim = c(-8, 8), ylim = c(0, 0.4))
plot_t_dist(df = size - 2, ncp = ncp, col = "dodgerblue", lower = -Inf, upper = crit, add = TRUE)
plot_t_dist(df = size - 2, col = "red", add = TRUE, lower = -Inf, upper = -crit)
plot_t_dist(df = size - 2, col = "red", add = TRUE, lower = crit, upper = Inf)
abline(v = crit, col = "red")
legend("topright", legend = c(expression(beta[1]~" = 0"), expression(beta[1]~" = 0.8")), text.col=c("red", "dodgerblue"), bty = 'n', cex = 1.5)

# show power
par(mai = c(0.8, 0.8, 0, 0))
hist(study3_t_values_list$size_10_sigma_1[idx_beta_1,], probability = TRUE, breaks = 50, xlab = "t values", main = "", border = "lightskyblue", xlim = c(-8, 8), ylim = c(0, 0.4))
plot_t_dist(df = size - 2, ncp = ncp, col = "dodgerblue", lower = crit, upper = Inf, add = TRUE)
plot_t_dist(df = size - 2, col = "red", add = TRUE)
abline(v = crit, col = "red")

# this power value corresponds with one single point on the power curve:
par(mai = c(0.8, 0.8, 0.8, 0))
plot(betas, study3_powers$size_10_sigma_1, type = "l", main = expression('N = 10, ' ~sigma ~ '= 1'),
     xlab = expression(beta[1]), ylab = "Power", lty = 2, lwd = 1.5, ylim = c(0, 1))
points(x = betas[idx_beta_1], y = study3_powers$size_10_sigma_1[idx_beta_1], pch = 19, col = 'dodgerblue')
abline(v = beta_1, col = "lightskyblue")
abline(h = study3_powers$size_10_sigma_1[idx_beta_1], col = "lightskyblue")

mtext('Figure 7 - Illustration of Power from Simulated t-Values', line = 1, outer = TRUE)

This shaded proportion is consistent with the value we get by directly calculating the proportion of p-values below \(\alpha\). As shown in the right panel of Figure 7, it corresponds to a single dot on the power curve for N = 10 and \(\sigma\) = 1, at the position \(\beta_1 = 0.8\).

4. Further exploration of effects on power

Next, we will explore how exactly each parameter affects statistical power in this setting. First, for signal strength, we again stick with a sample size of 10 and \(\sigma\) of 1, and compare two different \(\beta_1\) values: 0.4 and 0.8. We plot their histograms and the theoretical distributions of the t-values, and compare them with the null-hypothesis distribution (Figure 8). The shaded area under each curve indicates the power. At the same time, on the right panel, I display the corresponding points on the power curve for this specific set of parameters.

# How does beta_1 affect power?
# set up parameters
size = 10
sigma = 1
beta_1_small = 0.4
beta_1_large = 0.8
x_values = seq(0, 5, length = size)
S_xx = sum((x_values - mean(x_values))^2)
sd_beta_1 = sigma/sqrt(S_xx)
ncp_small = beta_1_small/sd_beta_1
ncp_large = beta_1_large/sd_beta_1
alpha = 0.05
crit = abs(qt(alpha/2,df = size - 2))

# get the t values from the simulation
idx_beta_1_small = which(abs(betas - beta_1_small) < 0.01)
idx_beta_1_large = which(abs(betas - beta_1_large) < 0.01)
t_beta_1_small = study3_t_values_list$size_10_sigma_1[idx_beta_1_small,]
t_beta_1_large = study3_t_values_list$size_10_sigma_1[idx_beta_1_large,]

par(oma = c(0, 0, 3, 0))
layout(matrix(c(1, 1, 2), nrow = 1, ncol = 3, byrow = TRUE))

# left: show t value distributions
hist(t_beta_1_small, probability = TRUE, breaks = 30, xlab = "t values", main = "", border = "lightgreen", xlim = c(-5, 9), ylim = c(0, 0.4))
plot_t_dist(df = size - 2, ncp = ncp_small, col = "green3", lower = crit, upper = Inf, add = TRUE)
hist(t_beta_1_large, probability = TRUE, breaks = 50, border = "lightskyblue", xlim = c(-5, 9), ylim = c(0, 0.4), add = TRUE)
plot_t_dist(df = size - 2, ncp = ncp_large, col = "dodgerblue", lower = crit, upper = Inf, add = TRUE)
plot_t_dist(df = size - 2, col = "red", add = TRUE)
abline(v = crit, col = "red")
legend("topright", legend = c(expression(beta[1]~" = 0.4"), expression(beta[1]~" = 0.8")), text.col=c("green3", "dodgerblue"), bty = 'n', cex = 1.5)

# right: show power curves
plot(betas, study3_powers$size_10_sigma_1, type = "l", main = expression('N = 10, ' ~sigma ~ '= 1'),
     xlab = expression(beta[1]), ylab = "Power", lty = 2, lwd = 1.5, ylim = c(0, 1))
points(x = betas[idx_beta_1_small], y = study3_powers$size_10_sigma_1[idx_beta_1_small], pch = 19, col = 'green')
abline(v = beta_1_small, col = "green")
points(x = betas[idx_beta_1_large], y = study3_powers$size_10_sigma_1[idx_beta_1_large], pch = 19, col = 'dodgerblue')
abline(v = beta_1_large, col = "dodgerblue")

mtext('Figure 8 - Influence of Signal Strength on Statistical Power', line = 1, outer = TRUE)

Next, to observe the influence of sample size on power, we set \(\beta_1\) to 1 and \(\sigma\) to 2 this time, and compare two different sample sizes: 10 and 20. Again, we plot the histograms and theoretical distributions of the t-values and compare them with the null-hypothesis distributions (Figure 9). Note that there are two different null-hypothesis curves, because the degrees of freedom of the null t-distribution depend on the sample size; the critical value is also slightly affected. The shaded area under each curve indicates the power. On the right panel, I display the corresponding points on the power curves for this specific set of parameters.

# How does sample size affect power?
# set up parameters
size_small = 10
size_large = 20
sigma = 2
beta_1 = 1

x_values_small_size = seq(0, 5, length = size_small)
S_xx_small_size = sum((x_values_small_size - mean(x_values_small_size))^2)
sd_beta_1_small_size = sigma/sqrt(S_xx_small_size)
ncp_small_size = beta_1/sd_beta_1_small_size

x_values_large_size = seq(0, 5, length = size_large)
S_xx_large_size = sum((x_values_large_size - mean(x_values_large_size))^2)
sd_beta_1_large_size = sigma/sqrt(S_xx_large_size)
ncp_large_size = beta_1/sd_beta_1_large_size

alpha = 0.05
crit_small_size = abs(qt(alpha/2,df = size_small - 2))
crit_large_size = abs(qt(alpha/2,df = size_large - 2))

# get the t values from the simulation
idx_beta_1 = which(abs(betas - beta_1) < 0.01)
t_beta_1_small_size = study3_t_values_list$size_10_sigma_2[idx_beta_1,]
t_beta_1_large_size = study3_t_values_list$size_20_sigma_2[idx_beta_1,]

par(oma = c(0, 0, 3, 0))
layout(matrix(c(1, 1, 2), nrow = 1, ncol = 3, byrow = TRUE))

# left: show t value distributions
hist(t_beta_1_small_size, probability = TRUE, breaks = 30, xlab = "t values", main = "", border = "lightgreen", xlim = c(-5, 8), ylim = c(0, 0.4))
plot_t_dist(df = size_small - 2, ncp = ncp_small_size, col = "green3", lower = crit_small_size, upper = Inf, add = TRUE)
hist(t_beta_1_large_size, probability = TRUE, breaks = 30, border = "lightskyblue", xlim = c(-5, 8), ylim = c(0, 0.4), add = TRUE)
plot_t_dist(df = size_large - 2, ncp = ncp_large_size, col = "dodgerblue", lower = crit_large_size, upper = Inf, add = TRUE)
plot_t_dist(df = size_small - 2, col = "red", add = TRUE)
plot_t_dist(df = size_large - 2, col = "peru", add = TRUE)
abline(v = crit_small_size, col = "red")
abline(v = crit_large_size, col = "peru")
legend("topright", legend=c('N = 10','N = 20'), text.col=c("green3", "dodgerblue"), bty = 'n', cex = 1.5)

# right: show power curves
plot(betas, study3_powers$size_10_sigma_2, type = "l", main = expression(sigma ~ '= 2'),
     xlab = expression(beta[1]), ylab = "Power", lty = 2, lwd = 1.5, ylim = c(0, 1), col = 'green')
lines(betas, study3_powers$size_20_sigma_2, lty = 2, lwd = 1.5, col = 'dodgerblue')
points(x = betas[idx_beta_1], y = study3_powers$size_10_sigma_2[idx_beta_1], pch = 19, col = 'green')
points(x = betas[idx_beta_1], y = study3_powers$size_20_sigma_2[idx_beta_1], pch = 19, col = 'dodgerblue')
abline(v = beta_1, col = "grey")

mtext('Figure 9 - Influence of Sample Size on Statistical Power', line = 1, outer = TRUE)

Finally, let’s explore how the standard deviation parameter \(\sigma\) affects power. This time, we set \(\beta_1\) to 1 and the sample size to 30, and compare two different \(\sigma\) values: 2 and 4. Similarly, we plot the histograms and theoretical distributions of the t-values and compare them with the null-hypothesis distribution (Figure 10). The shaded area under each curve indicates the power. Again, on the right panel, I display the corresponding points on the power curves for this specific set of parameters.

# How does sigma affect power?
# set up parameters
size = 30
sigma_small = 2
sigma_large = 4
beta_1 = 1

x_values = seq(0, 5, length = size)
S_xx = sum((x_values - mean(x_values))^2)
sd_beta_1_small_sigma = sigma_small/sqrt(S_xx)
ncp_small_sigma = beta_1/sd_beta_1_small_sigma
sd_beta_1_large_sigma = sigma_large/sqrt(S_xx)
ncp_large_sigma = beta_1/sd_beta_1_large_sigma

alpha = 0.05
crit = abs(qt(alpha/2,df = size - 2))

# get the t values from the simulation
idx_beta_1 = which(abs(betas - beta_1) < 0.01)
t_beta_1_small_sigma = study3_t_values_list$size_30_sigma_2[idx_beta_1,]
t_beta_1_large_sigma = study3_t_values_list$size_30_sigma_4[idx_beta_1,]

par(oma = c(0, 0, 3, 0))
layout(matrix(c(1, 1, 2), nrow = 1, ncol = 3, byrow = TRUE))

# left: show t value distributions
hist(t_beta_1_small_sigma, probability = TRUE, breaks = 30, xlab = "t values", main = "", border = "lightskyblue", xlim = c(-5, 8), ylim = c(0, 0.4))
plot_t_dist(df = size - 2, ncp = ncp_small_sigma, col = "dodgerblue", lower = crit, upper = Inf, add = TRUE)
hist(t_beta_1_large_sigma, probability = TRUE, breaks = 30, border = "lightgreen", xlim = c(-5, 8), ylim = c(0, 0.4), add = TRUE)
plot_t_dist(df = size - 2, ncp = ncp_large_sigma, col = "green3", lower = crit, upper = Inf, add = TRUE)
plot_t_dist(df = size - 2, col = "red", add = TRUE)
abline(v = crit, col = "red")
legend("topright", legend = c(expression(sigma~" = 2"), expression(sigma~" = 4")), text.col=c("dodgerblue", "green3"), bty = 'n', cex = 1.5)

# right: show power curves
plot(betas, study3_powers$size_30_sigma_2, type = "l", main = "N = 30",
     xlab = expression(beta[1]), ylab = "Power", lty = 2, lwd = 1.5, ylim = c(0, 1), col = 'dodgerblue')
lines(betas, study3_powers$size_30_sigma_4, lty = 2, lwd = 1.5, col = 'green')
points(x = betas[idx_beta_1], y = study3_powers$size_30_sigma_2[idx_beta_1], pch = 19, col = 'dodgerblue')
points(x = betas[idx_beta_1], y = study3_powers$size_30_sigma_4[idx_beta_1], pch = 19, col = 'green')
abline(v = beta_1, col = "grey")

mtext('Figure 10 - Influence of Noise Level on Statistical Power', line = 1, outer = TRUE)

Discussion

1. How do signal strength \(\beta_1\), sample size N, and \(\sigma\) affect power?

As shown in Figure 1, all of the curves display the same pattern: they are symmetric about \(\beta_1 = 0\), and as the absolute value of \(\beta_1\) increases, the power gradually rises from very low (under 0.2). At lower \(\sigma\) values, the curves eventually reach or approach 1; at higher \(\sigma\), the trend is the same, but much larger \(\beta_1\) values are needed for the power to approach 1. As \(\sigma\) increases, power decreases. Within each panel, we can clearly see that a larger sample size leads to higher power. In conclusion, a larger sample size, a stronger signal, and a smaller \(\sigma\) all produce higher statistical power.

2. How to understand these effects?

It is helpful to use the t-value distributions under the null and alternative hypotheses to understand how these parameters affect power. As demonstrated in Figure 8, as \(\beta_1\) increases (we only use positive values as examples here), the noncentrality parameter (ncp) of the t-values under the alternative hypothesis also increases. Recall that it is calculated as ncp = \(\beta_1 / SD[\hat{\beta_1}]\), and here \(SD[\hat{\beta_1}] = \sigma/\sqrt{S_{xx}}\) stays constant. So the curve shifts to the right, further away from the null-hypothesis t-distribution, and the proportion of the shaded area to the right of the critical value increases.
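
To put rough numbers on this (a quick sketch for n = 10 and \(\sigma\) = 1): doubling \(\beta_1\) doubles the ncp.

# sketch: ncp scales linearly with beta_1 (n = 10, sigma = 1)
x_values = seq(0, 5, length = 10)
sd_beta_1 = 1 / sqrt(sum((x_values - mean(x_values))^2))
c(ncp_0.4 = 0.4 / sd_beta_1, ncp_0.8 = 0.8 / sd_beta_1)  # about 2.02 and 4.04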

As the sample size increases, shown in Figure 9, the null-hypothesis t-distribution becomes taller and narrower (closer to a normal distribution), leading to a slightly smaller critical value. At the same time, because \(S_{xx}\) increases, \(SD[\hat{\beta_1}]\) decreases, causing the ncp to increase and shifting the alternative-hypothesis t-distribution to the right. Together, these effects lead to higher power.
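
A quick numeric sketch (for \(\beta_1\) = 1 and \(\sigma\) = 2) shows both effects at once:

# sketch: a larger n gives a smaller critical value and a larger ncp
# n = 10: crit ~ 2.31, ncp ~ 2.52;  n = 20: crit ~ 2.10, ncp ~ 3.39
for (n in c(10, 20)) {
  x = seq(0, 5, length = n)
  ncp = 1 / (2 / sqrt(sum((x - mean(x))^2)))
  cat("n =", n, " crit =", round(qt(0.975, df = n - 2), 2),
      " ncp =", round(ncp, 2), "\n")
}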

Similarly, when \(\sigma\) decreases, \(SD[\hat{\beta_1}]\) decreases, which increases the ncp, shifts the alternative-hypothesis t-distribution to the right, and increases power (Figure 10).

One can also think about the distributions of \(\hat{\beta_1}\) under the null and alternative hypotheses, as shown in Figure 3. A larger signal shifts the \(H_1\) curve away from the \(H_0\) curve, while a larger sample size or a smaller \(\sigma\) decreases \(SD[\hat{\beta_1}]\), making both curves narrower and taller. But as demonstrated in the Results section, this view has the problem that the true \(SD[\hat{\beta_1}]\) is unknown in practice, so the t-value distribution is the more direct and appropriate one.
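
Still, for intuition, in the idealized case where \(SD[\hat{\beta_1}]\) were known (a z-test rather than the t-test we actually perform), the power would even have a simple closed form,

\[ \text{Power} \approx \Phi\left(-z_{\alpha/2} + \frac{\beta_1}{SD[\hat{\beta_1}]}\right) + \Phi\left(-z_{\alpha/2} - \frac{\beta_1}{SD[\hat{\beta_1}]}\right) \]

where \(\Phi\) is the standard normal CDF. This makes all three effects explicit: power grows as \(\beta_1\) grows, and as \(SD[\hat{\beta_1}] = \frac{\sigma}{\sqrt{S_{xx}}}\) shrinks with a larger sample size or a smaller \(\sigma\).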

3. Are \(1000\) simulations sufficient?

From Figure 1, most of the curves are fairly smooth, indicating that overall the number of simulations is sufficient. However, there are some jagged parts, especially in situations with a smaller sample size and a larger \(\sigma\). I further examined the data extracted from the simulated models. As shown in Figure 2, in most cases the empirical mean of \(\hat{\beta_1}\) gets closer and closer to the expected value as the simulations accumulate. Only for a smaller sample size combined with a larger \(\sigma\) (for example, N = 10 and \(\sigma\) = 4) do the values appear not fully stable yet, though not unreasonably so. So overall, 1000 simulations seem sufficient for the purpose of this study, although the number could be increased further for the small-sample, large-\(\sigma\) settings.
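
One way to quantify “sufficient” is the Monte Carlo standard error of each estimated power, \(\sqrt{\hat{p}(1-\hat{p})/B}\), which is largest when the true power is near 0.5; this matches the observation that the curves are most jagged in their mid-power regions.

# sketch: Monte Carlo standard error of a power estimate with B = 1000
p_hat = c(0.1, 0.5, 0.9)
round(sqrt(p_hat * (1 - p_hat) / 1000), 4)  # 0.0095 0.0158 0.0095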