Power_Analysis_micro_goal

Author

Fred

Power Analysis and Assumptions:

This is a 3*2*2*2*2 between group factorial experiment, some references I have so far been reading:

References:

I highly recommend for people who haven’t done a lot of power analysis to go through some of the references I have below, they were really helpful for me to understand the whole picture and come up with the idea of doing it manually instead of a black-box on G*Power

  1. How to unify a general sample size instead of per-cell sample size:
    1. http://jakewestfall.org/blog/index.php/2015/05/26/think-about-total-n-not-n-per-cell/
  2. Sample size and p-hacking issues, what might arise and how to solve them:
    1. http://datacolada.org/17
  3. How to keep account of interaction effect, in a factorial design:
    1. https://approachingblog.wordpress.com/2018/01/24/powering-your-interaction-2/
  4. Test case on a 2*2 factorial experiment power analysis run by R: A lot of coding was adapted from here and changed based on our choice
    1. https://www.markhw.com/blog/power-twoway
  5. understanding factorial design and statistical analysis:
    1. https://www.youtube.com/watch?v=uaWEQj18zqI
  6. How to choose an ambiguous cohen-d value:
    1. https://www.spss-tutorials.com/cohens-d/

    2. https://www.youtube.com/watch?v=GDe4M0xEghs

  7. There’s many more stackoverflow pages for me to look into to solve bugs but I don’t keep account of those anymore
  8. Simulating a factorial design experiment:
    1. https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-019-0883-9

    2. https://debruine.github.io/faux/

    3. https://stats.stackexchange.com/questions/57642/simulating-responses-from-a-factorial-experiment-for-power-analysis

    4. https://rdrr.io/cran/faux/man/sim_design.html

  9. Thanks to Norm’s coding I realized where this code could be further refurbished:
    1. that is to look at directly the change of scores,
    2. as well as to model the with a personalized intercept
    3. relax the assumption on score change further more
    4. adjust the alpha and power constraints… I am still not very sure if this is the way to do, as Norm suggested an adjustment on FWER (family wise error rate), but in my opinion we are only looking at whether or not one factor is significant? or do we want to look at all of them?
    5. taken into account that change is suppose to be a discrete number :D Thank you Norm for pointing it out :D

let’s give this a closer look:

Coding Step by Step:

Step 1: Simulation Function:

Let’s figure out in this step, how to simulate such data using three settings:

  1. low effect size, with a Cohen’s D = 0.1,
    1. let me give a concrete example here, I will model this as of percentage change in the total score possible: that is
      1. suppose we have a potential of 25 points as in pre and post survey question (1 to 5 for all questions), that is
      2. the absolute change is within range 0 to 20 (-20 to 20)
      3. then I will apply a 0.2 to this change that is 20*0.2 = 4 as the mean score for the main change
      4. to cover the whole duration of -20 to 20 points, I will set a standard deviation of 5% of total score (5% * 20 = 1) as in the normal scale.
      5. people can revisit this assumption part further more
      6. I would like to take something as Norm suggested as a main assumption for the score calculation, but I think that is making things more complicated, but let me put the main assumption function here that Norm suggested, if I get further understanding of how you will assume each factor will perform, I can also incorporate this into the model for a further clarification
        1. main assumption Norm suggested: mainassumptions = “Commitment0 + rfx1 + 1*(Activity == 3) + 1*(Barrier == 2) + 1*(Examples == 2) + 1*(BiggerGoal == 2) + 1*(Expectations == 2)”
  2. medium effect size, with a Cohen’s D = 0.3
  3. Large effect size, with a Cohen’s D = 0.5

Let me reflect this newly added simulation scale below:

library(lme4)
Loading required package: Matrix
## ReadME

# let's specify potential absolute score change first:

abs_score_change <- 20 # please specify it here Ananya :D, as mentioned in 1.3
standard_deviation_scale <- 0.05 # as mentioned above in 1.4

# Let's first write the function of getting mode:
Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

# note that if we are including interaction, by default it is FALSE to not include interaction
# effect between factors, set to TRUE manually if we are testing for 
sim_data <- function(n, eff_size, alpha, interaction = FALSE) {
  s <- eff_size/2 * abs_score_change # how much each data is fluctuating around 0
  f1 <- factor(sample(c("CA", "RA", "MA"), n, TRUE)) # Activity
  f2 <- factor(sample(c("OER", "DDR"), n, TRUE)) # Barrier
  f3 <- factor(sample(c("Text", "Voice"), n, TRUE)) # Examples
  f4 <- factor(sample(c("Open", "Guided"), n, TRUE)) # BiggerGoal
  f5 <- factor(sample(c("NEX", "EX"), n, TRUE)) # Expectations
  
  # prior generations at the level of eff_size, or in other words, standardized cohen's D 
  # distance,
  mu_1 <- ifelse(f1 == 'CA', -1*s, ifelse(f1 == 'RA', 0, s))
  mu_2 <- ifelse(f2 == 'OER', -1*s, s)
  mu_3 <- ifelse(f3 == 'Text', -1*s, s)
  mu_4 <- ifelse(f4 == 'Open', -1*s, s)
  mu_5 <- ifelse(f5 == 'NEX', -1*s, s)
  
  mu <- cbind(mu_1, mu_2, mu_3, mu_4, mu_5)
  
  # Now let us simulate the dat:
  dv <- c()
  for (i in 1:n){
    dv[i] <- rnorm(1, 
                   mean(Modes(mu[i,]))+rnorm(1,0,1^2), # added a random noise here var = 1, 
                   abs_score_change*standard_deviation_scale)
    # need to incorporate the fact that t
    # assume a standardeviation of 1 to follow the setup of  cohen's D
    # People can change the mean of mode function to mode by writing their own, or median
  }
  dv <- round(dv) # lol probably the worse way of writing code XD
  # so far for model checking, I only check the case when some factor levels are significant, but see below by cases for a safer option (of which requires significant more sample size)
  if (interaction == FALSE){
    model <- lm(dv ~ f1 + f2 + f3 + f4 + f5) 
    # coef[2,4] and coef[3,4] looks at the p-value of factor 1,
    # as it is explored the least amount, we want to see if any level is significant
  }
  else{
    model <- lm(dv ~ f1 * f2 * f3 * f4 * f5)}
    # row 38:48 are all the five level interactions item,
    # it is safe to show those are significant values 
    # (Fred made the assumption here, double check, what is your model design?) # no response yet, assuming I did it correctly?
  # applying bonferroni correction
  p_values <- summary(model)$coef[-1, 4]
  bonferroni_corrected_p_values <- p.adjust(p_values, method = "bonferroni")
  return(min(bonferroni_corrected_p_values) < alpha)
    }

P.S. I have tested multiple cases (around ~15), and I think the model converges under all the test cases I run, but I am not CS majored and I hate writing test cases, please let me know if there are things that it doesn’t work out :D

Step 2: Get-Power Function:

Now we write the function to calculate the power for such simulated model:

# it takes the mean value for the number of times the model is signifcant in the safest way when a rep number of data we simulated
get_power <- function(n, eff_size, reps, alpha, interaction = FALSE) {
  mean(
    sapply(1:reps, function(placeholder) {
      sim_data(n, eff_size, alpha, interaction)
    })
  )
}

Step 3: Step-wise Sample Size Power analysis function:

Now we want to extend this to a range of sample size settings, for example we start from sample size 100, end at 400, and each time update by 25, something like this, so that it is easier for people to run the code…instead of hitting their head and solving everything :D

power_analysis <- function(eff_size, reps, start, end, by, alpha, interaction = FALSE) {
  set.seed(20552034) # oh I have good ways finding seed, see if you can crack the mistery, again a new interesting one :D
  out <- lapply(
    seq(start, end, by), 
    get_power, 
    eff_size, reps, alpha, interaction
  )
  out <- as.data.frame(do.call(rbind, out))
  names(out) <- "Interaction Term Power"
  out$`Sample Size` <- seq(start, end, by)
  return(
    out[, c(2, 1)]
  )
}

Result and Conclusion for no interaction effect

I will formulate this section into 3 sections, as described before, I used a rep number of 200, and I think my computer is already dying running the simulation…for whoever have a better computer, try increase the rep number larger, although 200 rep is quite good enough

Results: Small Effect Size: 0.1

power_analysis(eff_size = 0.1, 
               reps = 400,
               start = 50,
               end = 150,
               by = 5,
               alpha = 0.05, # this needs to be adjusted to a family wise comparison
                # compare hypothesis testing step by step
               interaction = FALSE)
   Sample Size Interaction Term Power
1           50                 0.5025
2           55                 0.6425
3           60                 0.5950
4           65                 0.6400
5           70                 0.6925
6           75                 0.7275
7           80                 0.7800
8           85                 0.8125
9           90                 0.8375
10          95                 0.8500
11         100                 0.8650
12         105                 0.9150
13         110                 0.9050
14         115                 0.9475
15         120                 0.9475
16         125                 0.9625
17         130                 0.9600
18         135                 0.9550
19         140                 0.9775
20         145                 0.9800
21         150                 0.9875

So for a small effect size 0.1*20, we will need around 85 participants

Results: Meium Effect Size: 0.2

power_analysis(eff_size = 0.2, 
               reps = 200,
               start = 30,
               end = 100,
               by = 5,
               alpha = 0.05,
               interaction = FALSE)
   Sample Size Interaction Term Power
1           30                  0.635
2           35                  0.805
3           40                  0.895
4           45                  0.935
5           50                  0.980
6           55                  0.985
7           60                  0.995
8           65                  0.990
9           70                  1.000
10          75                  1.000
11          80                  1.000
12          85                  1.000
13          90                  1.000
14          95                  1.000
15         100                  1.000

so for a medium effect size 0.2*20, we will need around 35 participants

Results: Large Effect Size: 0.3

power_analysis(eff_size = 0.3, 
               reps = 200,
               start = 10,
               end = 50,
               by = 5,
               alpha = 0.05,
               interaction = FALSE)
  Sample Size Interaction Term Power
1          10                  0.135
2          15                  0.320
3          20                  0.580
4          25                  0.770
5          30                  0.915
6          35                  0.955
7          40                  0.990
8          45                  0.995
9          50                  1.000

So for a large effect size 0.3*20, we will need around 30 participants

Now:

Result and Conclusion for with interaction effect

Results: Small Effect Size: 0.1

power_analysis(eff_size = 0.1, 
               reps = 400,
               start = 300,
               end = 400,
               by = 25,
               alpha = 0.05,
               interaction = TRUE)
  Sample Size Interaction Term Power
1         300                 0.6750
2         325                 0.7775
3         350                 0.8125
4         375                 0.8525
5         400                 0.9000

so we will need about 350 data here under a small effect size 0.1*20

Results: Medium Effect Size: 0.2

power_analysis(eff_size = 0.2, 
               reps = 400,
               start = 100,
               end = 300,
               by = 25,
               alpha = 0.05,
               interaction = TRUE)
  Sample Size Interaction Term Power
1         100                 0.6650
2         125                 0.8225
3         150                 0.9325
4         175                 0.9600
5         200                 0.9725
6         225                 0.9975
7         250                 0.9950
8         275                 1.0000
9         300                 1.0000

So we will need about 125 data point under a medium effect size of 0.2*20

Results: Medium Effect Size: 0.3

power_analysis(eff_size = 0.3, 
               reps = 400,
               start = 50,
               end = 150,
               by = 25,
               alpha = 0.05,
               interaction = TRUE)
  Sample Size Interaction Term Power
1          50                 0.5875
2          75                 0.8450
3         100                 0.9400
4         125                 0.9975
5         150                 0.9950

so we will need about 75 samples for a large effect size of 0.3*20

Summary

Now to summary all of these, let’s create a chart

Below are results for the model lm ~ factor1 + factor2 + factor3 + factor4 + factor5:

Optimal Sample Size under Different Effect Size
Effect Size (or standardized Cohen’s Distance) Optimal Sample Size Power expected
Small Effect size = 0.1*20 85 81.25%
Medium Effect size = 0.2*20 35 80.5%
Large Effect Size = 0.3*20 30 91.5%

Below are results for the model lm ~ factor1 * factor2 * factor3 * factor4 * factor5:

Effect Size (or standardized Cohen’s Distance) Optimal Sample Size Power expected
Small Effect size = 0.1*20 350 81.25%
Medium Effect size = 0.2*20 125 82.25%
Large Effect Size = 0.3*20 75 84.50%