class: middle
background-image: url(data:image/png;base64,#LTU_logo_clear.jpg)
background-position: top left
background-size: 25%

# BIO2POS
# Statistical Power
## Data Analysis Topic 4A
### La Trobe University

---

# Welcome!

### In this lecture we will introduce the concept of statistical power, and discuss how to achieve high power for your chosen statistical test

--

Over the following slides, we will cover:

* .orangered_style[Statistical Power]

--

  * Definition

--

  * Factors that influence power

---

# Intended Learning Objectives

### By the end of this lecture you will:

--

* be able to demonstrate a working knowledge of statistical power

--

* understand and be able to .seagreen_style[explain the factors] that influence statistical power, and the effect of adjusting these factors

--

<br>

A solid knowledge of statistical power will be beneficial for any experiment you design.

---

# Type I and Type II Errors

Recall that we previously introduced the concept of Type I and Type II errors in the [DA Topic 2A lecture](https://rpubs.com/LTU_BIO2POS/DA2A). A short recap is provided below:

--

<br>

A .orangered_style[Type I Error (aka a False Positive)] occurs when the information from our sample data leads us to reject `\(H_0\)`, when in reality `\(H_0\)` is actually true.

--

* We have **falsely** found a **positive** result (something supposedly significant, but which in fact is not significant)

--

* Our specified .orangered_style[level of significance] `\(\alpha\)` denotes our accepted level of risk of incurring a Type I error - this is why we choose `\(\alpha\)` to be small

--

<br>

A .orangered_style[Type II Error (aka a False Negative)] occurs when the information from our sample data leads us to not reject `\(H_0\)`, when in reality `\(H_0\)` is actually false.

* We use the symbol `\(\beta\)` to denote our rate of Type II error

---

# Type II Errors and Power

**Formal Definition:** .orangered_style[Statistical Power] `\((1 - \beta)\)` is the probability that a statistical test correctly rejects the null hypothesis `\(H_0\)` when `\(H_0\)` is in fact false.

--

<br>

**Informal Definition:** power is the probability of observing a statistically significant result, when it is truly there to be found.

--

<br>

<img src="data:image/png;base64,#type_I_and_II_errors_updated.gif" width="800px" style="display: block; margin: auto;" />

---

# Statistical Power

The more powerful a test, the more likely it is to detect a genuine effect when one exists.

--

* We need our tests to have high power, otherwise we lose faith in the .seagreen_style[validity of our inferences]

* E.g. if our test only has power of 0.5, are our results actually useful?

--

Typically, we aim for a power level of **at least 0.8**.

--

* We can interpret this as meaning that we have an 80% chance of correctly identifying a statistically significant result

--

There is always a .orangered_style[trade-off] involved in increasing the power of a test.

--

Therefore, when determining the appropriate power level, we need to carefully consider all the influential factors.

---

# Starting Point

It will be helpful to use some visualisations as we introduce the different factors that influence statistical power.

--

For context, we will consider a one sample `\(t\)`-test scenario, with a level of significance `\(\alpha = 0.05\)` (*the same concepts can be applied to more advanced tests too*).

--

Recall that for a one sample `\(t\)`-test, we have a test statistic `\(T\)`, which we assume follows a `\(t\)`-distribution with `\(n-1\)` degrees of freedom.
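For reference, here is a quick sketch in R of how the cut-off points (critical values) for this test can be obtained - the sample size `n = 20` below is an assumed value, used purely for illustration:

```r
# Two-sided rejection region cut-offs for a one sample t-test
# (illustrative sketch: n = 20 is an assumed sample size)
alpha <- 0.05
n     <- 20

# Reject H0 if the observed T falls below the lower cut-off
# or above the upper cut-off (approximately -2.09 and 2.09 here)
qt(c(alpha / 2, 1 - alpha / 2), df = n - 1)
```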
--

* This distribution is symmetric and centred at `\(t=0\)`

--

* Observed test statistics close to zero suggest `\(H_0\)` should not be rejected

--

* Observed test statistics far from zero (in either tail) suggest we should reject `\(H_0\)`

---

.left-column[

<br> <br> <br> <br>

The black curve denotes the distribution of the test statistic under `\(H_0\)`.

]

.right-column[

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

]

---

.left-column[

<br> <br> <br> <br>

The cross-hatched sections denote our rejection regions - if our observed test statistic lies in one of these regions, we reject `\(H_0\)`.

]

.right-column[

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" />

]

---

.left-column[

<br> <br> <br> <br>

The green curve denotes a potential alternative distribution under `\(H_1\)`. Let us assume `\(H_1\)` is true.

]

.right-column[

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

]

---

.left-column[

<br> <br>

Let us assume `\(H_1\)` is true.

For observed test statistics in the rejection regions, we reject `\(H_0\)`. The probability of making this correct choice is `\(1-\beta\)`, i.e. our **power**, denoted by the shaded green area.

We want to **maximise this area** (i.e. maximise our power).

]

.right-column[

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-5-1.svg" style="display: block; margin: auto;" />

]

---

.left-column[

<br>

Let us assume `\(H_1\)` is true.

For observed test statistics not in the rejection regions, we fail to reject `\(H_0\)`. The probability of making this mistake is our probability of **Type II error**, denoted by the shaded orange area.

We want to **minimise** this area (i.e. minimise our probability of Type II error).

]

.right-column[

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

]

---

# Factors that influence Power

There are five main factors that influence statistical power:

<br>

--

1. The chosen .orangered_style[level of significance] `\(\alpha\)`

--

2. The .orangered_style[predicted mean difference]

--

3. The choice of test: .orangered_style[Directional] (one-sided) or .orangered_style[Non-Directional] (two-sided)

--

4. The .orangered_style[population standard deviation]

--

5. The .orangered_style[sample size]

--

<br>

We will look at each of these in turn now, changing one factor at a time.

---

# 1. Significance Level

As we change our level of significance `\(\alpha\)`, so too will our power change.

--

* A change in `\(\alpha\)` shifts our cut-off points, changing the size of our rejection regions.

--

<br>

**Result:** If we increase `\(\alpha\)`, we increase our power.

--

* .orangered_style[Cost: Our Type I error rate will increase]

--

<br>

Conversely, if we decrease `\(\alpha\)`, we reduce our power.
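We can check this numerically with R's built-in `power.t.test()` function. The values below are a sketch only - the sample size (`n = 20`), assumed mean difference (`delta = 1`) and standard deviation (`sd = 2`) are illustrative numbers, not taken from a real study.

```r
# Power of a one sample t-test at three significance levels
# (illustrative values: n = 20, delta = 1, sd = 2 are assumptions)
sapply(c(0.01, 0.05, 0.10), function(a) {
  power.t.test(n = 20, delta = 1, sd = 2, sig.level = a,
               type = "one.sample", alternative = "two.sided")$power
})
# Power is smallest for alpha = 0.01 and largest for alpha = 0.10
```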
---

#### Example with `\(\alpha = 0.05\)`

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />

---

#### Example with `\(\alpha = 0.10\)` (notice the larger shaded green area)

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-8-1.svg" style="display: block; margin: auto;" />

---

#### Example with `\(\alpha = 0.01\)` (notice the smaller shaded green area)

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-9-1.svg" style="display: block; margin: auto;" />

---

# 2. Predicted Mean Difference

Consider the predicted means of the `\(H_0\)` and `\(H_1\)` distributions for the test statistic.

--

* The mean under `\(H_0\)` will be 0
* The mean under `\(H_1\)` will be predicted based on various details (pilot studies, expert knowledge, etc.)

--

The larger the difference in these predicted means, the less the two distributions will overlap.

--

* E.g., as we increase the predicted mean difference, we effectively begin to separate the two curves

--

**Result:** If we increase our predicted mean difference, we increase our power.

--

* .orangered_style[Cost: If we overestimate the mean difference, this could be unrealistic, and set the study up for failure]

--

* We would need to determine an .seagreen_style[appropriate effect size] as part of our considerations here.

---

#### Example with predicted mean difference of 2

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-10-1.svg" style="display: block; margin: auto;" />

---

#### Example with predicted mean difference of 1

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-11-1.svg" style="display: block; margin: auto;" />

---

#### Example with predicted mean difference of 4

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-12-1.svg" style="display: block; margin: auto;" />

---

#### Example with predicted mean difference of 10 - is this realistic?

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-13-1.svg" style="display: block; margin: auto;" />

---

# 3. Test Specification

Our choice of .seagreen_style[directional] or .seagreen_style[non-directional test] impacts the power of our test.

--

If we choose a directional test (aka one-sided or one-tailed test), our level of significance `\(\alpha\)` remains the same, but we now have just **one rejection region**, with area `\(\alpha\)`, instead of two.

--

**Result:** If we switch from a non-directional test to a directional test, we increase our power.

--

* .orangered_style[Cost: Our test can only detect an effect in the predicted direction - an effect in the opposite direction will be missed]

---

#### Example Non-Directional Test (note the two rejection regions)

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-14-1.svg" style="display: block; margin: auto;" />

---

#### Example Directional Test (note the one, larger rejection region)

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-15-1.svg" style="display: block; margin: auto;" />

---

#### Power Gain from switching to Directional Test shown in blue

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-16-1.svg" style="display: block; margin: auto;" />
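---

#### Checking the power gain in R

The comparison below is a sketch only - the sample size (`n = 20`), assumed mean difference (`delta = 1`) and standard deviation (`sd = 2`) are illustrative values chosen for this example, not part of the lecture scenario.

```r
# Power of a one sample t-test: non-directional vs directional
# (illustrative values: n = 20, delta = 1, sd = 2 are assumptions)
two_sided <- power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.05,
                          type = "one.sample", alternative = "two.sided")
one_sided <- power.t.test(n = 20, delta = 1, sd = 2, sig.level = 0.05,
                          type = "one.sample", alternative = "one.sided")

# The directional (one-sided) test has the higher power
c(two_sided = two_sided$power, one_sided = one_sided$power)
```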
---

# 4. Population Standard Deviation

Our sample data comes from a population of interest, with a population mean `\(\mu\)` and population standard deviation `\(\sigma\)`.

--

The smaller the population standard deviation `\(\sigma\)`, the less overlap we observe between the `\(H_0\)` and `\(H_1\)` distributions.

--

**Result:** A smaller population standard deviation leads to a larger power.

--

* .orangered_style[Downside: We cannot directly control the population standard deviation]

* We can however be aware that more diverse populations will provide more challenges with respect to statistical power calculations

---

#### Example with small population standard deviation

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-17-1.svg" style="display: block; margin: auto;" />

---

#### Example with large population standard deviation

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_4A_files/figure-html/unnamed-chunk-18-1.svg" style="display: block; margin: auto;" />

---

# 5. Sample Size

Generally speaking, it is beneficial to have more data rather than less.

--

**Result:** An increase in our sample size `\(n\)` leads to an increase in our power.

--

* .orangered_style[Cost: Collecting more data is often costly (time, money)]

--

* .orangered_style[Warning: Higher power does not necessarily mean greater clinical significance]

---

# Experimental Design Notes

Now that we know the key factors involved in determining statistical power, we can more easily identify experimental design issues.

--

For example, consider a study where:

--

* The level of significance is very small

--

* The test is non-directional

--

* The predicted mean difference is very large

--

* The population has a high standard deviation

--

* The sample size is very small

--

There may be legitimate reasons for these specifications, but taken together they suggest the study may have poor statistical power in practice - particularly if the large predicted mean difference turns out to be unrealistic.

---

# Experimental Design Notes

To obtain a study which may have a higher power and/or be more viable or realistic, we could make one or more of the following adjustments (if feasible):

--

* Set the level of significance to a reasonable level (e.g. 0.05)

--

* Change the test to directional if justified by prior data/knowledge

--

* Reduce the predicted mean difference to a reasonable level

--

* Try to focus on a population with a lower standard deviation

--

* Try to obtain more data

---

# Why is Experimental Design Important?

If our study is poorly designed, we run the risk of wasting resources such as time and money.

--

While introducing changes to increase our power can lead to stronger statistical inferences, this can also create additional costs.

--

* We need to strike an appropriate balance

--

Often, a good choice is to first conduct a pilot study - this can help us calculate power, and determine the sample size required in our main study (a sketch of this calculation is shown shortly).

---

# Good Experimental Design Characteristics

So what makes a well designed experiment?

--

* .seagreen_style[Biological knowledge] informs the study question

--

* Hypotheses are well defined

--

* The appropriate type of experiment is conducted - e.g. manipulative (aka experimental) and/or correlative (aka observational, mensurative)

--

* A .seagreen_style[pilot study] is used

--

* The strongest possible test of the specified hypotheses is produced
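---

#### Estimating a required sample size in R

Here is a sketch of how `power.t.test()` can be used at the planning stage - leave `n` unspecified and supply the target power instead. The mean difference (`delta = 1`) and standard deviation (`sd = 2`) below are illustrative values only; in practice they would come from a pilot study or prior knowledge.

```r
# Solve for the sample size needed to reach a power of 0.8
# (illustrative planning values: delta = 1, sd = 2 are assumptions)
power.t.test(delta = 1, sd = 2, sig.level = 0.05, power = 0.8,
             type = "one.sample", alternative = "two.sided")$n
```

The returned `n` is generally not a whole number, so round it up to obtain the minimum sample size.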
---

# Summary

The .orangered_style[statistical power] of a test can be influenced by 5 key factors:

--

1. The chosen .orangered_style[level of significance] `\(\alpha\)`

--

2. The .orangered_style[predicted mean difference]

--

3. The choice of test: .orangered_style[Directional] (one-sided) or .orangered_style[Non-Directional] (two-sided)

--

4. The .orangered_style[population standard deviation]

--

5. The .orangered_style[sample size]

--

We want to control these factors such that our test process is realistic, with a power `\(\geq 0.8\)`.

---

# End

That concludes our lecture on statistical power.

--

### What to do next:

* .seagreen_style[Quick Kahoot revision quiz]: Please go to [kahoot.it](https://kahoot.it) and type in the code shown

* If you have any questions, check the LMS, email us or ask in the computer labs

--

### Further Reading

* PDF booklet on statistical power on the LMS.

<br>

**Please note Section 1.3 of this booklet is assessable.**

---

# References

* Cohen, J. (1988). *Statistical Power Analysis for the Behavioral Sciences*. 2nd edition. New York: Academic Press.

* Kokoska, S. (2020). *Introductory Statistics: A Problem-Solving Approach*. 3rd edition. New York: W. H. Freeman.

* The jamovi project. (2022). *jamovi* [Computer Software]. [https://www.jamovi.org](https://www.jamovi.org).

---

class: middle

<font color = "grey">
These notes have been prepared by Rupert Kuveke, Amanda Shaker, and other members of the Department of Mathematical and Physical Sciences. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with the Department of Environment and Genetics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives License <a href = "https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank">BY-NC-ND</a>.
</font>