class: middle
background-image: url(data:image/png;base64,#LTU_logo_clear.jpg)
background-position: top left
background-size: 25%

<style>
p.caption {
  font-size: 0.6em;
}
</style>

# BIO2POS

# Hypothesis Testing and the one sample `\(t\)`-test

## Data Analysis Topic 2A

### La Trobe University

---

# Welcome!

### In this lecture we will discuss conducting and interpreting the results of hypothesis tests, with a focus on one sample `\(t\)`-tests.

--

Over the following slides, we will cover:

* .orangered_style[Hypothesis Testing steps]

--

* .orangered_style[Type I and Type II Errors]

--

* .orangered_style[Effect Sizes]

--

* .orangered_style[One Sample *t*-tests]

--

If you have done previous statistics subjects (e.g. .seagreen_style[STM1001]), then much of this content may be familiar already.

---

# Intended Learning Objectives

### By the end of this lecture you will:

--

* be able to distinguish between and explain Type I and Type II errors

--

* understand the importance of and difference between statistical and clinical significance

--

* be able to correctly interpret and summarise results of a one sample `\(t\)`-test

--

<br>

The content you learn in Topics 2A and 2B will provide you with a solid foundation for conducting a variety of statistical tests, and we will extend these skills in future DA topics.

--

We will practise content from this topic in this week's DA computer lab, which also includes some extension material if you would like to go further.

---

# Hypothesis Testing Steps

Recall that we introduced hypothesis testing in the [DA Topic 1B lecture](https://rpubs.com/LTU_BIO2POS/DA1B).

--

Understanding hypothesis testing is an important skill in all areas of science.

--

We can summarise the hypothesis testing process into .orangered_style[7 main steps]:

--

1. Establish the null hypothesis `\(H_0\)` and the alternate hypothesis `\(H_1\)`

--

2. Determine the appropriate test to use

--

3. Choose the level of significance `\(\alpha\)`

--

4. Gather sample data

--

5. Analyse sample data

--

6. Reach a statistical conclusion (Reject `\(H_0\)`/Fail to reject `\(H_0\)`)

--

7. Write a clear conclusion

---

# `\(t\)`-Tests

In .seagreen_style[DA Topics 2A and 2B] we will introduce a set of statistical tests called .orangered_style[*t*-tests].

--

Several types of `\(t\)`-test exist, and together they can be applied to a wide variety of contexts.

--

To start, we will focus on the simplest `\(t\)`-test, the .orangered_style[one sample *t*-test].

* This type of test is appropriate when we are testing a .bold_style[sample mean] based on data for a .bold_style[single variable], against a .bold_style[fixed reference value] (the `\(H_0\)` mean `\(\mu_0\)`).

--

For example, recall the .seagreen_style[cat weight example] from DA Topic 1B:

* We could use a one sample `\(t\)`-test here as:
  * we have a single variable (*weight*) and
  * we have a fixed reference value `\((H_0: \mu = 4.5)\)`

--

As we progress through the content, we will also cover some theoretical details about `\(t\)`-tests.

---

# Type I and Type II Errors

Regardless of the type of test we conduct, using a sample to infer details about a population carries risks:

--

A .orangered_style[Type I Error (aka a False Positive)] occurs when the information from our sample data leads us to reject `\(H_0\)`, when in reality `\(H_0\)` is actually true.

--

A .orangered_style[Type II Error (aka a False Negative)] occurs when the information from our sample data leads us to not reject `\(H_0\)`, when in reality `\(H_0\)` is actually false.
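--

As an optional illustration (you will not need R for the jamovi-based labs), we can watch Type I errors happen with a short simulation: if `\(H_0\)` really is true and we test at `\(\alpha = 0.05\)`, about 5% of samples will still lead us to reject `\(H_0\)`. The settings below (`mean = 4.5`, `sd = 0.25`, `n = 20`) are made up for illustration only:

```r
set.seed(2025)  # for reproducibility

# Simulate 10,000 studies in which H0 is actually true (mu really is 4.5)
p_values <- replicate(10000, {
  weights <- rnorm(20, mean = 4.5, sd = 0.25)  # hypothetical sample of 20 cat weights
  t.test(weights, mu = 4.5)$p.value            # one sample t-test p-value
})

# Proportion of studies that (wrongly) reject H0 at alpha = 0.05
mean(p_values < 0.05)  # close to 0.05 - the Type I error rate
```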
--

We can think of the .orangered_style[level of significance] `\(\alpha\)` as being our accepted level of risk of incurring a Type I error.

--

* In other words, `\(\alpha =\)` the probability we make a Type I error

--

* This is why we always select `\(\alpha\)` to be small!

---

# Type I and Type II Errors

<img src="data:image/png;base64,#type_I_and_II_errors_updated.jpg" width="800px" style="display: block; margin: auto;" />

* These errors can have serious implications (imagine a test for a disease - which error would be worse to make?)

--

* The lower the selected `\(\alpha\)` value, the more likely it is that Type II errors will occur (there is always a price)

* *For more about Type II errors, check the DA Online Learning Activity (Topic 1) on the LMS*

---

<!-- .caption_style[ -->
<!-- Note: From Statistical Performance Measures, by Neeraj Kumar Vaid, 2019, Medium ([www.medium.com](http://www.medium.com)). Copyright 2019 by Medium. -->
<!-- .copyright_style[ -->
<!-- Commonwealth of Australia -->
<!-- Copyright Act 1968 -->
<!-- Warning -->
<!-- This material has been copied and communicated to you by or on behalf of La Trobe University under Part VB of the Copyright Act 1968 (the Act). -->
<!-- The material in this communication may be subject to copyright under the Act. Any further copying or communication of this material by you may be the subject of copyright protection under the Act -->
<!-- Do not remove this notice -->

# Test Statistic Notation - one sample `\(t\)`-test

Recall that we introduced the concept of a .orangered_style[test statistic] in the [DA Topic 1B lecture](https://rpubs.com/LTU_BIO2POS/DA1B).

--

Test statistics condense sample data information into a single value, which we can use to help decide whether or not to reject `\(H_0\)`.

--

For a one sample `\(t\)`-test, we have:

`$$\text{Test Statistic } (T) = \dfrac{\overline{X} - \mu_0}{SE}$$`

--

Here:

* `\(\overline{X}\)` is the .orangered_style[sample mean]

--

* `\(\mu_0\)` is the assumed population mean under `\(H_0\)`

--

* `\(SE\)` is the .orangered_style[standard error] of `\(\overline{X}\)`, with `\(SE = \dfrac{SD}{\sqrt{n}}\)`
  * Recall `\(SD\)` is the sample standard deviation and `\(n\)` is the sample size

--

*You can read more about the standard error in the DA Online Learning Activity (Topic 1) on the LMS.*

---

# Test Statistic Interpretation

Our test statistic will follow a specific probability distribution under `\(H_0\)`.

* The specific distribution will depend on the test being conducted

--

Using our observed test statistic and this distribution, we can calculate a `\(p\)`-value.

--

* Recall that the `\(p\)`-value is the probability of observing a test statistic at least as extreme as ours, if `\(H_0\)` is true

--

If our test statistic is an extreme value, such that the resultant `\(p\)`-value is less than `\(\alpha\)`, we can conclude that we have a .orangered_style[statistically significant] result; otherwise our result is not statistically significant.

---

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_2A_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />

This example uses `\(\alpha = 0.05\)`. Test statistics equal to or larger than 1.96 in magnitude have only a 5% chance of being observed, if our null hypothesis is true.

--

* Here, if our observed test statistic lies in the blue interval (is less than 1.96 in magnitude), we cannot reject our null hypothesis

--

* If our observed test statistic lies in one of the orange regions (is greater than or equal to 1.96 in magnitude), we can reject our null hypothesis
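--

As an optional sanity check, we can reproduce these cut-offs in R, assuming the curve in the figure is the standard normal distribution (which, as we will see, the `\(t\)`-distribution approaches for large samples):

```r
# Critical value: the point with 2.5% probability in each tail
qnorm(0.975)       # 1.959964... - the familiar 1.96

# Chance of a test statistic at least 1.96 in magnitude, under H0
2 * pnorm(-1.96)   # approximately 0.05
```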
---

# Statistical vs Clinical Significance

While we primarily focus on determining .orangered_style[statistical significance] when conducting a hypothesis test, it is important to also consider the .orangered_style[clinical significance] of our results.

--

We can compute and interpret an .orangered_style[effect size] for our test, to determine clinical significance.

--

There are numerous effect size measures - for this topic we focus on .orangered_style[Cohen's *d*].

--

* Cohen's `\(d\)` is easy to calculate for a one sample `\(t\)`-test - it is the standardised difference between the sample mean and the null hypothesis mean

--

`$$\text{Cohen's } d = \dfrac{\overline{X} - \mu_0}{SD}$$`

---

# Interpreting the Cohen's `\(d\)` effect size

The following conventions apply for interpreting Cohen's `\(d\)` (J. Cohen, 1992):

--

.shadedbox[
.center[

`\(|d| < 0.2\)`: "negligible effect size"

`\(0.2 \leq |d| < 0.5\)`: "small effect size"

`\(0.5 \leq |d| < 0.8\)`: "medium effect size"

`\(|d| \geq 0.8\)`: "large effect size"

]
]

--

Note that we are predominantly interested in the .orangered_style[magnitude] of the effect size when interpreting it.

* The sign is important though, as it tells us the direction of the effect

---

# Effect Size Example

Suppose that we have carried out a one sample `\(t\)`-test for our .seagreen_style[cat weight example] scenario, with the following results:

* Test Statistic: `\(t = 2.83\)`

* `\(p\)`-value: `\(p = 0.002\)`

--

Should we reject `\(H_0: \mu = 4.5\)`?

--

* Of course!

--

But suppose the corresponding sample mean was `\(\overline{X} = 4.55\)`, and `\(SD = 1.25\)`.

--

* While our result is .orangered_style[statistically significant] (we conclude `\(\mu \neq 4.5\)`), it is .orangered_style[not clinically significant] (the effect size is negligible, with `\(d = 0.04\)`)!

--

* Here, the sample size `\(n = 5000\)`, which has led to the test statistic being large.
  * The effect size ignores `\(n\)`.

---

# Example one sample `\(t\)`-test jamovi output

In BIO2POS, we will use jamovi for almost all our statistical analyses.

--

Presented below is an example jamovi output for a one sample `\(t\)`-test for the .seagreen_style[cat weight example], with a non-directional alternate hypothesis:

<img src="data:image/png;base64,#one_sample_t_test_cats_2025.jpg" width="800px" style="display: block; margin: auto;" />

---

# Example one sample `\(t\)`-test jamovi output

Note that we can easily compute the effect size from the descriptives:

<img src="data:image/png;base64,#one_sample_t_test_cats_descriptives_2025.jpg" width="500px" style="display: block; margin: auto;" />

--

.center[

`\(d = \dfrac{4.573 - 4.5}{0.266} \approx 0.274\)`

]

* *There may be some minor differences due to rounding*
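--

If you are curious how the same analysis looks outside jamovi, here is a sketch in R, assuming a numeric vector `weight` holding the 20 sampled cat weights (not provided here):

```r
# One sample t-test against the reference value 4.5 (non-directional H1)
t.test(weight, mu = 4.5, alternative = "two.sided")

# Cohen's d: the standardised difference between sample mean and mu_0
(mean(weight) - 4.5) / sd(weight)
```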
---

# Example Summary

A .orangered_style[one sample *t*-test] was conducted to determine if the average weight of domestic shorthair cats who regularly eat fish in their diet was different from 4.5 kgs.

--

The mean weight of cats in our sample data set was greater than 4.5 `\((M = 4.573 \text{ kgs}, SD = 0.266 \text{ kgs}, n = 20)\)`.

--

However, the one sample `\(t\)`-test showed that this difference was .orangered_style[not statistically significant] at the `\(\alpha = 0.05\)` level of significance, with

* `\(t(19) = 1.229\)`,

* `\(p = 0.2342 > 0.05\)`.

* The .orangered_style[effect size] was also small, with `\(d = 0.275\)`.

--

*Note that this is an example summary - while it may be a useful reference for you, it does not need to be followed exactly. The most important things are:*

--

* *You can conduct the test properly*

--

* *You can clearly and correctly convey your understanding and interpretation of the results*

---

# Example one sample `\(t\)`-test jamovi output #2

Presented below are example jamovi outputs for another one sample `\(t\)`-test for the .seagreen_style[cat weight example], with a directional alternate hypothesis:

<br>

<img src="data:image/png;base64,#one_sample_t_test_cats_directional_2025.jpg" width="700px" style="display: block; margin: auto;" />

---

# Example Summary #2

A .orangered_style[one sample *t*-test] was conducted to determine if the average weight of domestic shorthair cats who regularly eat fish in their diet was greater than 4.5 kgs.

--

The mean weight of cats in our sample data set was greater than 4.5 `\((M = 4.619 \text{ kgs}, SD = 0.256 \text{ kgs}, n = 20)\)`.

--

The one sample `\(t\)`-test showed that this difference was .orangered_style[statistically significant] at the `\(\alpha = 0.05\)` level of significance, with `\(t(19) = 2.067, p = 0.0263 < 0.05\)`. The .orangered_style[effect size] was small to medium, with `\(d = 0.462\)`.

---

# Why is it called a `\(t\)`-test?

Recall that when we collect sample data and compute a sample mean, our aim is to gain information about the population mean.

--

But having just the sample mean does not tell us everything.

* We should also factor in details like the sample size, and the variability in our sample

--

Using this additional information, we can construct a .orangered_style[distribution of the sample mean].

--

* This will help us make inferences about the population mean, using just our sample data (no need to sample every individual in the population)

--

Under `\(H_0\)`, all `\(t\)`-tests assume that the standardised sample mean (i.e. our test statistic `\(T\)`) follows a .orangered_style[Student's *t*-distribution] (or simply, a `\(t\)`-distribution).

* This is why we call such tests `\(t\)`-tests

---

# Why do we use the `\(t\)`-distribution?

An ideal choice for the distribution of the sample mean would be the .orangered_style[Normal Distribution], due to its appealing qualities, including:

--

* It is symmetric

--

* We only need two parameters to define the shape of the distribution - the mean `\(\mu\)` and the variance `\(\sigma^2\)`

--

* We can easily standardise results for easier inference and comparison between data sets

--

* Based on the .orangered_style[Central Limit Theorem (CLT)], we can assume that the distribution of the sample mean will follow a normal distribution, so long as our sample size is large `\((n \geq 30)\)`

--

### So why not use the normal distribution?

--

Unfortunately, to specify the shape of the normal distribution, we need to know the population mean and population variance - generally, we will not be able to obtain these...
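--

As an optional preview of the fix introduced on the next slide, this short R snippet shows how the `\(t\)`-distribution's critical values approach the normal distribution's as the degrees of freedom (i.e. the sample size) grow:

```r
# 97.5% quantile (two-sided critical value) of the t-distribution,
# for increasing degrees of freedom
qt(0.975, df = c(4, 9, 29, 99, 999))
#> 2.776 2.262 2.045 1.984 1.962

# ...approaching the normal distribution's critical value
qnorm(0.975)
#> 1.959964
```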
---

# Student's t-distribution

Fortunately, we can use the `\(t\)`-distribution as a replacement for the normal distribution.

--

The shape of the `\(t\)`-distribution is defined by the .orangered_style[degrees of freedom (df)] parameter, which is related to the .orangered_style[sample size].

--

* So unlike the normal distribution, all we need to define the `\(t\)`-distribution is the sample size `\(n\)`, which we should always know

--

* For `\(t\)`-tests, we have `\(T \sim t_{df}\)`, with `\(df = n - 1\)`

--

As `\(df \rightarrow \infty\)`, the `\(t\)`-distribution converges to the normal distribution

--

* This means we can use the `\(t\)`-distribution as a replacement for the normal distribution, as long as `\(n\)` is large

---

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_2A_files/figure-html/unnamed-chunk-6-1.svg" style="display: block; margin: auto;" />

---

# Student's t-distribution

.pull-left[

* For `\(t\)`-tests, we use the `\(t\)`-distribution as the distribution for our standardised sample mean

{{content}}

]

--

* We treat our test statistic as an observed value from this distribution

{{content}}

--

* For small sample sizes, the `\(t\)`-distribution does not match the normal distribution, making it harder to obtain statistically significant results (note the thicker tails)

--

.pull-right[

<img src="data:image/png;base64,#BIO2POS_DA_Lecture_Topic_2A_files/figure-html/unnamed-chunk-7-1.svg" style="display: block; margin: auto;" />

]

---

# Summary

There are 7 key steps in hypothesis testing (don't worry if you can't remember them all just yet - you can always check back to refresh your memory).

--

It is important to compute effect sizes to check if results are clinically significant.

--

We specify `\(\alpha\)` to be small to minimise our chances of making a Type I error (false positive).

--

When reporting the results of a test, we should note the level of significance, the test statistic and the `\(p\)`-value at a bare minimum, to support our conclusion.

--

We use a `\(t\)`-distribution with `\(df = n - 1\)` for our `\(t\)`-tests `\((T \sim t_{df})\)`.

---

# End

That concludes our lecture on hypothesis testing and one sample `\(t\)`-tests. We will look at different types of `\(t\)`-tests in Topic 2B.

--

### What to do next:

* .seagreen_style[Quick Kahoot revision quiz]: Please go to [kahoot.it](https://kahoot.it) and type in the code shown

* Make sure to attend the next DA Lecture on Topic 2B

* If you have any questions, check the LMS, email us or ask in the computer labs

---

# References

* Cohen, J. 1992. "A Power Primer." *Psychological Bulletin* 112 (1): 155.

* Cohen, J. 1988. *Statistical Power Analysis for the Behavioral Sciences*. 2nd edition. New York: Academic Press.

* The jamovi project. (2022). *jamovi [Computer Software]*. [https://www.jamovi.org](https://www.jamovi.org).

---

class: middle

<font color = "grey">

These notes have been prepared by Rupert Kuveke, Amanda Shaker, and other members of the Department of Mathematical and Physical Sciences.

The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with the Department of Environment and Genetics and with La Trobe University.

Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License <a href = "https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank"> BY-NC-ND. </a>

</font>