STM1001 Topic 7 Workshop

class: middle
background-image: url(data:image/png;base64,#LTU_logo.jpg)
background-position: top left
background-size: 30%

# STM1001 [Topic 7](https://bookdown.org/a_shaker/STM1001_Topic_7/) Workshop
## One-way ANOVA
### La Trobe University
This workshop complements the [Topic 7 readings](https://bookdown.org/a_shaker/STM1001_Topic_7/)

---

# Topic 7: One-way ANOVA

## In this week's readings:

* We will not have time to cover every concept, so please make sure you read this topic's readings thoroughly.

---

name: stat
class: middle
background-image: url(data:image/png;base64,#slide_1.png)
background-size: 110%

---

name: stat
class: middle
background-image: url(data:image/png;base64,#slide_10.png)
background-size: 100%

---

# One-way ANOVA

* Recall that we used the independent samples `$t$`-test to test for a difference between ***two independent groups***

* What if we want to test for a difference between ***two or more independent groups***?

* Then, we can use one-way ANOVA

* ANOVA is short for "Analysis of Variance"

---
# Claim

* Suppose we claim that there is a relationship between phone screen time and chocolate preference

* More specifically, we are wondering if there is a statistically significant difference in average phone screen time between those who prefer dark vs milk vs white chocolate

* We will test this claim using one-way ANOVA

* We will use your answers from the following Menti questions to test the claim

---

name: menti
class: middle
background-image: url(data:image/png;base64,#menti.jpg)
background-size: 115%

# Menti

## Go to [www.menti.com](https://www.menti.com) and use

## the code provided

---

# One-way ANOVA

* How do we test the claim?

* We will use ***One-way ANOVA***

* First, we need to set up our hypotheses:

`$$H_0: \mu_1 = \mu_2 = \mu_3 \text{ versus } H_1: \text{not all } \mu_i\text{'s are equal,}$$`

where:

* `$\mu_1$` denotes the population mean daily screen time of those who prefer white chocolate
* `$\mu_2$` denotes the population mean daily screen time of those who prefer milk chocolate
* `$\mu_3$` denotes the population mean daily screen time of those who prefer dark chocolate

Note that rather than using the `$t$`-distribution, we will be using the `$F$`-distribution for ANOVA

See [this week’s readings](https://bookdown.org/a_shaker/STM1001_Topic_7/3-carrying-out-the-test.html) for more information

---
# One-way ANOVA

* What type of variables are required for a One-way ANOVA?

.content-box-blue[
.center[
A one-way ANOVA will always involve two variables:
]
1. The ***dependent*** variable, sometimes also called the *response* variable. This should be a numeric, continuous variable.
2. The ***independent*** variable. This should be a categorical variable with ***two or more categories***.
]

* So our ***dependent*** variable is minutes of screen time

* Our ***independent*** variable is chocolate preference
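
As a rough sketch of how these two variables fit together, the snippet below groups a few made-up responses by the categorical variable. The values and the name `screen_time_by_preference` are hypothetical and for illustration only; they are not the class's Menti responses.

```python
from collections import defaultdict

# Hypothetical responses (illustration only):
# (chocolate preference, daily screen time in minutes)
responses = [
    ("dark", 150),
    ("milk", 200),
    ("white", 240),
    ("milk", 180),
    ("dark", 190),
]

# Independent variable (categorical) -> values of the dependent variable (numeric)
screen_time_by_preference = defaultdict(list)
for preference, minutes in responses:
    screen_time_by_preference[preference].append(minutes)

for preference, minutes in screen_time_by_preference.items():
    print(preference, minutes)
```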

---
# One-way ANOVA

* Let's use our responses to carry out the test

`$$H_0: \mu_1 = \mu_2 = \mu_3 \text{ versus } H_1: \text{not all } \mu_i\text{'s are equal,}$$`

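where `$\mu_1$`, `$\mu_2$` and `$\mu_3$` denote the population mean daily screen time of those who prefer white, milk and dark chocolate, respectively.

* [One-way ANOVA calculator](https://www.socscistatistics.com/tests/anova/default2.aspx)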

---

# Group activity 1

* In your group, discuss the result and answer the following:

  * What is the `$p$`-value?
  * What is the test statistic (`$F$`-ratio)?
  * Do we have evidence that there is a significant difference in average daily screen time between the chocolate preference groups?

After you have had a chance to discuss, nominate one person who can speak for the group and explain your conclusion to the rest of the class

---

# Defining the `$F$`-distribution

* Recall that to define the `$t$`-distribution, we needed to know the degrees of freedom

* To define the `$F$`-distribution, we need to know two degrees of freedom:

  * `$d_1 = k - 1$`, where `$k$` is the number of groups. This is the "between group" degrees of freedom.
  * `$d_2 = N - k$`, where `$N$` is the total sample size. This is the "within group" degrees of freedom.
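
The two formulas above can be sketched in a few lines of Python. The group sizes below are hypothetical, chosen only to illustrate the arithmetic; they are not the workshop's data.

```python
# Hypothetical group sizes (illustration only, not the Menti responses)
group_sizes = {"white": 4, "milk": 5, "dark": 6}

k = len(group_sizes)           # number of groups
N = sum(group_sizes.values())  # total sample size

d1 = k - 1  # "between group" degrees of freedom
d2 = N - k  # "within group" degrees of freedom

print(f"k = {k}, N = {N}, d1 = {d1}, d2 = {d2}")
```

With three groups and 15 responses in total, this gives `$d_1 = 2$` and `$d_2 = 12$`.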

---

# Defining the `$F$`-distribution

* Depending on the degrees of freedom, `$d_1$` and `$d_2$`, the shape of the distribution will look different:
<img src="data:image/png;base64,#Topic_7_Workshop_files/figure-html/unnamed-chunk-2-1.svg" style="display: block; margin: auto;" />
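
To see how the shape changes numerically, the sketch below evaluates the standard `$F$` density at a few points for several pairs of degrees of freedom. The function name `f_pdf` and the chosen pairs are illustrative assumptions; they are not necessarily the settings used for the plot above.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F-distribution with (d1, d2) degrees of freedom at x."""
    if x <= 0:
        return 0.0
    # log of the Beta function B(d1/2, d2/2), via log-gamma for numerical stability
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_numerator = 0.5 * (d1 * math.log(d1 * x)
                           + d2 * math.log(d2)
                           - (d1 + d2) * math.log(d1 * x + d2))
    return math.exp(log_numerator - math.log(x) - log_beta)

# Illustrative (d1, d2) pairs; compare how quickly the density falls away
for d1, d2 in [(2, 12), (5, 20), (10, 50)]:
    densities = [round(f_pdf(x, d1, d2), 3) for x in (0.5, 1.0, 2.0, 4.0)]
    print(f"d1 = {d1}, d2 = {d2}: {densities}")
```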

---

# The test statistic

* For a one-way ANOVA, the test statistic, or the `$F$` value, is calculated by estimating the ratio of ***between group variation*** to ***within group variation***: `$\displaystyle \frac{\text{between group variation}}{\text{within group variation}}$`

* The ***between group variation*** is a measure of how much the sample means for each group vary.

* The ***within group variation*** is a measure of how much individual sample values within a group vary from their group sample mean.

* If the ***between group variation*** is much larger than the ***within group variation***, then the `$F$`-statistic will be very large and lead to a statistically significant result.

* On the other hand, if the ***between group variation*** is not large compared to the ***within group variation***, then the `$F$`-statistic will not be large and subsequently will not lead to a statistically significant result.
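
As a minimal sketch of this ratio, the code below computes the `$F$` value in the usual way: the between group sum of squares divided by `$k - 1$`, over the within group sum of squares divided by `$N - k$`. The screen times are hypothetical and the function name `one_way_anova_f` is an illustrative choice, not something taken from the readings.

```python
from statistics import mean

# Hypothetical daily screen times in minutes (illustration only)
groups = {
    "white": [200, 220, 240],
    "milk": [180, 200, 220],
    "dark": [150, 170, 190],
}

def one_way_anova_f(samples):
    """Return (F, d1, d2) for a list of groups of numeric values."""
    k = len(samples)
    N = sum(len(s) for s in samples)
    grand_mean = mean([x for s in samples for x in s])

    # Between group variation: how far each group mean is from the grand mean
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within group variation: how far each value is from its own group mean
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)

    d1, d2 = k - 1, N - k
    f_value = (ss_between / d1) / (ss_within / d2)
    return f_value, d1, d2

f_value, d1, d2 = one_way_anova_f(list(groups.values()))
print(f"F = {f_value:.2f} with d1 = {d1} and d2 = {d2}")
```

For these made-up numbers the between group mean square is 1900 and the within group mean square is 400, so `$F = 4.75$`. Whether that is statistically significant depends on the `$p$`-value from the `$F$`-distribution with these degrees of freedom.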

---

# Group activity 2

* For our example, answer the following questions:

  * What is `$d_1$`?
  * What is `$d_2$`?

---

# Post-hoc tests

* The one-way ANOVA test tells us whether there is a difference between groups, but not which groups are different from each other, nor how many

* But we can take another step in the analysis to answer these questions by using post-hoc tests

* This is normally only done if the initial ANOVA result was significant

* However, for our purposes today, we will look at the post-hoc test results regardless of whether the ANOVA result was significant or not
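
To give a feel for what a post-hoc analysis involves, the sketch below lists every pair of groups that would be compared. It also shows one simple way of guarding against too many false positives, the Bonferroni correction, which divides the significance level by the number of comparisons. Both the 0.05 level and the choice of Bonferroni are assumptions made for illustration; the online calculator may use a different post-hoc method.

```python
from itertools import combinations

groups = ["white", "milk", "dark"]
alpha = 0.05  # assumed overall significance level (illustration only)

# Every pair of groups gets its own comparison: k(k - 1)/2 pairs in total
pairs = list(combinations(groups, 2))

# Bonferroni correction: each pairwise p-value is compared with alpha / (number of pairs)
adjusted_alpha = alpha / len(pairs)

for first, second in pairs:
    print(f"{first} vs {second}")
print(f"{len(pairs)} comparisons, each tested at level {adjusted_alpha:.4f}")
```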

---

# Group activity 3

* Returning to our example, the online calculator should also have given us the results of post-hoc tests. In your group, discuss the results and answer the following:

  * Which groups were significantly different from each other? What were the `$p$`-values?
  * Which groups were not significantly different from each other? What were the `$p$`-values?

---

# More details in the readings

* For more, see [this topic’s readings](https://bookdown.org/a_shaker/STM1001_Topic_7/)

---

background-image: url(data:image/png;base64,#computerlab.jpg)
background-position: bottom
background-size: 75%
class: center

# See you in the computer labs!

Continue with this topic's readings: [Topic 7 Readings](https://bookdown.org/a_shaker/STM1001_Topic_7/)

---
class: middle

<font color = "grey">
These notes have been prepared by Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License 
<a href = "https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank"> BY-NC-ND. </a>
</font>