STM1001 Topic 5 Lecture

class: middle
background-image: url(data:image/png;base64,#LTU_logo.jpg)
background-position: top left
background-size: 30%

# STM1001 [Topic 5](https://amandashaker-stm1001-topic-5.share.connect.posit.cloud/) Lecture
## Hypothesis Testing
### La Trobe University
This lecture complements the [Topic 5 readings](https://amandashaker-stm1001-topic-5.share.connect.posit.cloud/)

---

# Topic 5: Related Links

## Readings

[Topic 5 readings](https://amandashaker-stm1001-topic-5.share.connect.posit.cloud/)

## Notation

[Topics 5 and 6: Hypothesis testing and `$t$`-tests](https://amandashaker-stm1001-topic-0.share.connect.posit.cloud/notation-summary.html#topics-5-and-6-hypothesis-testing-and-t-tests)

---

# Topic 5: Hypothesis Testing

**Overview**

---

name: stat
class: middle
background-image: url(data:image/png;base64,#slide_1.png)
background-size: 110%

---

name: stat
class: middle
background-image: url(data:image/png;base64,#slide_8.png)
background-size: 100%

---

# Hypothesis Testing

In this lecture we will introduce ***Hypothesis Testing***, a widely used statistical procedure and a cornerstone of modern scientific research.

We will cover:

* Hypothesis testing steps

* Conducting a `$t$`-test

* The `$t$`-distribution

* Confidence Intervals

* `$t$`-test assumptions and checks

* Type I and Type II errors

---

# Hypothesis Testing

In the context of Statistics, a ***hypothesis*** is a specific statement that we wish to test

It is a statement that may or may not be true, and the idea is to carry out a study to test the truth of a hypothesis

In statistical hypothesis testing, we consider two options:

* A ***Null Hypothesis***, denoted `$(H_0)$` and 
--

* An ***Alternative Hypothesis***, denoted `$(H_1)$`

We always begin our testing with the assumption that our ***null hypothesis*** `$(H_0)$` is true.

Our intention is to use our sample data to test whether or not we have evidence ***against*** the null hypothesis `$(H_0)$`

If there is evidence ***against*** the null hypothesis `$(H_0)$`, then we conclude that the evidence ***supports*** the alternative hypothesis `$(H_1)$`.

---

# The Hypothesis Testing Process

We can summarise the hypothesis testing process into 7 main steps:

1. Establish the Null Hypothesis `$H_0$` and the Alternative Hypothesis `$H_1$`
  
--

2. Choose the level of significance `$\alpha$`
  
--

3. Determine the appropriate test to use

4. Gather sample data
  
--

5. Analyse sample data
  
--

6. Reach a statistical conclusion (Reject `$H_0$` or Fail to reject `$H_0$`)
  
--

7. Write a clear conclusion

Let's now consider hypothesis testing in the context of an example.

---

# Sleep Data Example

Suppose we claim that, on average, STM1001 students spend a total of 480 minutes (8 hours) sleeping per day.

* We can test this claim via a **hypothesis test**

Suppose we asked students the question *'In the past 24 hours, how many minutes did you spend sleeping?'* and obtained `$n=24$` responses, summarised in the histogram below:

.pull-left[
<img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-2-1.svg" width="85%" style="display: block; margin: auto;" />
]

.pull-right[
* Sample mean: `$\overline{x} = 406.7$`

* Sample standard deviation: `$s = 92.3$`

* Does our data support the idea that `$\mu = 480$`?

* We can use our sample of data to test our claim and reach a conclusion
]

---

# Hypothesis Testing Example

For our ***sleep data example***, we would define the null and alternative hypotheses as follows:

`$$H_0:\mu = 480\;\;\text{versus}\;\;H_1:\mu \neq 480,$$`
where:

* `$\mu$` denotes the population mean number of minutes people in the population (STM1001 students) spend sleeping per day

* `$H_0$` denotes the null hypothesis that the mean number of minutes people in this population spend sleeping is ***equal*** to 480

* `$H_1$` denotes the alternative hypothesis that the mean number of minutes people in this population spend sleeping is ***not equal*** to 480

To test this hypothesis, we can use a type of hypothesis test called a ***One-sample `$t$`-test***.

Recall that when we carry out a hypothesis test, to start out with, we ***assume the null hypothesis to be true***. Then, if our sample provides evidence that this assumption was not reasonable, we ***reject the null hypothesis*** and therefore have evidence in favour of the ***alternative hypothesis***.

---

# Hypothesis testing

In more general terms, suppose we have a sample of `$n$` observations that have been independently sampled from a population with population mean `$\mu$`.

Consider the hypotheses

`$$H_0:\mu = \mu_0\;\;\text{versus}\;\;H_1:\mu \neq \mu_0,$$`

where:

* `$\mu_0$` denotes the population mean under the null hypothesis.

A commonly used test for these hypotheses is the ***one-sample `$t$`-test***, based on the data observed in the sample. It is appropriate provided either the sample size `$n$` is large or the population from which the sample was taken is normally distributed.

*Note: we will learn how to check these assumptions shortly. The test is called a `$t$`-test because, under `$H_0$`, the standardised sample mean (the test statistic) follows a `$t$`-distribution.*

---

# SD & Variance - Population vs Sample

Recall that for a random variable `$X$`, we have:

|                | Standard Deviation | Variance     |
|:---------------|:------------------:|:------------:|
| **Population** | `$\sigma$`         | `$\sigma^2$` |
| **Sample**     | `$s$`              | `$s^2$`      |
* Note we distinguish between population and sample by using Greek and English letters respectively

For the sample mean `$\overline{X}$`, which itself is a random variable, we have:

|                | Standard Deviation                       | Variance                            |
|:---------------|:----------------------------------------:|:-----------------------------------:|
| **Population** | `$\displaystyle\frac{\sigma}{\sqrt{n}}$` | `$\displaystyle\frac{\sigma^2}{n}$` |
| **Sample**     | `$\displaystyle\frac{s}{\sqrt{n}}$`      | `$\displaystyle\frac{s^2}{n}$`      |
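
A quick simulation can make the `$\sigma/\sqrt{n}$` formula concrete. The R sketch below (base R only) assumes a normal population with hypothetical values `$\mu = 480$` and `$\sigma = 90$`, draws 10,000 samples of size `$n = 24$`, and compares the standard deviation of the simulated sample means with `$\sigma/\sqrt{n}$`.

```r
# Hypothetical population parameters (illustration only)
mu    <- 480
sigma <- 90
n     <- 24

set.seed(1001)
# Draw 10,000 samples of size n and keep each sample mean
sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

sd_of_means <- sd(sample_means)   # simulated SD of the sample mean
theory      <- sigma / sqrt(n)    # population SD of the sample mean

c(simulated = sd_of_means, theoretical = theory)
```

The two values should agree closely; the exact simulated value depends on the random seed.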

---

# Sample Data -> Inference

In applied statistical ***inference***, generally we are using sample data to ***infer*** something about a population of interest.

Our motivation for conducting a hypothesis test is to determine:

* Are our sample data consistent with `$H_0$`, or do they provide evidence against `$H_0$`?
  
--

To answer this question, we will need to conduct a formal statistical test (with several steps), using descriptive statistics from our data, e.g.:

* the observed sample mean `$\overline{x}$`

* the observed sample standard deviation `$s$`

* the sample size `$n$`

We will use these values, along with our specified `$\mu_0$` value for our parameter of interest under `$H_0$`, to calculate a ***Test Statistic***.

---

# Test Statistics

**Test statistics** are random variables that we use during the Hypothesis Testing Process when trying to decide if we can reject `$H_0$`.

Different statistical tests will have different test statistic equations and details, but the way we use test statistics is similar regardless of the type of hypothesis test we are using.

What we will focus on here is an overview on:

* ***Why*** we use test statistics

* ***How*** they tie in with our hypothesis testing, and

* ***How to use them for statistical inference***, within the context of the Sleep Data example and the One-sample `$t$`-test

---

# The test statistic (random version)

The ***test statistic*** can be thought of as a ***standardised*** version of the sample mean.

In general terms, for the type of test we'll consider today, the test statistic is defined as follows:

`$$T = \frac{\overline{X} - \mu_0}{\text{SE}} = \frac{\overline{X} - \mu_0}{S/\sqrt{n}},$$`

where:

* `$T$` denotes the ***test statistic***, which is ***random***. That is, it is a ***random variable*** and will follow a specific statistical distribution, the `$t$`-distribution, with `$T \sim t_{df}$` (more on this shortly).

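* `$\overline{X}$` denotes the sample mean and `$S$` the sample standard deviation (both random), and `$\mu_0$` denotes the population mean under `$H_0$`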

* `$\text{SE}$` refers to the ***Standard Error***. The standard error is an *estimator* of the *standard deviation of the sample mean*, and is equal to `$\frac{S}{\sqrt{n}}.$`

---

# The observed test statistic

We can input our specific data from our study into the formula for the ***observed test statistic*** (this is a lot like calculating a `$z$`-score):

`$$t = \frac{\bar{x} - \mu_0}{\text{se}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},$$`

where:

* `$t$` denotes the observed ***test statistic*** - this is a number, which we consider to be sampled from the test statistic distribution

* `$\bar{x}$` denotes the observed sample mean

* `$\text{se}$` refers to the ***observed standard error***. The standard error is an *estimate* of the *standard deviation of the sample mean*, and is equal to `$\frac{s}{\sqrt{n}}.$`

Note the difference in notation between the ***random*** and ***observed*** test statistic definitions where, for example, `$T$` is the ***random*** test statistic, and `$t$` is the ***observed*** test statistic.

---

# The observed test statistic

Although we will be using statistical software packages to calculate the test statistic for us, we will calculate the test statistic for the Sleep Data example now, to help us interpret what it means.

.pull-left[
Recall that we have:

* `$\bar{x} = 406.7$`
* `$\mu_0 = 480$`
* `$s = 92.3$`
* `$n = 24$`
* `$\text{se} = \frac{s}{\sqrt{n}} = \frac{92.3}{\sqrt{24}} = 18.84066$`
]

.pull-right[
Therefore,

$$
`\begin{align}
t &= \frac{\bar{x} - \mu_0}{\text{se}} \\
  &= \frac{406.7 - 480}{18.84066} \\
  &= \frac{-73.3}{18.84066} \\
  &= -3.89.
\end{align}`
$$
]
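
The same arithmetic can be reproduced in R as a quick check. This is a minimal sketch that uses only the summary statistics on this slide, so the raw responses are not needed.

```r
# Summary statistics for the Sleep Data example
xbar <- 406.7   # observed sample mean
mu0  <- 480     # population mean under H0
s    <- 92.3    # observed sample standard deviation
n    <- 24      # sample size

se     <- s / sqrt(n)          # observed standard error
t_stat <- (xbar - mu0) / se    # observed test statistic

round(c(se = se, t = t_stat), 2)   # se = 18.84, t = -3.89
```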

---

# Test Statistic Interpretation

***Observed test statistics*** condense information from our sample data into a single value which we can use to help decide if we should reject or not reject `$H_0$`.

***Observed test statistics*** are similar to the `$z$`-scores we learnt about in Topic 3, in that an ***observed test statistic*** can be thought of as a ***standardised*** version of the observed sample mean. [Would a `$z$`-score of -3.89 be thought of as very small, very large, or about average?]

The ***test statistic*** will follow a specific probability distribution under `$H_0$`.

* The specific distribution will depend on the test being conducted

Using our observed test statistic and the distribution of the test statistic, we can calculate a special probability known as the ***p-value***, which measures how compatible our data are with the null hypothesis.

We'll look at this in more detail shortly, but this is our end goal - to obtain a `$p$`-value so we can reach a statistically-informed decision regarding our null hypothesis.

---

# Distribution of the Test Statistic

Ideally, for this example we would use the ***Normal Distribution*** for our test statistic, due to its appealing qualities, including:

* It is symmetric
  
--

* We only need two parameters to define the shape of the distribution - the mean `$\mu$` and the variance `$\sigma^2$`

* We can easily standardise results for easier inference and comparison
  
--

* Based on the ***Central Limit Theorem (CLT)***, we can assume that ***the distribution of the sample mean*** will be approximately normal, so long as our sample size is large (a common rule of thumb is `$n \geq 30$`)
 
--

Recall that, to specify the shape of the normal distribution, we need to know the population mean and population variance. We can use our `$\mu_0$` value for the mean, but unfortunately `$\sigma$` is generally unknown and we will not be able to obtain it. However...

* We can estimate the population standard deviation `$\sigma$` using our sample standard deviation `$s$`, but we need to account for the extra uncertainty this introduces

---

# The `$t$`-distribution

Fortunately, we can use **the `$t$`-distribution** as a replacement for the normal distribution!

The shape of the `$t$`-distribution is defined by a parameter called the ***degrees of freedom*** **(`$\text{df}$`)**, which is related to the ***sample size `$n$`***.

* So, unlike the normal distribution, all we need to define the `$t$`-distribution is the sample size `$n$`, which we should always know

* For the One-sample `$t$`-test, we have `$T \sim t_{\text{df}}$`, with `$\text{df} = n-1$`

As `$\text{df} \rightarrow \infty$`, the `$t$`-distribution converges to the standard normal distribution.

* This means that when `$n$` is large, using the `$t$`-distribution in place of the normal distribution gives almost identical results
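
One way to see this convergence numerically is to compare the 97.5th percentile of `$t$`-distributions with increasing `$\text{df}$` against the corresponding standard normal percentile. A minimal sketch using base R's `qt()` and `qnorm()`; the particular `$\text{df}$` values are arbitrary choices for illustration.

```r
# Degrees of freedom to compare (arbitrary illustrative values)
dfs <- c(2, 5, 10, 23, 30, 100, 1000)

# 97.5th percentile of each t-distribution, and of the standard normal
t_quantiles <- qt(0.975, df = dfs)
z_quantile  <- qnorm(0.975)

data.frame(df = dfs, t_0.975 = round(t_quantiles, 4))
round(z_quantile, 4)   # 1.96
```

The `$t$` percentiles shrink towards the normal value as `$\text{df}$` grows, which is the numerical counterpart of the heavier tails for small samples.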

---

Notice how the coloured (`$t$`) distributions approach the black (Normal) distribution as `$\text{df}$` becomes larger.

---

# The `$t$`-distribution

.pull-left[
<img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" />

]

.pull-right[

* For `$t$`-tests, we use the `$t$`-distribution as the distribution of our test statistic (the standardised sample mean) under `$H_0$`
{{content}}
]

--

* We treat our observed test statistic as a value drawn from this distribution
{{content}}

--

* For small sample sizes, the `$t$`-distribution has heavier tails than the normal distribution, so it is more conservative, making it harder to obtain statistically significant results (see the [readings](https://amandashaker-stm1001-topic-5.share.connect.posit.cloud/1-tdist.html) for further discussion, which will be useful context for Quiz 5).

---

# The `$t$`-distribution for the Sleep Data Example

Recall that the `$t$`-distribution is ***standardised***. For our example, under `$H_0$` the distribution of `$\displaystyle (\overline{X} - \mu_0) / (S / \sqrt{n})$` is a `$t$`-distribution with `$\text{df} = 23$`, since `$n = 24$`.

That means our value of `$\mu_0 = 480$` corresponds to 0 in this distribution.

---

# Sleep Data Example test statistic

If our sample mean is a lot higher than `$\mu_0 = 480$`, then our observed test statistic will be well above zero, e.g. 1, 1.4, 3, or even higher.

If our sample mean is a lot lower than `$\mu_0 = 480$`, then our observed test statistic will be well below zero, e.g. -2, -3, or even lower.

If our sample mean is far from `$\mu_0$`, this suggests that `$H_0$` is not correct.

Similarly, ***if our observed test statistic is far from 0, this suggests that `$H_0$` is not correct, and should be rejected.***

---

# Observed test statistic inference

Recall that our sample mean was 406.7, which was much lower than 480. Also recall that this translated to an observed test statistic of `$t = -3.89$`.

* Would it be reasonable to hold onto the assumption that the true mean is 480 minutes? I.e., does the null hypothesis seem correct?

* Probably not. In this case, we would say we **reject the null hypothesis** and that we have evidence that the true mean is different from 480 minutes.

---

# Observed test statistic inference

In general, we want to ask the question:

*If the null hypothesis is true, what is the probability we would have obtained a result at least as extreme as our observed test statistic?*

This probability is the ***p-value*** mentioned earlier.

* Using probability notation, the ***p-value*** equals `$2\times P(T \geq |t|)$` for our one-sample `$t$`-test scenario (for a two-sided test - more on this later).

* If the `$p$`-value is *less than* our ***level of significance*** `$\alpha$` (Greek letter, 'alpha'), this is sufficient evidence to reject `$H_0$`

* If the `$p$`-value is *greater than* our level of significance `$\alpha$`, then we do not have enough evidence to reject `$H_0$`

* Note that we don't say `$H_0$` is true, but rather that ***we do not reject*** `$H_0$`. Just because we don't reject `$H_0$` does not mean we have proven it is true, so it is best to avoid terms like *accept `$H_0$`*.

---

# Observed test statistic inference

* Typically, a ***significance level*** of `$\alpha = 0.05$` is used, although different values of `$\alpha$` can be chosen

* This means that in general, if we have `$p < 0.05$`, we ***reject `$H_0$`***. In this case, we would say that the result is ***statistically significant***.

* Typically, an observed test statistic of about 2 or higher (or -2 or lower) will translate into a `$p$`-value less than 0.05, i.e. a statistically significant result, but the exact cut-off depends on the degrees of freedom

* We will now consider the hypothesis test results for the Sleep Data example.

---

# Sleep Data Example: `$t$`-test output

``` r
	One Sample t-test

data:  sleep$sleep1
t = -3.8904, df = 23, p-value = 0.0007382
alternative hypothesis: true mean is not equal to 480
95 percent confidence interval:
 367.6729 445.6604
sample estimates:
mean of x 
 406.6667 
```

---
# Sleep Data Example: `$t$`-test output

``` r
	One Sample t-test

data:  sleep$sleep1
t = -3.8904, df = 23, p-value = 0.0007382
alternative hypothesis: true mean is not equal to 480
95 percent confidence interval:
 367.6729 445.6604
sample estimates:
mean of x 
 406.6667 
```

* The **observed test statistic** is `$t = -3.8904$`

---

# Sleep Data Example: `$t$`-test output

``` r
	One Sample t-test

data:  sleep$sleep1
t = -3.8904, df = 23, p-value = 0.0007382
alternative hypothesis: true mean is not equal to 480
95 percent confidence interval:
 367.6729 445.6604
sample estimates:
mean of x 
 406.6667 
```

* The **observed test statistic** is `$t = -3.8904$`
* The **degrees of freedom** is `$n - 1 = 24 - 1 = 23$`

---

# Sleep Data Example: `$t$`-test output

``` r
	One Sample t-test

data:  sleep$sleep1
t = -3.8904, df = 23, p-value = 0.0007382
alternative hypothesis: true mean is not equal to 480
95 percent confidence interval:
 367.6729 445.6604
sample estimates:
mean of x 
 406.6667 
```

---


# Sleep Data Example: `$t$`-test output

``` r
	One Sample t-test

data:  sleep$sleep1
t = -3.8904, df = 23, p-value = 0.0007382
alternative hypothesis: true mean is not equal to 480
95 percent confidence interval:
 367.6729 445.6604
sample estimates:
mean of x 
 406.6667 
```

* The **observed test statistic** is `$t = -3.8904$`
* The **degrees of freedom** is `$n - 1 = 24 - 1 = 23$`
* The **`$p$`-value** is `$p = 0.0007$`. Since this `$p$`-value is less than 0.05, we reject `$H_0$` and conclude that the average number of minutes people in this population spend sleeping is ***different*** from `$\mu_0 = 480$`
* The **sample mean** is `$\bar{x} = 406.67$`
* The **95% confidence interval** is (367.67, 445.66). This means we are 95% confident that the population mean sleep time lies within the interval (367.67, 445.66).

We will now consider these concepts more thoroughly.

---

# Sleep Data Example: observed test statistic

Recall that the observed test statistic was `$t = -3.89$`

The test statistic allows us to evaluate exactly how ***extreme*** this result is after taking into consideration the variability and sample size.

As the test statistic is not close to zero (i.e. its absolute value is relatively large), we expect the corresponding `$p$`-value to be small.

---
# Sleep Data Example: The `$p$`-value

For our sleep data example, the `$p$`-value we obtain allows us to answer the following question:

If it were true that `$\mu = \mu_0 = 480$`, what are the chances that, when we took our sample of `$n = 24$` students, we would have seen either this sample mean of `$\bar{x} = 406.67$` (which translates to an observed test statistic of `$t = -3.89$`), ***or a more extreme result***?

* Is our observed test statistic ***extreme*** in the context of the `$t_{23}$` distribution (which assumes `$H_0$` is true)? Let's have a look:

---

# Sleep Data Example: The `$p$`-value

---

# Sleep Data Example: The `$p$`-value

As it turns out, our observed test statistic ***is*** fairly ***extreme*** in the context of this distribution: ***if `$H_0$` is true***, the probability of observing a test statistic at least as extreme as ours is only `$p = 0.0007$`.

In other words:

$$
`\begin{align}
p &= 2\times P(T \geq |t|) \\
  &= P(T \leq -3.8904) + P(T \geq 3.8904) \\
  &= 0.0007.
\end{align}`
$$

Recall this probability is our **`$p$`-value**.

Note that here we were only interested in whether `$t$` is ***different from*** `$0$` (or equivalently whether `$\mu$` was ***different from*** 480).

* I.e. what is the probability of seeing a test statistic at least as extreme ***in either direction***?

* That is, greater than 3.8904 or less than -3.8904. This is called a ***two-sided test***. This point will be further explained shortly.
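
As a sketch, the two-sided `$p$`-value can be computed directly from the `$t_{23}$` distribution with base R's `pt()`, using the observed test statistic from the `t.test` output. Adding the two tail areas and doubling one tail give the same answer because the `$t$`-distribution is symmetric about 0.

```r
t_obs <- -3.8904   # observed test statistic from the t-test output
df    <- 23        # degrees of freedom, n - 1

# Two-sided p-value: add the areas in both tails beyond |t_obs|
p_lower <- pt(-abs(t_obs), df = df)                     # P(T <= -3.8904)
p_upper <- pt(abs(t_obs), df = df, lower.tail = FALSE)  # P(T >= 3.8904)
p_value <- p_lower + p_upper

# Equivalent shortcut using the symmetry of the t-distribution
p_value_alt <- 2 * pt(-abs(t_obs), df = df)

signif(c(p_value, p_value_alt), 2)   # both 0.00074
```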

---

# The `$p$`-value & the significance level `$\alpha$`

* Since our `$p$`-value was small, we had enough evidence to ***reject `$H_0$`***.

* Therefore, we have evidence to support the alternative hypothesis that `$\mu \neq 480$`, i.e. that our result is ***statistically significant***.

* Recall that in general, the standard level of significance is `$\alpha = 0.05$`, although other levels of `$\alpha$` can be chosen.

* Our overall hypothesis test conclusion is based on comparing our `$p$`-value against our ***level of significance, `$\alpha$`***. This is called the ***p-value approach***.

That is:

* if `$p < \alpha$`, we ***reject*** `$H_0$`

* if `$p \geq \alpha$`, we ***do not reject*** `$H_0$`

---

# The Critical Region approach

There is another method we could also use to reach an overall hypothesis test conclusion, called the ***Critical Region approach***.

Consider the question:

* *If `$\alpha = 0.05$`, how extreme would our test statistic need to be in order to reject `$H_0$`?*

To answer this question, we can find the ***critical value*** `$t^*$`, a quantile of the `$t_{23}$` distribution such that `$P(T \leq -t^*) + P(T \geq t^*) = 0.05$`, as represented on the next slide.

---

* As we can see, for `$T \sim t_{23}$`, `$P(T \leq -2.07) + P(T \geq 2.07) = 0.025 + 0.025 = 0.05$` (the probability in each "tail" is 0.025).

* This means that if our test statistic were either greater than 2.07 or less than -2.07, we would say it falls in the ***critical region*** and we would reject `$H_0$`, because any value of `$t$` in this region would result in `$p < 0.05$`.

* Since our test statistic is `$t = -3.8904 < -2.07$`, we ***reject `$H_0$`***.
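
A minimal R sketch of the critical region approach for this example, assuming `$\alpha = 0.05$` and a two-sided test: `qt()` gives the critical value, and we reject `$H_0$` when `$|t|$` exceeds it.

```r
alpha <- 0.05
df    <- 23
t_obs <- -3.8904

# Critical value: the (1 - alpha/2) quantile of t_23,
# so each tail of the critical region has area alpha/2
t_crit <- qt(1 - alpha / 2, df = df)
round(t_crit, 4)   # 2.0687

# Reject H0 if the observed statistic lies in the critical region
reject_H0 <- abs(t_obs) > t_crit
reject_H0          # TRUE
```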

---

name: menti
class: middle
background-image: url(data:image/png;base64,#menti.jpg)
background-size: 115%

# Kahoot

## Go to [www.kahoot.it](https://www.kahoot.it) and use

## the code provided

---

# One-sided vs two-sided tests

Earlier, we mentioned that the test we had carried out was a ***two-sided test***. ***One-sided tests*** are also possible. Consider the following cases:

1. **Two-sided test**: *Is the average sleep per day for students from this population **different** from 480 minutes?*

   * `$H_0:\mu = 480\;\;\text{versus}\;\;H_1:\mu \neq 480$`

--

2. **One-sided test**: *Is the average sleep per day for students from this population **greater than** 480 minutes?*

   * `$H_0:\mu = 480\;\;\text{versus}\;\;H_1:\mu > 480$`

3. **One-sided test**: *Is the average sleep per day for students from this population **less than** 480 minutes?*

   * `$H_0:\mu = 480\;\;\text{versus}\;\;H_1:\mu < 480$`

--

Examples 2 and 3 above are referred to as 'one-sided tests' because they only test for extreme values in one direction. Consider the figure on the next slide, which shows the critical values (CV) required for each test.

---
# One-sided vs two-sided tests

---
# One-sided vs two-sided tests

For a **two-sided test**, we have `$p\text{-value} = 2 \times P(T\geq |t|) \text{ for } T\sim t_{\text{df}}$`.

* Since the total shaded area must equal `$\alpha$`, each tail has an area of `$\alpha / 2$`.

--
1. **The two-sided test** is often referred to as a **two-tailed test**, as we are interested in extreme values in the two 'tails' of the distribution curve.

2. For **one-sided (right-tailed) tests**, we are only interested in extreme values in the right tail (i.e. sample means greater than `$\mu_0$`), so the whole area `$\alpha$` lies in the right tail, and we have:
   `$$p\text{-value} = P(T\geq t) \text{ for } T\sim t_{\text{df}}$$`
   
--

3. Similarly, for **one-sided (left-tailed) tests**, the whole area `$\alpha$` lies in the left tail, and we have:
   `$$p\text{-value} = P(T\leq t) \text{ for } T\sim t_{\text{df}}$$`

Two-sided tests are often preferred in practice because they do not require us to predict the direction of the effect in advance. However, in this subject we will practise using both two-sided and one-sided tests.
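
The three `$p$`-value formulas above can be compared side by side in R with `pt()`. This sketch reuses the observed statistic `$t = -3.8904$` with `$\text{df} = 23$` purely for illustration; in practice the alternative hypothesis should be chosen before looking at the data.

```r
t_obs <- -3.8904
df    <- 23

p_two_sided <- 2 * pt(-abs(t_obs), df = df)            # H1: mu != 480
p_right     <- pt(t_obs, df = df, lower.tail = FALSE)  # H1: mu > 480, P(T >= t)
p_left      <- pt(t_obs, df = df)                      # H1: mu < 480, P(T <= t)

signif(c(two_sided = p_two_sided, right = p_right, left = p_left), 3)
```

If the raw data are available as a numeric vector, `t.test()` computes the one-sided versions through its `alternative` argument, e.g. `t.test(x, mu = 480, alternative = "less")`, where `x` is a hypothetical vector holding the responses.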

---
# Type I and Type II Errors

.content-box-blue[
.center[
**There are two types of error that can occur:**
]
1. **Type I error (False Positive):** Reject `$H_0$` when `$H_0$` is true.
2. **Type II error (False Negative):** Fail to reject `$H_0$` when `$H_0$` is false.
]

For example:

* **Type I error:** We conclude that the mean number of minutes STM1001 students spend sleeping is different from 480 (reject `$H_0$`) when it is actually equal to 480 (i.e. `$H_0$` is true)

* **Type II error:** We do not conclude that the mean number of minutes STM1001 students spend sleeping is different from 480 (do not reject `$H_0$`), when it is actually different from 480 (i.e. `$H_0$` is false)

---
# Type I and Type II Errors

.content-box-blue[
.center[
**Probability of Type I error:**
]
The probability of making a Type I error is equal to the significance level, `$\alpha$`.
]

We can think of the ***level of significance***, denoted by the Greek letter `$\alpha$`, as being the threshold for our maximum accepted level of risk of incurring a Type I error.

* In other words, `$\alpha$` is the threshold for the maximum acceptable probability of making a Type I error when conducting a test
  
--

* We don't want to make an incorrect inference, so `$\alpha$` should be small!
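
A small simulation can illustrate why `$\alpha$` is the Type I error rate. The R sketch below generates many samples for which `$H_0$` is true (population mean exactly 480; the normal shape and population SD of 90 are hypothetical choices), runs a one-sample `$t$`-test on each with `t.test()`, and records how often `$H_0$` is wrongly rejected at `$\alpha = 0.05$`.

```r
set.seed(2024)
alpha  <- 0.05
n      <- 24
n_sims <- 5000

# Each simulated sample comes from a population where H0 is TRUE (mean = 480).
# The normal shape and SD of 90 are hypothetical choices for illustration.
p_values <- replicate(n_sims, {
  x <- rnorm(n, mean = 480, sd = 90)
  t.test(x, mu = 480)$p.value
})

# Proportion of tests that wrongly reject H0: the estimated Type I error rate
type1_rate <- mean(p_values < alpha)
type1_rate
```

The rejection rate should be close to 0.05, but not exactly 0.05, because of simulation noise.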

---

# Type I and Type II Errors

* These errors can have serious implications
  
    - Imagine a test for a disease: which error would be worse to make?

* For a fixed sample size, the lower your specified `$\alpha$` value, the more likely it is that Type II errors will occur (there is always a trade-off)
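
The trade-off between `$\alpha$` and Type II errors can be sketched with base R's `power.t.test()`, which computes the power (one minus the Type II error probability) of a one-sample `$t$`-test. The scenario here is hypothetical: a true mean 40 minutes below `$\mu_0 = 480$`, a population SD of 92 minutes and `$n = 24$`.

```r
# Hypothetical scenario: true mean 440 (40 below mu0 = 480), SD 92, n = 24
beta_for <- function(alpha) {
  pw <- power.t.test(n = 24, delta = 40, sd = 92, sig.level = alpha,
                     type = "one.sample", alternative = "two.sided")
  1 - pw$power   # Type II error probability = 1 - power
}

round(c(alpha_0.05 = beta_for(0.05), alpha_0.01 = beta_for(0.01)), 3)
```

With everything else held fixed, the Type II error probability at `$\alpha = 0.01$` comes out larger than at `$\alpha = 0.05$`.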

---

# Confidence Intervals

As well as reporting the observed sample mean, reporting a ***confidence interval*** (CI) is very useful. We can use a CI to:

* Provide a range within which the true population mean is likely to be

* Indicate how ***confident*** we are in our estimate

Consider the following statements:

1. *We are 95% confident that the true mean minutes STM1001 students spend sleeping each day is **between 400.67 and 412.67 minutes.***

2. *We are 95% confident that the true mean minutes STM1001 students spend sleeping each day is **between 6.67 and 806.67 minutes.***

Which statement do you prefer, and why?

---

# Confidence Intervals

In our sleep data example, our estimated mean time sleeping is 406.67 minutes.

The first CI was more useful than the second CI, because it was much narrower.

* Generally, a narrow confidence interval indicates a more precise estimate

* On the other hand, some confidence intervals can be so wide that they are barely informative at all!

The width of a confidence interval is determined by:

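--

* The sample size `$n$` (larger samples give narrower intervals)

--

* The estimated variability in the sample `$s$` (more variability gives wider intervals)

--

* The level of significance `$\alpha$` (a smaller `$\alpha$`, i.e. a higher confidence level, gives a wider interval)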

---

# Confidence Intervals

To calculate a confidence interval, we take `$\bar{x}$` and then add and subtract a **margin of error**. Consider the following definition:

.content-box-blue[
.center[
**95% Confidence interval calculation:**
]
`$$95\% \, \text{ CI } = \bar{x} \pm t_{\text{df,}0.975}\times\text{se},$$`
]

where:

* `$t_{\text{df,}0.975}$` is the value from the `$t_{\text{df}}$` distribution such that `$P(T \leq t_{\text{df,}0.975}) = 0.975$`, i.e. the 0.975th quantile

* `$\text{se}$`, the standard error, is equal to `$\frac{s}{\sqrt{n}}$`

---
# Confidence intervals

* Our level of confidence depends on `$\alpha$` such that we have a `$100(1 - \alpha)\%$` confidence interval:
    * If `$\alpha = 0.01$`, we have a `$100(1 - 0.01)\% = 99\%$` confidence interval
    * If `$\alpha = 0.05$`, we have a `$100(1 - 0.05)\% = 95\%$` confidence interval
    * If `$\alpha = 0.1$`, we have a `$100(1 - 0.1)\% = 90\%$` confidence interval
--
* Consider the following, more general definition:

.content-box-blue[
.center[
**Confidence interval calculation for general significance level:** 
]
`$$\bar{x} \pm t_{\text{df,}1 - \alpha/2}\times\text{se},$$`
]

where:

* `$t_{\text{df,}1 - \alpha/2}$` is the value from the `$t_{\text{df}}$` distribution such that `$P(T \leq t_{\text{df,}1 - \alpha/2}) = 1 - \alpha/2$`, i.e. the `$(1 - \alpha/2)$`th quantile
* `$\text{se}$`, the standard error, is equal to `$\frac{s}{\sqrt{n}}$`.
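
The general formula can be wrapped in a small R function as a sketch. `ci_mean()` is a hypothetical helper name, not a built-in function; it uses `qt()` for the quantile `$t_{\text{df},1-\alpha/2}$` and is applied here to the Sleep Data summary statistics.

```r
# 100(1 - alpha)% confidence interval for a mean, from summary statistics
# (ci_mean is a hypothetical helper, not part of base R)
ci_mean <- function(xbar, s, n, alpha = 0.05) {
  se     <- s / sqrt(n)                     # standard error
  t_crit <- qt(1 - alpha / 2, df = n - 1)   # t_{df, 1 - alpha/2}
  margin <- t_crit * se                     # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Sleep Data example: 95% CI
round(ci_mean(xbar = 406.6667, s = 92.3447, n = 24), 2)   # 367.67 445.66
```

Changing `alpha` to 0.10 or 0.01 gives 90% or 99% intervals, which you could use to check your working for the 90% and 99% CI exercise in this topic.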
  
---

# Sleep Data Example: 95% CI Calculations

Recall for our sleep data example that we previously stated:

*The 95% confidence interval is (367.67, 445.66)*.

This means we are 95% confident that the population mean sleep for this population lies within the interval (367.67, 445.66).

Recalling the previous information:

* `$\bar{x} = 406.6667$`
* `$n = 24$`
* `$\text{df} = n - 1 = 24 - 1 = 23$`
* Sample standard deviation: `$s = 92.3447$`
* `$\text{se} = \frac{s}{\sqrt{n}} = \frac{92.3447}{\sqrt{24}} = 18.8498$`

Using `$t_{23,0.975} = 2.0687$` (obtained from jamovi/R), we can calculate this 95% confidence interval as follows.

---

# Sleep Data Example: 95% CI Calculations

Recall that the 95% CI is given by `$$95\% \, \text{ CI } = \bar{x} \pm t_{\text{df,}0.975}\times\text{se}.$$`

First, we have that `$t_{23,0.975} \times \text{se} = 2.0687 \times 18.8498= 38.9946$`.

Next, we can add and subtract this number from `$\bar{x} = 406.6667$` to calculate our confidence interval as follows:

* `$406.6667 - 38.9946 = 367.6721$`
* `$406.6667 + 38.9946 = 445.6613$`,

for a 95% CI of (367.67, 445.66), rounded to two decimal places.

<br>

*See if you can calculate a 90% and also a 99% confidence interval yourself, using the above process. To do so, you will need to use the following for the 90% CI: `$t_{23,0.95} = 1.7139,$` and for the 99% CI: `$t_{23,0.995} = 2.8073.$` Which CI do you think will be wider?*

---

# Using Confidence Intervals to decide whether to reject `$H_0$`

So far, we have seen how to use the ***p-value approach*** and the ***critical region approach*** to decide whether or not to reject `$H_0$`.

We can also use the ***confidence interval approach***

* For the two-sided `$t$`-test with a `$100(1 - \alpha)\%$` confidence interval, all three approaches lead to the same conclusion

To use the confidence interval approach, we use the following rule:

* If `$\mu_0$` lies *outside* the range of the confidence interval, reject `$H_0$`

* If `$\mu_0$` lies *within* the range of the confidence interval, do not reject `$H_0$`
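
To illustrate that the three approaches agree, the R sketch below applies all three to the Sleep Data summary statistics at `$\alpha = 0.05$` (two-sided test, 95% CI). It is a hand-rolled check for teaching purposes, not how jamovi or `t.test()` reports results.

```r
# Sleep Data summary statistics
xbar  <- 406.6667
s     <- 92.3447
n     <- 24
mu0   <- 480
alpha <- 0.05

df     <- n - 1
se     <- s / sqrt(n)
t_obs  <- (xbar - mu0) / se
t_crit <- qt(1 - alpha / 2, df = df)

# 1. p-value approach
reject_p <- 2 * pt(-abs(t_obs), df = df) < alpha

# 2. Critical region approach
reject_crit <- abs(t_obs) > t_crit

# 3. Confidence interval approach: is mu0 outside the 95% CI?
ci <- xbar + c(-1, 1) * t_crit * se
reject_ci <- mu0 < ci[1] || mu0 > ci[2]

c(p_value = reject_p, critical_region = reject_crit, conf_interval = reject_ci)
```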

---

# Sleep Example: Reject `$H_0$` via CI Approach

Considering the sleep data example, we have that `$\mu_0 = 480$` and our 95% confidence interval is (367.67, 445.66).

Since `$\mu_0 = 480$` lies *outside* the range of the confidence interval, we decide to reject `$H_0$`.

Intuitively, this should make sense: based on the confidence interval, we are 95% confident that `$\mu$` is between 367.67 and 445.66. Since `$\mu_0 = 480$` is not within this range, 480 is not a plausible value for `$\mu$` given our data.

It therefore makes sense to say we are confident that `$\mu \neq \mu_0$`, and so we reject the null hypothesis that `$\mu = \mu_0$`.

---
# One-sample `$t$`-test Assumptions

For our sleep data example, we conducted a one-sample `$t$`-test to determine if `$\mu = 480$` minutes.

For this one-sample `$t$`-test to be valid, we need to make some ***assumptions***:

.content-box-blue[
.center[
**One-sample t-test Assumptions:**
]
1. The data are numeric
2. Observations are independent of one another (that is, the sample is a simple random sample and each individual within the population has an equal chance of being selected)
3. The sample mean, `$\overline{X}$`, is normally distributed.
]

* In this subject, we will usually assume that the first two assumptions have been met

* Our focus here will therefore be ***checking the normality assumption***

---

# Checking whether the underlying distribution is normal

There are three main ways to check for normality:

1. Viewing the data in a histogram with a normal or density curve overlaid

1. Checking a Normal Q-Q plot

1. Carrying out a hypothesis test for normality

---
# Normality Check 1: Histogram

* The histogram looks approximately bell-shaped

* However, with a relatively small sample size it is difficult to draw a strong conclusion from the histogram alone

---
# Normality Check 2: Normal Q-Q plot

* A Normal Q-Q (Quantile-Quantile) plot is another graphical method we can use to check for normality.

* Although we will not go into detail here, this plot compares the sample data quantiles to the normal distribution quantiles

* The main thing to look for is how well the dots follow the diagonal straight line in the plot

* For the data to be considered normally distributed, the dots should lie close to the line, with no systematic pattern of departure from it
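The comparison behind a Q-Q plot can be sketched in a few lines: sort the sample, then pair each value with the matching quantile of the standard normal distribution. This is a minimal sketch, not how any particular package draws the plot. It assumes the plotting positions `$(i - 0.5)/n$` (one common choice), uses simulated data in place of the sleep data (the mean 400, SD 90 and seed are arbitrary), and draws the reference line as mean + SD × z (some software instead fits the line through the quartiles). The name `qq_points` is hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Pair each sorted sample value with a theoretical standard normal quantile."""
    n = len(data)
    sample_q = sorted(data)
    # Plotting positions (i - 0.5) / n for i = 1, ..., n
    theoretical_q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical_q, sample_q))


# Hypothetical data: 24 values simulated from a normal distribution (not the sleep data)
rng = random.Random(1)
data = [rng.gauss(400, 90) for _ in range(24)]

points = qq_points(data)
# Reference line: intercept = sample mean, slope = sample SD
m, s = mean(data), stdev(data)
for z, x in points[:5]:
    print(f"z = {z:6.3f}  sample = {x:7.2f}  line = {m + s * z:7.2f}")
```

If the data are close to normal, each sample value sits near its "line" value; a systematic gap (e.g. an S-shape) suggests non-normality.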

---

# Normality Check 2: Normal Q-Q plot

* The dots follow the diagonal line fairly well; however, there is some departure from the line in the middle, which may be partly due to the relatively small sample size

---
# Normality Check 3: Shapiro-Wilk Normality Test

We can conduct a hypothesis test to check for normality, known as the **Shapiro-Wilk test** (SW), which has the following null and alternative hypotheses:

* `$H_0:\text{The data are normally distributed }$`
* `$H_1:\text{The data are not normally distributed}$`

Since we start out by assuming the data are normally distributed, we only reject this assumption if we obtain a small `$p$`-value. That is, ***for the Shapiro-Wilk normality test, a small p-value indicates the data are not normally distributed***. To summarise:

.content-box-blue[
.center[
**SW Hypothesis test for normality:**
]
* If *p* < 0.05, normality cannot be assumed
* If *p* > 0.05, normality can be assumed
]

---

# Normality Check 3: Shapiro-Wilk Normality Test

Note the following:

* If the sample size is very small, the SW test may fail to pick up non-normality

* If the sample size is large (e.g. 100 or more), the test becomes very sensitive and may flag small departures from normality that have little practical effect on the `$t$`-test

Let's carry out the Shapiro-Wilk test for normality for the sleep data example:

```

Shapiro-Wilk normality test

data:  sleep$sleep1
W = 0.96228, p-value = 0.4858
```
As we can see, we have `$p = 0.4858$`. Since `$p > 0.05$`, ***normality can be assumed.***

---

# Normality Check: Applying the Central Limit Theorem

Recall the third assumption for the `$t$`-test:

* The sample mean, `$\overline{X}$`, is normally distributed.

This means that even if we find that the underlying distribution is not normal, the distribution of the ***sample mean*** may still be approximately normal.

Recall that, as long as `$n \geq 30$`, we can apply the Central Limit Theorem and conclude that the distribution of the sample mean is approximately normal.

This rule of thumb should, however, be applied with caution; see [this topic's readings](https://amandashaker-stm1001-topic-5.share.connect.posit.cloud/4.2-how-the-central-limit-theorem-applies.html) for further discussion.

*As we have `$n = 24$`, we cannot apply the Central Limit Theorem in this example, so we should rely on the SW result.*
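A quick simulation can illustrate the Central Limit Theorem. This sketch uses arbitrary choices that are not data from this topic (an exponential population with mean 1, samples of size 30, 2000 repetitions, a fixed random seed): individual values are strongly right-skewed, while the means of samples of size 30 are much closer to symmetric, with a standard deviation close to `$\sigma/\sqrt{n}$`. The helper `skewness` is hypothetical.

```python
import random
from statistics import mean, stdev


def skewness(values):
    """Sample skewness: average cubed standardised deviation."""
    m, s = mean(values), stdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


rng = random.Random(2024)

# A strongly right-skewed population: exponential with mean 1 and SD 1
population_draws = [rng.expovariate(1.0) for _ in range(5000)]

# Sampling distribution of the mean for samples of size n = 30
n, reps = 30, 2000
sample_means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

print(f"skewness of individual values: {skewness(population_draws):.2f}")
print(f"skewness of sample means (n = {n}): {skewness(sample_means):.2f}")
print(f"SD of sample means: {stdev(sample_means):.3f} (theory: {1 / n ** 0.5:.3f})")
```

Some skewness typically remains in the sample means at `$n = 30$` for a population this skewed, which is one reason the `$n \geq 30$` rule should be applied with caution.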

---
# Normality Check Summary

For the purposes of this subject, use the following rules to guide the decision as to whether or not the normality assumption has been violated:

.content-box-blue[
.center[
**Normality Assumption Decision:**
]
* If the underlying distribution is normal, then the distribution of the sample mean will also be normal. This means the normality assumption has not been violated and the `$t$`-test can be used. This holds regardless of sample size.

* If the underlying distribution is not normal but `$n \geq 30$`, then the distribution of the sample mean will be at least approximately normal. This means the normality assumption has not been violated and the `$t$`-test can be used.

* If the underlying distribution is not normal and `$n < 30$`, then we cannot assume that the distribution of the sample mean is normal. This means the normality assumption has been violated and we should not use the `$t$`-test.
]
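The three decision rules in the box above can be summarised as a small function. This is a sketch only: the name `t_test_normality_ok` is hypothetical, and the argument `underlying_normal` stands for the judgement we make from the histogram, Q-Q plot and Shapiro-Wilk test, which the code cannot make for us.

```python
def t_test_normality_ok(underlying_normal, n):
    """Normality assumption decision following the summary rules for this subject."""
    if underlying_normal:
        return True   # sample mean is normal regardless of sample size
    if n >= 30:
        return True   # CLT: sample mean is at least approximately normal
    return False      # not normal and n < 30: do not use the t-test


# Sleep example: normality can be assumed, n = 24
print(t_test_normality_ok(underlying_normal=True, n=24))
```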

---

# Normality Check Summary

Since the Shapiro-Wilk test gave no evidence against normality, and the histogram and Q-Q plot did not indicate strong departures from normality, we conclude that the normality assumption is **not violated**.

* In other words, since we can assume the underlying distribution is normal, we conclude that the distribution of the sample mean will also be normal

* This means the normality assumption has not been violated and the results of our `$t$`-test are valid

#### What happens when the normality assumption has been violated?

* If the normality assumption has been violated, we can use what is called a 'non-parametric' test

* That is, a test that does not assume the data follow a particular distribution, such as the normal distribution

* However, these types of tests are beyond the scope of this subject.

---

background-image: url(data:image/png;base64,#computerlab.jpg)
background-position: bottom
background-size: 75%
class: center

# See you in the computer labs!

---
class: middle

<font color = "grey">
These notes have been prepared by Amanda Shaker and Rupert Kuveke. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License 
<a href = "https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank"> BY-NC-ND. </a>
</font>