class: middle background-image: url(data:image/png;base64,#LTU_logo.jpg) background-position: top left background-size: 30% # STM1001 [Topic 5](https://bookdown.org/content/50a3178d-5432-44a3-bb5c-1718ca3e1fe2/) Lecture ## Hypothesis Testing ### La Trobe University This lecture complements the [Topic 5 readings](https://bookdown.org/content/50a3178d-5432-44a3-bb5c-1718ca3e1fe2/) <!-- --- --> <!-- This slide is good for S1 and S2, but not LT1 or the College instance --> <!-- # Assignment 1 --> <!-- Assignment 1 is now available (worth 15%), and is due Thursday Week 7. --> <!-- -- --> <!-- All details can be found in the Assignments tile on the STM1001 LMS - please read over the information carefully, including the Assignment Instructions in the actual assignment document (Question 0). --> <!-- -- --> <!-- * Please also check my subject announcement --> <!-- -- --> <!-- * If you still have questions after reading through all the details, email me --> <!-- -- --> <!-- Assignment 1 is stream-specific, and assesses stream content from the first half of the semester. --> <!-- -- --> <!-- * You only do your stream's version of Assignment 1 --> <!-- -- --> <!-- * All students submit work online through the Turnitin submission link --> <!-- -- --> <!-- 3-day extensions are available and can be applied for directly within the LMS, directly below the Assignment 1 submission link. --> <!-- -- --> <!-- If you have an LAP and require an extension, please email me. --> --- # Topic 5: Related Links ## Readings [Topic 5 readings](https://bookdown.org/content/50a3178d-5432-44a3-bb5c-1718ca3e1fe2/) ## Notation [Topics 5 and 6: Hypothesis testing and `\(t\)`-tests](https://bookdown.org/a_shaker/STM1001_Topic_0/notation-summary.html#topics-5-and-6-hypothesis-testing-and-t-tests) --- # Topic 5: Hypothesis Testing **Overview** <iframe src="https://bookdown.org/content/50a3178d-5432-44a3-bb5c-1718ca3e1fe2/" width="100%" height="400px" data-external="1"></iframe> --- # Hypothesis Testing In this lecture we will introduce ***Hypothesis Testing***, a fundamental statistical procedure and a cornerstone of modern scientific research. -- We will cover: * Hypothesis testing steps and structure -- * Type I and Type II errors -- * Conducting a `\(t\)`-test -- * The `\(t\)`-distribution -- * Confidence Intervals -- * `\(t\)`-test assumptions and checks --- # Hypothesis Testing In the context of Statistics, a ***hypothesis*** is a specific statement of prediction. -- It describes in concrete terms (rather than abstract terms) what you expect will happen in your study. -- In statistical hypothesis testing, we consider two options: -- * A ***Null Hypothesis***, denoted `\((H_0)\)` and -- * An ***Alternative Hypothesis***, denoted `\((H_1)\)` -- We always begin our testing with the assumption that our null hypothesis is correct. -- * The null hypothesis is typically based on existing research, while the alternative hypothesis considers the possibility that `\(H_0\)` may be inaccurate -- Our intention is to use our sample data to test whether the alternative hypothesis provides a more accurate description of the phenomenon of interest. --- # Hypothesis Testing In general terms, suppose we have a sample of `\(n\)` observations that have been independently sampled from a population with population mean `\(\mu\)`.
-- We can write out our ***Null Hypothesis*** `\((H_0)\)` and ***Alternative Hypothesis*** `\((H_1)\)` in the general form `$$H_0:\mu = \mu_0\;\;\text{versus}\;\;H_1:\mu \neq \mu_0.$$` -- Here `\(\mu_0\)` denotes the population mean under the null hypothesis. -- * `\(\mu_0\)` will take a specific value (i.e. be a number) and is context-specific -- When we carry out a hypothesis test, we begin by ***assuming the null hypothesis to be true***. -- If our sample provides evidence that this was not a reasonable assumption to make, then we ***reject the null hypothesis*** and therefore have evidence in favour of the ***alternative hypothesis***. --- # The Hypothesis Testing Process We can summarise the hypothesis testing process into 7 main steps: -- 1. Establish the Null Hypothesis `\(H_0\)` and the Alternative Hypothesis `\(H_1\)` -- 2. Choose the level of significance `\(\alpha\)` -- 3. Determine the appropriate test to use -- 4. Gather sample data -- 5. Analyse sample data -- 6. Reach a statistical conclusion (Reject `\(H_0\)` or Fail to reject `\(H_0\)`) -- 7. Write a clear conclusion --- # Type I and Type II Errors Regardless of the type of test we conduct, using a sample to infer details about a population can have risks of which we need to be aware: -- .content-box-blue[ .center[ A ***Type I Error (aka a False Positive)*** occurs when the information from our sample data leads us to reject `\(H_0\)`, when in reality `\(H_0\)` is actually true. <br> A ***Type II Error (aka a False Negative)*** occurs when the information from our sample data leads us to not reject `\(H_0\)`, when in reality `\(H_0\)` is actually false. ]] -- We can think of the ***level of significance***, denoted by the Greek letter `\(\alpha\)`, as being the threshold for our maximum accepted level of risk of incurring a Type I error. -- * In other words, `\(\alpha\)` is the threshold for the maximum acceptable probability of making a Type I error when conducting a test -- * We don't want to make an incorrect inference, so `\(\alpha\)` should be small! --- # Type I and Type II Errors <img src="data:image/png;base64,#type_I_and_II_errors_updated.jpg" width="800px" style="display: block; margin: auto;" /> * These errors can have serious implications - imagine a test for a disease; which error would be worse to make? -- * The lower your specified `\(\alpha\)` value, the more likely it is that Type II errors will occur (there is always a cost) --- # SD & Variance - Population vs Sample Recall that for a random variable `\(X\)`, we have: | | Standard Deviation| Variance | |:----------|:---------:|:-------------------:| |**Population** | `\(\sigma\)` | `\(\sigma^2\)` | | vs | | | |**Sample** | `\(s\)` | `\(s^2\)` | -- * Note we distinguish between population and sample by using Greek and English letters respectively -- For the sample mean `\(\overline{X}\)`, which itself is a random variable, we have: | | Standard Deviation | Variance | |:--------------|:-------------------:|:--------:| |**Population** | `\(\displaystyle\frac{\sigma}{\sqrt{n}}\)` | `\(\displaystyle\frac{\sigma^2}{n}\)` | | vs | | | |**Sample** | `\(\displaystyle\frac{s}{\sqrt{n}}\)` | `\(\displaystyle\frac{s^2}{n}\)` | --- # Sample Data -> Inference In applied statistical *inference*, generally we are using sample data to *infer* something about a population of interest. -- Our motivation for conducting a hypothesis test is to determine: * Given our sample data, is `\(H_0\)` likely to be accurate/true?
-- To answer this question, we will need to conduct a formal statistical test (with several steps), using descriptive statistics from our data, e.g.: * the observed sample mean `\(\overline{x}\)`, * the observed sample standard deviation `\(s\)`, * the sample size `\(n\)` -- We will use these values, along with our specified `\(\mu_0\)` value for our parameter of interest under `\(H_0\)`, to compute a ***Test Statistic***. --- # Test Statistics **Test statistics** are random variables which we use during the Hypothesis Testing Process when trying to decide if we can reject `\(H_0\)`. -- Different statistical tests will have different test statistic equations and details, but the way in which we use test statistics is typically *test-agnostic*. -- What we will focus on here is an overview of: * ***Why*** we use test statistics * ***How*** they tie in with our hypothesis testing, and * ***How to use them for statistical inference***, with an example for a specific statistical test --- # Test Statistic Notation In general terms, for the type of tests we'll consider now, the test statistic is defined as follows: $$T = \displaystyle \frac{\overline{X} - \mu_0}{\text{SE}} = \frac{\overline{X} - \mu_0}{S/\sqrt{n}}, $$ where: -- * `\(T\)` denotes the ***test statistic***, which is ***random***. That is, it is a ***random variable*** and will follow a specific statistical distribution, the `\(t\)`-distribution, with `\(T \sim t_{df}\)` (more on this shortly). -- * `\(\overline{X}\)` denotes the sample mean -- * `\(\text{SE}\)` refers to the ***Standard Error***. The standard error is an *estimator* of the *standard deviation of the sample mean*, and is equal to `\(S / \sqrt{n}.\)` <!-- . we have that `\(T\)` follows a `\(t\)`-distribution with `\(n – 1\)` degrees of freedom, which we also write as `\(T \sim t_{n-1}\)`. --> --- # The observed test statistic If we input our specific data from our study into the test statistic equation, we will obtain the ***observed test statistic***: $$t = \displaystyle \frac{\bar{x} - \mu_0}{\text{se}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, $$ where: -- * `\(t\)` denotes the observed ***test statistic*** - this is a number, which we consider to be sampled from the test statistic distribution -- * `\(\bar{x}\)` denotes the observed sample mean -- * `\(\text{se}\)` refers to the ***observed standard error***. The standard error is an *estimate* of the *standard deviation of the sample mean*, and is equal to `\(\frac{s}{\sqrt{n}}\)`. -- Note the difference in notation between the ***random*** and ***observed*** test statistic definitions where, for example, `\(T\)` is the ***random*** test statistic, and `\(t\)` is the ***observed*** test statistic. --- # Test Statistic Interpretation The distribution for our test statistic is a standardised version of the distribution for the sample mean `\(\overline{X}\)`. -- ***Observed test statistics*** condense information from our sample data into a single value which we can use to help decide if we should reject or not reject `\(H_0\)`. -- ***Observed test statistics*** are similar to the `\(z\)`-scores we learnt about in Topic 3, in that an ***observed test statistic*** can be thought of as a ***standardised*** version of the observed sample mean. -- For the time being, it may help to think of the ***observed test statistic*** as being a bit like a ***smoothie***. * It is created using multiple ingredients (e.g.
`\(\overline{x}\)`, `\(s\)`, `\(n\)`, `\(\mu_0\)`), and assuming `\(H_0\)` is correct, we are expecting it to taste a certain way/be close to a certain value --- # Test Statistic Interpretation The ***test statistic*** will follow a specific probability distribution under `\(H_0\)`. -- * The specific distribution will depend on the test being conducted -- Using our observed test statistic and the distribution of the test statistic, we can calculate a special probability known as the ***p-value*** to help us decide if our null hypothesis is accurate or not. -- Recall that `\(\alpha\)` represents the threshold for the maximum acceptable level of risk of incurring a Type I error. -- If our test statistic is an extreme value, such that the resultant ***p-value is less than alpha***, we can conclude that we have a ***statistically significant*** result and reject `\(H_0\)`; otherwise our result is not statistically significant, and we cannot reject `\(H_0\)`. -- We'll look at this in more detail shortly, but this is our end goal - to obtain a `\(p\)`-value so we can reach a statistically-informed decision regarding our null hypothesis. --- # Distribution of the Test Statistic The optimal distribution to use for our test statistic would be the ***Normal Distribution***, due to its appealing qualities, including: -- * It is symmetric -- * We only need two parameters to define the shape of the distribution - the mean `\(\mu\)` and the variance `\(\sigma^2\)` -- * We can easily standardise results for easier inference and comparison between data sets -- * Based on the ***Central Limit Theorem (CLT)***, we can assume that ***the distribution of the sample mean*** will at least approximately follow a normal distribution, so long as our sample size is large `\((n \geq 30)\)` -- Unfortunately, in order to specify the shape of the normal distribution we need to know the population mean and population variance - generally, these are unknown and we will not be able to obtain them... -- * We can estimate the population standard deviation `\(\sigma\)` using our sample standard deviation `\(s\)`, but we need to account for the extra uncertainty this introduces --- # The `\(t\)`-distribution Fortunately, we can use the `\(t\)`-distribution as a replacement for the normal distribution! -- The shape of the `\(t\)`-distribution is defined by a parameter called the ***degrees of freedom (df)***, which is related to the ***sample size n***. -- * So, unlike the normal distribution, all we need to define the `\(t\)`-distribution is the sample size `\(n\)`, which we should always know -- * For `\(t\)`-tests, we have `\(T \sim t_{df}\)`, with `\(df = n-1\)` -- As `\(df \rightarrow \infty\)`, the `\(t\)`-distribution converges to the normal distribution.
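-- We can see this convergence numerically in R (a quick sketch, comparing 0.975 quantiles via base R's `qt()` and `qnorm()`):

``` r
qnorm(0.975)         # normal distribution quantile:  ~1.96
qt(0.975, df = 5)    # t-distribution, df = 5:        ~2.57
qt(0.975, df = 30)   # t-distribution, df = 30:       ~2.04
qt(0.975, df = 1000) # t-distribution, df = 1000:     ~1.96
```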
-- * This means we can use the `\(t\)`-distribution as a replacement for the normal distribution, as long as `\(n\)` is large --- <img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-3-1.svg" style="display: block; margin: auto;" /> --- # The `\(t\)`-distribution .pull-left[ * For `\(t\)`-tests, we use the `\(t\)`-distribution as the distribution for our standardised sample mean {{content}} ] -- * We treat our test statistic as an observed value from this distribution {{content}} -- * For small sample sizes, the `\(t\)`-distribution does not match the normal distribution and is more conservative, making it harder to obtain statistically significant results (note the thicker tails) -- .pull-right[ <img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-4-1.svg" style="display: block; margin: auto;" /> ] --- # Sleep Data Example Suppose we claim that, on average, STM1001 students spend a total of 480 minutes (8 hours) sleeping per day. -- * We can test this claim via a hypothesis test -- Suppose we asked students the research question *'In the past 24 hours, how many minutes did you spend sleeping?'* and obtained `\(n=24\)` responses, summarised in the histogram below: -- .pull-left[ <img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-5-1.svg" width="85%" style="display: block; margin: auto;" /> ] -- .pull-right[ * Sample mean: `\(\overline{x} = 406.7\)` * Sample standard deviation: `\(s = 92.3\)` * Is `\(\mu_0 = 480\)` an appropriate estimate for the population mean? * We can use our sample of data to test our claim and reach a conclusion ] --- # Hypothesis Testing Example For our ***sleep data example***, we would define the null and alternative hypotheses as follows: `$$H_0:\mu = 480\;\;\text{versus}\;\;H_1:\mu \neq 480,$$` where: -- * `\(\mu\)` denotes the population mean number of minutes people in the population (STM1001 students) spend sleeping per day -- * `\(H_0\)` denotes the null hypothesis that the mean number of minutes people in this population spend sleeping is ***equal*** to 480 -- * `\(H_1\)` denotes the alternative hypothesis that the mean number of minutes people in this population spend sleeping is ***not equal*** to 480 -- * `\(\mu_0 = 480\)` -- To test this hypothesis, we can conduct a `\(t\)`-test (called thus since we are using a test statistic that follows a `\(t\)`-distribution). -- * Since we're assessing one group of individuals, with one measurement taken from each individual, we call our test a ***one-sample t-test*** --- # t-distribution for Sleep Data Example Recall that the `\(t\)`-distribution is the standardised distribution of the sample mean, and is created under the assumption that `\(H_0\)` is true. -- If `\(H_0\)` is true, we expect the sample mean to be close to `\(\mu_0\)`, and a sample mean equal to `\(\mu_0\)` corresponds to `\(t = 0\)`, given that `\(T = \frac{\overline{X} - \mu_0}{S/\sqrt{n}}\)`. <img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-6-1.svg" width="50%" style="display: block; margin: auto;" /> --- # Sleep Data Example: test statistic If our sample mean is a lot higher than `\(\mu_0 = 480\)`, then our observed test statistic will be above zero - e.g. maybe 1, 1.4, 3, or even higher. -- If our sample mean is a lot lower than `\(\mu_0 = 480\)`, then our observed test statistic will be below zero - e.g. maybe -2, -3, or even lower. -- If our sample mean is far from `\(\mu_0\)`, this suggests that `\(H_0\)` is not correct.
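-- As an illustration, we can compute such observed test statistics directly in R (a sketch using `\(s = 92.3\)` and `\(n = 24\)` from this example; the sample mean of 520 is hypothetical):

``` r
se <- 92.3 / sqrt(24)  # observed standard error, ~18.84
(520.0 - 480) / se     # hypothetical sample mean above 480: t ~  2.12
(406.7 - 480) / se     # our observed sample mean below 480: t ~ -3.89
```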
-- Similarly, ***if our observed test statistic is far from 0, this suggests that `\(H_0\)` is not correct, and should be rejected.*** --- # Observed test statistic inference Let's assume we observe a sample mean much higher than 480, which translates to an observed test statistic of `\(t = 3\)`. -- * Would it be reasonable to hold onto the assumption that the true average is 480 minutes? I.e., does the null hypothesis seem correct? -- * Probably not. In this case, we would say we **reject the null hypothesis** and that we have evidence that the true mean is different from 480 minutes. <img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-7-1.svg" width="40%" style="display: block; margin: auto;" /> --- # Observed test statistic inference In general, we want to ask the question: *If the null hypothesis is true, what is the probability we would have obtained a result at least as extreme as our observed test statistic?* -- This probability is the ***p-value*** mentioned earlier. -- Mathematically, the ***p-value*** equals `\(2 \times P(T \geq |t|)\)` for our two-sided one-sample `\(t\)`-test scenario. -- * If the p-value is *less than* our level of significance `\(\alpha\)`, this is sufficient evidence to reject `\(H_0\)` -- * If the p-value is *greater than* our level of significance `\(\alpha\)`, then we do not have enough evidence to reject `\(H_0\)` -- * Note that we don't say `\(H_0\)` is true, but rather that ***we do not reject*** `\(H_0\)`. This is because, with more data, we might reach a different conclusion. -- Typically, `\(\alpha = 0.05\)` is used. --- # Sleep Data Example: `\(t\)`-test output ``` r One Sample t-test data: sleep$sleep1 t = -3.8904, df = 23, p-value = 0.0007382 alternative hypothesis: true mean is not equal to 480 95 percent confidence interval: 367.6729 445.6604 sample estimates: mean of x 406.6667 ``` --- # Sleep Data Example: `\(t\)`-test output ``` r One Sample t-test data: sleep$sleep1 `t = -3.8904`, df = 23, p-value = 0.0007382 alternative hypothesis: true mean is not equal to 480 95 percent confidence interval: 367.6729 445.6604 sample estimates: mean of x 406.6667 ``` * The **observed test statistic** is `\(t = -3.8904\)` --- # Sleep Data Example: `\(t\)`-test output ``` r One Sample t-test data: sleep$sleep1 `t = -3.8904`, `df = 23`, p-value = 0.0007382 alternative hypothesis: true mean is not equal to 480 95 percent confidence interval: 367.6729 445.6604 sample estimates: mean of x 406.6667 ``` * The **observed test statistic** is `\(t = -3.8904\)` * The **degrees of freedom** is `\(n - 1 = 24 - 1 = 23\)` --- # Sleep Data Example: `\(t\)`-test output ``` r One Sample t-test data: sleep$sleep1 `t = -3.8904`, `df = 23`, `p-value = 0.0007382` alternative hypothesis: true mean is not equal to 480 95 percent confidence interval: 367.6729 445.6604 sample estimates: mean of x 406.6667 ``` * The **observed test statistic** is `\(t = -3.8904\)` * The **degrees of freedom** is `\(n - 1 = 24 - 1 = 23\)` * The ** `\(p\)`-value** is `\(p = 0.0007\)`.
Since this `\(p\)`-value is less than 0.05, we can reject `\(H_0\)` and conclude that the average number of minutes people spend sleeping in this population is ***different*** from `\(\mu_0 = 480\)` --- # Sleep Data Example: `\(t\)`-test output ``` r One Sample t-test data: sleep$sleep1 `t = -3.8904`, `df = 23`, `p-value = 0.0007382` alternative hypothesis: true mean is not equal to 480 95 percent confidence interval: 367.6729 445.6604 sample estimates: `mean of x ` `406.6667 ` ``` * The **observed test statistic** is `\(t = -3.8904\)` * The **degrees of freedom** is `\(n - 1 = 24 - 1 = 23\)` * The ** `\(p\)`-value** is `\(p = 0.0007\)`. Since this `\(p\)`-value is less than 0.05, we can reject `\(H_0\)` and conclude that the average number of minutes people spend sleeping in this population is ***different*** from `\(\mu_0 = 480\)` * The **sample mean** is `\(\bar{x} = 406.67\)` --- # Sleep Data Example: `\(t\)`-test output ``` r One Sample t-test data: sleep$sleep1 `t = -3.8904`, `df = 23`, `p-value = 0.0007382` alternative hypothesis: true mean is not equal to 480 `95 percent confidence interval:` `367.6729 445.6604` sample estimates: `mean of x ` `406.6667 ` ``` * The **observed test statistic** is `\(t = -3.8904\)` * The **degrees of freedom** is `\(n - 1 = 24 - 1 = 23\)` * The ** `\(p\)`-value** is `\(p = 0.0007\)`. Since this `\(p\)`-value is less than 0.05, we can reject `\(H_0\)` and conclude that the average number of minutes people spend sleeping in this population is ***different*** from `\(\mu_0 = 480\)` * The **sample mean** is `\(\bar{x} = 406.67\)` * The **95% confidence interval** is (367.67, 445.66). This means we are 95% confident that the population mean sleep for this population lies within the interval (367.67, 445.66). -- --- # Sleep Data Example: observed test statistic Recall that the **sample mean** was `\(\bar{x} = 406.67\)` and `\(\mu_0 = 480\)`. -- * Since the sample mean is different from 480, it may be that we will have a significant result -- The test statistic allows us to evaluate exactly how ***extreme*** this result is after taking into consideration the variability and sample size. -- Recall the formula for the ***observed test statistic*** is `$$t = \displaystyle \frac{\bar{x} - \mu_0}{\text{se}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$` -- * Also recall that `\(s = 92.3\)` and `\(n = 24\)`. We therefore have that $$ t = \frac{406.67 - 480}{92.3/\sqrt{24}} = \frac{-73.33}{18.85} \approx -3.89.$$ -- As the test statistic is not close to zero (or its absolute value is relatively large), we expect that the corresponding `\(p\)`-value will be small. --- # Sleep Data Example: The `\(p\)`-value For our sleep data example, the `\(p\)`-value we obtain allows us to answer the following question: -- If it were true that `\(\mu = \mu_0 = 480\)`, what are the chances that, when we took our sample of `\(n = 24\)` students, we would have seen either this sample mean of `\(\bar{x} = 406.67\)` (which translates to an observed test statistic of `\(t = -3.89\)`), ***or a more extreme result***? -- * Is our observed test statistic ***extreme*** in the context of the above `\(t\)`-distribution (which assumes `\(H_0\)` is true)? 
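-- As a quick check (a sketch using the observed `\(t = -3.8904\)` and `\(df = 23\)` from the output above), this two-tailed probability can be computed in R with `pt()`, or the whole test reproduced with `t.test()`:

``` r
2 * pt(-3.8904, df = 23)        # two-tailed p-value, ~0.0007
t.test(sleep$sleep1, mu = 480)  # reproduces the output shown earlier
```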
Let's have a look: --- # Sleep Data Example: The `\(p\)`-value <img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-15-1.svg" width="70%" style="display: block; margin: auto;" /> --- # Sleep Data Example: The `\(p\)`-value As it turns out, our observed test statistic is fairly extreme in the context of this distribution, because the probability of observing a test statistic at least this extreme is only `\(p = 0.0007\)` ***if `\(H_0\)` is true***. -- In other words: `$$2 \times P(T \geq |t|) = P(T \leq -3.8904) + P(T \geq 3.8904) = 0.0007.$$` Recall this probability is our ** `\(p\)`-value**. -- Note that here we were only interested in whether `\(t\)` was ***different from*** `\(0\)` (or equivalently whether `\(\mu\)` was ***different from*** 480). -- * I.e. what is the probability of seeing a test statistic at least as extreme ***in either direction***? -- * That is, greater than 3.8904 or less than -3.8904. This is called a ***two-sided test***. This point will be further explained shortly. --- # The `\(p\)`-value & the significance level `\(\alpha\)` Since our `\(p\)`-value was small, we had enough evidence to ***reject `\(H_0\)`***. -- * Therefore, we have evidence to support the alternative hypothesis that `\(\mu \neq 480\)`, i.e. that our result is ***statistically significant***. -- You may be wondering: -- *How small does our `\(p\)`-value need to be for us to decide that the observed test statistic is extreme enough for us to reject `\(H_0\)`?* -- * This is where the ***level of significance, `\(\alpha\)`*** ties into our Hypothesis Testing Process. In general, the standard level of significance is `\(\alpha = 0.05\)`, although other levels of `\(\alpha\)` can be chosen. -- Our overall hypothesis test conclusion is based on comparing our `\(p\)`-value against our ***level of significance, `\(\alpha\)`***. This is called the ***p-value approach***. -- That is: * if `\(p < \alpha\)`, we reject `\(H_0\)` * if `\(p > \alpha\)`, we do not reject `\(H_0\)` --- # The Critical Region approach There is another method we could also use to reach an overall hypothesis test conclusion, called the ***Critical Region approach***. -- Consider the question: -- *If `\(\alpha = 0.05\)`, how extreme would our test statistic need to be in order to reject `\(H_0\)`?* -- To answer this question, we can find the ***quantiles*** `\(\pm t^*\)` such that `\(P(T \leq -t^*) + P(T \geq t^*) = 0.05\)`, as represented on the next slide. --- <img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-16-1.svg" width="60%" style="display: block; margin: auto;" /> * As we can see, `\(P(T \leq -2.07) + P(T \geq 2.07) = 0.05\)`. -- * This means that if our test statistic were either greater than 2.07 or less than -2.07, we would say it falls in the ***critical region*** and we would reject `\(H_0\)`, because any value of `\(t\)` in this region would result in `\(p < 0.05\)`. --- name: menti class: middle background-image: url(data:image/png;base64,#menti.jpg) background-size: 115% # Kahoot ## Go to [www.kahoot.it](https://www.kahoot.it) and use ## the code provided --- # One-sided vs two-sided tests Earlier, we mentioned that the test we had carried out was a ***two-sided test***. ***One-sided tests*** are also possible. Consider the following cases: -- 1. **Two-sided test**: *Is the average sleep per day for students from this population **different** from 480 minutes?* + `\(H_0:\mu = 480\;\;\text{versus}\;\;H_1:\mu \neq 480\)` -- 2.
**One-sided test**: *Is the average sleep per day for students from this population **greater than** 480 minutes?* + `\(H_0:\mu = 480\;\;\text{versus}\;\;H_1:\mu > 480\)` -- 3. **One-sided test**: *Is the average sleep per day for students from this population **less than** 480 minutes?* + `\(H_0:\mu = 480\;\;\text{versus}\;\;H_1:\mu < 480\)` -- Examples 2 and 3 above are referred to as 'one-sided tests' because they are only testing for extreme values in one direction. Consider the figure on the next slide, which shows the critical values (CV) required for each test. --- # One-sided vs two-sided tests <img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-17-1.svg" width="70%" style="display: block; margin: auto;" /> --- # One-sided vs two-sided tests For a **two-sided test**, we have `\(p\text{-value} = 2 \times P(T\geq |t|) \text{ for } T\sim t_{\text{df}}\)`. -- * Since the combined shaded area must equal `\(\alpha\)`, we have an area of `\(\alpha / 2\)` in each tail. -- 1. **The two-sided test** is often referred to as a **two-tailed test**, as we are interested in extreme values in the two 'tails' of the distribution curve. -- 2. For **one-sided (right-tailed) tests** we are only interested in extreme values in the right tail (i.e. greater than `\(\mu_0\)`) so we have an area of `\(\alpha\)` in the right tail. For a **one-sided test (right-tailed)**, we have: `$$p\text{-value} = P(T\geq t) \text{ for } T\sim t_{\text{df}}$$` -- 3. Similarly, for a **one-sided test (left-tailed)**, we have: `$$p\text{-value} = P(T\leq t) \text{ for } T\sim t_{\text{df}}$$` -- Two-sided tests are often preferred in practice because they do not presuppose the direction of the result. However, in this subject we will practise using both two-sided and one-sided tests. --- # Confidence Intervals As well as reporting the observed sample mean, reporting a ***confidence interval*** (CI) is very useful. We can use a CI to: -- * Provide a range within which the true population mean is likely to be -- * Indicate how ***confident*** we are in our estimate -- Consider the following statements: 1. *We are 95% confident that the true mean minutes STM1001 students spend sleeping each day is **between 400.67 and 412.67 minutes.*** -- 2. *We are 95% confident that the true mean minutes STM1001 students spend sleeping each day is **between 6.67 and 806.67 minutes.*** -- Which statement do you prefer, and why? --- # Confidence Intervals In our sleep data example, our estimated mean time sleeping is 406.67 minutes. -- The first CI was more useful than the second CI, because it was much narrower. -- * For the same specified `\(\alpha\)`, a narrower confidence interval is preferable to a wider confidence interval -- The width of a confidence interval is determined by: -- * The sample size `\(n\)` -- * The estimated variability in the sample `\(s\)` -- * The level of significance `\(\alpha\)` --- # Confidence Intervals To calculate a confidence interval, we take `\(\bar{x}\)` and then add and subtract some **margin of error**. Consider the following definition: -- .content-box-blue[ .center[ **95% Confidence interval calculation:** ] `$$95\% \, \text{ CI } = \bar{x} \pm t_{\text{df,}0.975}\times\text{se},$$` ] where: * `\(t_{\text{df,}0.975}\)` is the value from the `\(t_{\text{df}}\)` distribution such that `\(P(T \leq t_{\text{df,}0.975}) = 0.975\)`, i.e.
the 0.975 quantile -- * `\(\text{se}\)`, the standard error, is equal to `\(\frac{s}{\sqrt{n}}\)` --- # Confidence Intervals Consider the following, more general definition: .content-box-blue[ .center[ **Confidence interval calculation for general significance level:** ] `$$(1 - \alpha)\times 100\% \, \text{ CI } = \bar{x} \pm t_{\text{df,}1 - \alpha/2}\times\text{se},$$` ] where: -- * `\(t_{\text{df,}1 - \alpha/2}\)` is the value from the `\(t_{\text{df}}\)` distribution such that `\(P(T \leq t_{\text{df,}1 - \alpha/2}) = 1 - \alpha/2\)`, i.e. the `\((1 - \alpha/2)\)` quantile -- Our level of confidence depends on `\(\alpha\)` such that we have a `\((1 - \alpha) \times 100\%\)` confidence interval. E.g.: -- * If `\(\alpha = 0.01\)`, we have a `\((1 - 0.01)\times 100\% = 99\%\)` confidence interval -- * If `\(\alpha = 0.05\)`, we have a `\((1 - 0.05)\times 100\% = 95\%\)` confidence interval --- # Sleep Data Example: 95% CI Calculations Recall for our sleep data example that we previously stated (slide 33): *The 95% confidence interval is (367.67, 445.66)*. -- This means we are 95% confident that the population mean sleep for this population lies within the interval (367.67, 445.66). -- Recalling the previous information: * `\(\bar{x} = 406.6667\)` * `\(n = 24\)` * `\(\text{df} = n - 1 = 24 - 1 = 23\)` * Sample standard deviation: `\(s = 92.3447\)` * `\(\text{se} = \frac{s}{\sqrt{n}} = \frac{92.3447}{\sqrt{24}} = 18.8498\)` If we also have `\(t_{23,0.975} = 2.0687\)` (obtained from jamovi/R), we can calculate this 95% confidence interval as follows. --- # Sleep Data Example: 95% CI Calculations Recall that a `$$95\% \, \text{ CI } = \bar{x} \pm t_{\text{df,}0.975}\times\text{se}.$$` -- First, we have that `\(t_{23,0.975} \times \text{se} = 2.0687 \times 18.8498 = 38.9946\)`. -- Next, we can add and subtract this number from `\(\bar{x} = 406.6667\)` to calculate our confidence interval as follows: * `\(406.6667 - 38.9946 = 367.6721\)` * `\(406.6667 + 38.9946 = 445.6613\)`, for a 95% CI of (367.67, 445.66), rounded to two decimal places. -- <br> *See if you can calculate a 99% confidence interval yourself, using the above process. To do so, you will need to use the following: `\(t_{23,0.995} = 2.8073.\)`* --- # Using Confidence Intervals to decide whether to reject `\(H_0\)` So far, we have seen how to use the ***p-value approach*** and the ***critical region approach*** to decide whether or not to reject `\(H_0\)`. -- We can also use the ***confidence interval approach*** (this way is arguably easier). -- * For the `\(t\)`-test, all three approaches should lead to the same conclusion -- To use the confidence interval approach, we use the following rule: -- * If `\(\mu_0\)` lies *outside* the range of the confidence interval, reject `\(H_0\)` -- * If `\(\mu_0\)` lies *within* the range of the confidence interval, do not reject `\(H_0\)` --- # Sleep Example: Reject `\(H_0\)` via CI Approach Considering the sleep data example, we have that `\(\mu_0 = 480\)` and our 95% confidence interval is (367.67, 445.66). -- Since `\(\mu_0 = 480\)` lies *outside* the range of the confidence interval, we decide to reject `\(H_0\)`. -- * It's that simple! -- Intuitively, this should make sense, because based on the confidence interval, we are saying we are 95% confident `\(\mu\)` is between 367.67 and 445.66. Since `\(\mu_0 = 480\)` is not within this range, it is very unlikely that the population mean is actually 480.
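-- We can reproduce this interval and check in R (a sketch using the summary statistics quoted earlier; the object names are illustrative):

``` r
xbar <- 406.6667                   # observed sample mean
se   <- 18.8498                    # observed standard error
tq   <- qt(0.975, df = 23)         # t quantile, ~2.0687
ci   <- xbar + c(-1, 1) * tq * se  # 95% CI: ~(367.67, 445.66)
ci[1] <= 480 & 480 <= ci[2]        # FALSE: mu_0 lies outside the CI
```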
-- It therefore makes sense to say we are confident that `\(\mu \neq \mu_0\)`, so we can reject the null hypothesis that `\(\mu = \mu_0\)`. --- # One-sample `\(t\)`-test Assumptions For our sleep data example, we conducted a one-sample `\(t\)`-test to determine if `\(\mu = 480\)` minutes. -- For this one-sample `\(t\)`-test to be valid, we need to make some ***assumptions***: -- .content-box-blue[ .center[ **One-sample t-test Assumptions:** ] 1. The data are numeric 2. Observations are independent of one another (that is, the sample is a simple random sample and each individual within the population has an equal chance of being selected) 3. The sample mean, `\(\overline{X}\)`, is normally distributed. ] -- * In this subject, we will usually assume that the first two assumptions have been met -- * Our focus here will therefore be ***checking the normality assumption*** --- # Checking whether the underlying distribution is normal There are three main ways to test for normality: -- 1. Viewing the data in a histogram with a normal or density curve overlaid -- 1. Checking a Normal Q-Q plot -- 1. Carrying out a hypothesis test for normality --- # Normality Check 1: Histogram <img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-18-1.svg" width="55%" style="display: block; margin: auto;" /> * The histogram looks approximately bell-shaped * However, with a relatively small sample size it is difficult to draw a strong conclusion from the histogram alone --- # Normality Check 2: Normal Q-Q plot A Normal Q-Q (Quantile-Quantile) plot is another graphical method we can use to check for normality. -- * Although we will not go into detail here, this plot compares the sample data quantiles to the normal distribution quantiles -- * The main thing to look for is how well the dots follow the diagonal straight line in the plot -- * For the data to be considered normally distributed, the dots should follow the line as closely as possible --- # Normality Check 2: Normal Q-Q plot <img src="data:image/png;base64,#Topic_5_Lecture_files/figure-html/unnamed-chunk-19-1.svg" width="55%" style="display: block; margin: auto;" /> * The dots follow the diagonal line fairly well; however, there is some deviation in the middle, which may be partly due to the relatively small sample size --- # Normality Check 3: Shapiro-Wilk Normality Test We can conduct a hypothesis test to check for normality, known as the **Shapiro-Wilk test** (SW), which has the following null and alternative hypotheses: * `\(H_0:\text{The data are normally distributed}\)` -- * `\(H_1:\text{The data are not normally distributed}\)` -- Since we start out by assuming the data are normally distributed, we only reject this assumption if we obtain a small `\(p\)`-value. That is, ***for the Shapiro-Wilk normality test, a small p-value indicates the data are not normally distributed***. To summarise: .content-box-blue[ .center[ **SW Hypothesis test for normality:** ] * If *p* < 0.05, normality cannot be assumed * If *p* > 0.05, normality can be assumed ] --- # Normality Check 3: Shapiro-Wilk Normality Test Note the following: -- * If the sample size is very small, the SW test may fail to pick up non-normality -- * If the sample size is large (e.g.
100 or more), the test may become too sensitive, indicating non-normality even when departures from normality are only minor -- Let's carry out the Shapiro-Wilk test for normality for the sleep data example: ``` Shapiro-Wilk normality test data: sleep$sleep1 W = 0.96228, p-value = 0.4858 ``` As we can see, we have `\(p = 0.4858\)`. Since `\(p > 0.05\)`, ***normality can be assumed.*** --- # Normality Check 3: Applying the Central Limit Theorem Recall the third assumption for the `\(t\)`-test: * The sample mean, `\(\overline{X}\)`, is normally distributed. -- This means that even if we find that the underlying distribution is not normal, it may be that the distribution of the ***sample mean*** is still normal. -- Recall that, as long as `\(n \geq 30\)`, we can apply the Central Limit Theorem and conclude that the distribution of the sample mean is at least approximately normal. This should, however, be done with caution - see [this topic's readings](https://bookdown.org/content/50a3178d-5432-44a3-bb5c-1718ca3e1fe2/4.2-how-the-central-limit-theorem-applies.html) for further discussion. -- *As we have `\(n = 24\)`, we cannot apply the Central Limit Theorem in this example, so we should rely on the SW result.* --- # Normality Check Summary For the purposes of this subject, use the following rules to guide the decision as to whether or not the normality assumption has been violated: -- .content-box-blue[ .center[ **Normality Assumption Decision:** ] * If the underlying distribution is normal, then the distribution of the sample mean will also be normal. This means the normality assumption has not been violated and the `\(t\)`-test can be used. This is regardless of sample size. * If the underlying distribution is not normal but `\(n \geq 30\)`, then the distribution of the sample mean will be at least approximately normal. This means the normality assumption has not been violated and the `\(t\)`-test can be used. * If the underlying distribution is not normal and `\(n < 30\)`, then we cannot assume that the distribution of the sample mean is normal. This means the normality assumption has been violated and we should not use the `\(t\)`-test. ] --- # Sleep Data Example: Normality Check Summary Since the Shapiro-Wilk test indicated the underlying distribution was normal, and the histogram and Q-Q plots did not indicate strong violations of normality, we conclude that the normality assumption is **not violated**. -- * In other words, since the underlying distribution is normal, we conclude that the distribution of the sample mean will also be normal -- * This means the normality assumption has not been violated and the results of our `\(t\)`-test are valid -- #### What happens when the normality assumption has been violated? * If the normality assumption has been violated, we can use what is called a 'non-parametric' test * That is, a test that does not make any assumptions about the underlying distribution of the data * However, these types of tests are beyond the scope of this subject.
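-- For reference, all three normality checks can be carried out in R (a sketch, assuming the data are stored in `sleep$sleep1` as in the output above):

``` r
hist(sleep$sleep1)         # Check 1: histogram
qqnorm(sleep$sleep1)       # Check 2: Normal Q-Q plot
qqline(sleep$sleep1)       #          reference line for the Q-Q plot
shapiro.test(sleep$sleep1) # Check 3: Shapiro-Wilk normality test
```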
--- name: stat class: middle background-image: url(data:image/png;base64,#slide_1.png) background-size: 110% --- name: stat class: middle background-image: url(data:image/png;base64,#slide_8.png) background-size: 100% <!-- ```{r, echo = F, eval = T, fig.dim = c(5.5, 8), fig.align = "center", dev = c("svg", "pdf")} --> <!-- par(mar = c(5,5,4,3)) --> <!-- par(cex = 0.8, mex = 0.8) --> <!-- p <- 0.05 --> <!-- q1 <- round(qt(0.025, 23), 2) --> <!-- q2 <- round(qt(0.975, 23), 2) --> <!-- curve(dt(x, df = 23), from = -3.5, to = 3.5, lwd = 3, xlab = 't', ylab = "Density", col = "black", las = 1, --> <!-- main = "t-distribution with df = 23", cex.main = 2, cex.lab = 1.25, cex.axis = 1.25, ylim = c(0, 0.45)) --> <!-- abline(v = 0, lty = 2) --> <!-- x <- seq(-5, q1, by = 0.01) --> <!-- y <- c(0, dt(x, 71), 0) --> <!-- x <- c(min(x), x, max(x)) --> <!-- polygon(x, y, col = 'lightseagreen') --> <!-- x <- seq(q2, 6, by = 0.01) --> <!-- y <- c(0, dt(x, 71), 0) --> <!-- x <- c(min(x), x, max(x)) --> <!-- polygon(x, y, col = 'lightseagreen') --> <!-- #abline(h = 0) --> <!-- legend("topright", legend = bquote(P(T <= .(q1)) + P(T >= .(q2)) == .(p)), pt.bg = "lightseagreen", --> <!-- pch = 22, pt.cex = 1.5, cex = 1.5) --> <!-- ``` --> --- background-image: url(data:image/png;base64,#computerlab.jpg) background-position: bottom background-size: 75% class: center # See you in the computer labs! --- class: middle <font color = "grey"> These notes have been prepared by Amanda Shaker and Rupert Kuveke. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License <a href = "https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank"> BY-NC-ND. </a> </font>