STM1001 Topic 6 Lecture

class: middle
background-image: url(data:image/png;base64,#LTU_logo.jpg)
background-position: top left
background-size: 30%

# STM1001 [Topic 6](https://bookdown.org/content/f9d035ed-86ea-4779-ad01-31acc973f0dd/) Lecture
## `$t$`-tests for two-sample hypothesis testing
### La Trobe University
This lecture complements the [Topic 6 readings](https://bookdown.org/content/f9d035ed-86ea-4779-ad01-31acc973f0dd/)

---

# Topic 6: Related Links

## Readings

[Topic 6 readings](https://bookdown.org/content/f9d035ed-86ea-4779-ad01-31acc973f0dd/)

## Notation

[Topics 5 and 6: Hypothesis testing and `$t$`-tests](https://bookdown.org/a_shaker/STM1001_Topic_0/notation-summary.html#topics-5-and-6-hypothesis-testing-and-t-tests)

---

# Topic 6: `$t$`-tests for two-sample hypothesis testing

**Overview**

---

# Today's Lecture

Having learnt about the one-sample `$t$`-test in the previous topic, today we will be learning about two more types of `$t$`-tests:

* The ***independent samples t-test*** (or two-sample `$t$`-test)
    
--

* The ***paired t-test*** (or dependent samples `$t$`-test)

We will learn about these tests via examples, including the following steps:

* Visualising the data
    
--

* Checking the assumptions
    
--

* Carrying out the test
    
--

To conclude, we will discuss ***effect sizes***, which help to determine the relative ***size*** of any differences found (as distinguished from ***statistical significance***).

Remember, if you need to refresh your understanding of any Maths concepts, or notation introduced recently, you can check the [Maths and Notation Summary Guide](https://bookdown.org/a_shaker/STM1001_Topic_0/).

---

name: stat
class: middle
background-image: url(data:image/png;base64,#slide_1.png)
background-size: 110%

---

name: stat
class: middle
background-image: url(data:image/png;base64,#slide_9.png)
background-size: 100%

---

# *t*-test versions

The `$t$`-test framework we introduced in the previous topic is very flexible, and can be adapted to a variety of scenarios.

There are three main versions of the `$t$`-test:

1. The ***one-sample t-test*** is used when we have one group, and assess one measurement from each individual in the group.

* Our focus is typically on comparing results from the group (e.g. the sample mean) to a fixed reference value

2. The ***independent samples t-test*** (or two-sample `$t$`-test) is used when we have two independent groups, and assess one measurement from each individual in each group.

* Our focus is typically on comparing the two groups to check for similarities or differences
    
--

3. The ***paired t-test*** (or dependent samples `$t$`-test) is used when we have taken two measurements of the same characteristic from each individual in a group, typically at two time points.

* Our focus is typically on comparing the two sets of observations

---

# Students' Eye Colour vs Sleep Time Example

Suppose we made the following claim:

*We believe that, on average, STM1001 students with brown eyes spend either more or less time sleeping than STM1001 students who do not have brown eyes*

Further suppose that a sample of STM1001 students have been asked:

* *In the past 24 hours, how many minutes did you spend sleeping?*, and

* *What is your eye colour?*

We can test this claim using a hypothesis test, just like we did with the sleep example in the previous lecture.

However, since we have two independent groups of individuals here (students with brown eyes and students who don't have brown eyes), we will need to conduct a new version of the `$t$`-test, known as ***the independent samples t-test*** (aka the two-sample `$t$`-test).

---
# Eye Colour/Sleep Example Hypotheses

First, we need to set up our hypotheses:

`$$H_0:\mu_1 = \mu_2\;\;\text{versus}\;\;H_1:\mu_1 \neq \mu_2,$$`
where:

* `$\mu_1$` denotes the population mean number of minutes STM1001 students with brown eyes spend sleeping per day

* `$\mu_2$` denotes the population mean number of minutes STM1001 students who do not have brown eyes spend sleeping per day

Note that, just like in the last topic, we are using `$H_0$` to denote the null hypothesis, and `$H_1$` to denote the alternative hypothesis.

Also note that if `$\mu_1 = \mu_2$`, the difference between `$\mu_1$` and `$\mu_2$` is zero. So the above hypotheses could equivalently be written as: `$$H_0:\mu_1 - \mu_2 = 0\;\;\text{versus}\;\;H_1:\mu_1 - \mu_2 \neq 0.$$`

---
# Independent samples `$t$`-test

What does it mean to have ***two independent groups***, which is a requirement for conducting an independent samples `$t$`-test?

* One way of thinking of it would be that individuals can only be in one group or the other; individuals cannot be in both groups simultaneously

* E.g. for this example, we assume a person belongs to the 'brown eyes' or 'other' group, but not both

* This means the two groups are ***independent***, and appropriate for the independent-samples `$t$`-test

---
# Variables: Independent samples `$t$`-test

What type of variables are required for the independent samples `$t$`-test?

.content-box-blue[
.center[
An **independent samples** *t*-test will always involve two variables:
]
1. The ***dependent*** variable, sometimes also called the *response* variable. This should be a numeric, continuous variable.

2. The ***independent*** variable.
  This should be a categorical variable with only ***two categories***.
]

* So our ***dependent*** variable is minutes of sleep

* Our ***independent*** variable is eye colour
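As a rough illustration of this layout, the sketch below builds a small data frame with one numeric column and one two-level categorical column, then checks both requirements. The data are simulated and purely hypothetical: the name `sleep_sim`, the seed, and the means and SDs passed to `rnorm()` are assumptions, while the column names `Minutes` and `Eye_colour` follow the R output shown later in this lecture.

```r
# Hypothetical data: simulated values, NOT the real class survey
set.seed(1001)
sleep_sim <- data.frame(
  Minutes    = c(rnorm(51, mean = 460, sd = 150),   # "Brown" group
                 rnorm(40, mean = 440, sd = 115)),  # "Other" group
  Eye_colour = factor(rep(c("Brown", "Other"), times = c(51, 40)))
)

# Dependent variable: should be numeric
is.numeric(sleep_sim$Minutes)

# Independent variable: should be categorical with exactly two categories
nlevels(sleep_sim$Eye_colour)
table(sleep_sim$Eye_colour)
```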

---
# Assumptions: Independent samples `$t$`-test

The independent samples `$t$`-test assumptions are similar to those of the one-sample `$t$`-test, *with one addition*: ***equal variances*** between groups, also known as ***homogeneity of variance***.

.content-box-blue[
.center[
**Independent samples *t*-test Assumptions:**
]
1. The data are numeric.
2. Observations are independent of one another 
  * (that is, the sample is a simple random sample and each individual within the population has an equal chance of being selected)
3. The sample mean, `$\overline{X}$`, of each group is normally distributed.
4. The two groups have equal variances
  * We can check this assumption using a statistical test that we will cover in more detail shortly
]

* As a rule of thumb, if the standard deviation (SD) of one group is more than twice that of the other group, the equal variances assumption has likely been violated

---
# Data Visualisation & Assumption Checking

Before carrying out a hypothesis test, it is always a good idea to look at some descriptive statistics and plots.

* This gives us an idea of what to expect when we carry out the test, and also helps us check the assumptions

|Eye Colour |Sample size |Mean (minutes) |SD (minutes) |
|:----------|:-----------|:--------------|:------------|
|Brown      |51          |459.53         |154.37       |
|Other      |40          |441.88         |113.76       |
---
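Descriptive statistics and side-by-side boxplots of this kind could be produced along the following lines. This is only a sketch on simulated, hypothetical data (`sleep_sim` and its parameters are assumptions), so the numbers it prints will not match the real survey results.

```r
# Hypothetical data: simulated values, NOT the real class survey
set.seed(1001)
sleep_sim <- data.frame(
  Minutes    = c(rnorm(51, mean = 460, sd = 150),   # "Brown" group
                 rnorm(40, mean = 440, sd = 115)),  # "Other" group
  Eye_colour = factor(rep(c("Brown", "Other"), times = c(51, 40)))
)

# Sample size, mean and SD of sleep time for each eye colour group
desc <- t(sapply(split(sleep_sim$Minutes, sleep_sim$Eye_colour),
                 function(x) c(n = length(x), mean = mean(x), sd = sd(x))))
round(desc, 2)

# Side-by-side boxplots to compare centre and spread between groups
boxplot(Minutes ~ Eye_colour, data = sleep_sim,
        xlab = "Eye colour", ylab = "Minutes of sleep")
```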

---
# Data Visualisation & Assumption Checking

From the descriptive statistics and plots, we can observe the following:

1. The boxplots and sample means indicate that the average sleep looks similar between groups.

This can be deceiving though. Remember that we are conducting a hypothesis test in order to make an inference about the population, not just the sample.

When we carry out the `$t$`-test, we will see whether or not there is a ***statistically significant*** difference.

2. From the boxplots, the data appear to be similarly spread out, with slightly more variation in the Brown group. The SDs are also fairly similar to each other (neither one is double the other). This indicates the equal variances assumption has (probably) not been violated.

3. The sample sizes for the *Brown Eyes* and *Other* groups are `$n_{brown\, eyes} = 51$` and `$n_{other} =40$` respectively. This will be useful knowledge later when checking for normality.
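The comparison of SDs in point 2 can be checked with a little arithmetic, using the two SDs from the descriptive statistics table. A minimal sketch (the variable names are illustrative):

```r
# Standard deviations taken from the descriptive statistics table
sd_brown <- 154.37
sd_other <- 113.76

# Ratio of the larger SD to the smaller SD
sd_ratio <- max(sd_brown, sd_other) / min(sd_brown, sd_other)
round(sd_ratio, 2)   # 1.36

# Rule of thumb: a ratio above 2 suggests unequal variances
sd_ratio > 2         # FALSE
```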

---
# Levene's test for Equality of Variances

To more formally assess equality of variance between groups, we can use the statistical test ***Levene's test for equality of variances***.

.content-box-blue[
.center[
**Levene's test for Equality of Variances:**
]

`$$H_0 : \text{The groups have equal variances}$$`

`$$H_1 : \text{The groups do not have equal variances}$$`

* If *p* < 0.05, equal variances cannot be assumed
* If *p* > 0.05, equal variances can be assumed
]

Since we start out by assuming the groups have equal variances, the test tells us to only reject this assumption if we get a small `$p$`-value.

That is, a small `$p$`-value indicates the groups do not have equal variances.
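To give a feel for what the test is doing, the median-centred version of Levene's test is equivalent to a one-way ANOVA on each observation's absolute deviation from its own group median. The sketch below carries out those two steps in base R on simulated, hypothetical data (`sleep_sim` and its parameters are assumptions, so the result will differ from the real data). In practice a packaged function would normally be used, for example `leveneTest()` from the `car` package.

```r
# Hypothetical data: simulated values, NOT the real class survey
set.seed(1001)
sleep_sim <- data.frame(
  Minutes    = c(rnorm(51, mean = 460, sd = 150),   # "Brown" group
                 rnorm(40, mean = 440, sd = 115)),  # "Other" group
  Eye_colour = factor(rep(c("Brown", "Other"), times = c(51, 40)))
)

# Step 1: absolute deviation of each observation from its group median
group_median <- ave(sleep_sim$Minutes, sleep_sim$Eye_colour, FUN = median)
sleep_sim$abs_dev <- abs(sleep_sim$Minutes - group_median)

# Step 2: one-way ANOVA comparing the mean absolute deviation between groups
levene_table <- anova(lm(abs_dev ~ Eye_colour, data = sleep_sim))
levene_table

# Decision rule from the slide
p_value <- levene_table[["Pr(>F)"]][1]
if (p_value < 0.05) {
  cat("p < 0.05: equal variances cannot be assumed\n")
} else {
  cat("p >= 0.05: equal variances can be assumed\n")
}
```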

---
# Levene's test for Equality of Variances

Let's carry out Levene's test for the eye colour/sleep data:

```
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.6788 0.4122
      89               
```

As we can see, we have `$p = 0.4122$`. Since `$p > 0.05$`, equal variances can be assumed. Given our observations from the boxplots and standard deviations, this is not a surprising result.

#### What happens if the equal variances assumption is violated?

* There are two versions of the independent samples `$t$`-test: one that assumes equal variances, and one that does not
  * If equal variances can be assumed, we use the version of the `$t$`-test that assumes equal variances
  * If equal variances cannot be assumed, we use the version of the `$t$`-test that does NOT assume equal variances

We will have a chance to practise this in future computer labs.
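As a preview, both versions are available through base R's `t.test()` function, with the `var.equal` argument switching between them. The version that does not assume equal variances is commonly known as Welch's `$t$`-test, and is the default in R. The sketch below uses simulated, hypothetical data (`sleep_sim` and its parameters are assumptions), so its output says nothing about the real survey.

```r
# Hypothetical data: simulated values, NOT the real class survey
set.seed(1001)
sleep_sim <- data.frame(
  Minutes    = c(rnorm(51, mean = 460, sd = 150),   # "Brown" group
                 rnorm(40, mean = 440, sd = 115)),  # "Other" group
  Eye_colour = factor(rep(c("Brown", "Other"), times = c(51, 40)))
)

# Version that assumes equal variances (pooled variance)
t_equal <- t.test(Minutes ~ Eye_colour, data = sleep_sim, var.equal = TRUE)

# Version that does NOT assume equal variances (Welch's t-test, R's default)
t_welch <- t.test(Minutes ~ Eye_colour, data = sleep_sim, var.equal = FALSE)

t_equal$parameter   # degrees of freedom: 51 + 40 - 2 = 89
t_welch$parameter   # adjusted degrees of freedom, typically not a whole number
```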

---
# Checking for normality

Recall that we can consider histograms, Normal Q-Q plots, and the Shapiro-Wilk test to check for normality.

* For the independent samples `$t$`-test, this needs to be done for ***both groups individually***
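One way to run these checks separately for each group is to split the dependent variable by group and loop over the pieces. The sketch below uses simulated, hypothetical data (`sleep_sim` and its parameters are assumptions). Because the simulated values are drawn from normal distributions, its plots and `$p$`-values will not resemble those of the real data.

```r
# Hypothetical data: simulated values, NOT the real class survey
set.seed(1001)
sleep_sim <- data.frame(
  Minutes    = c(rnorm(51, mean = 460, sd = 150),   # "Brown" group
                 rnorm(40, mean = 440, sd = 115)),  # "Other" group
  Eye_colour = factor(rep(c("Brown", "Other"), times = c(51, 40)))
)

# Split sleep times into one vector per eye colour group
by_group <- split(sleep_sim$Minutes, sleep_sim$Eye_colour)

# Histogram and Normal Q-Q plot for each group (2 x 2 grid of plots)
par(mfrow = c(2, 2))
for (g in names(by_group)) {
  hist(by_group[[g]], main = paste("Histogram:", g), xlab = "Minutes of sleep")
  qqnorm(by_group[[g]], main = paste("Normal Q-Q plot:", g))
  qqline(by_group[[g]])
}

# Shapiro-Wilk test for each group individually
shapiro_results <- lapply(by_group, shapiro.test)
shapiro_results
```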

---

Based on the histograms and Normal Q-Q plots, do you have any concerns regarding normality for either group?

---
# Checking for normality

Given there is some doubt following inspection of the histograms and Normal Q-Q plots, the Shapiro-Wilk test results can provide further guidance:

**Shapiro-Wilk test for Brown eyes group:**

```

Shapiro-Wilk normality test

data:  sleep$Minutes[sleep$Eye_colour == "Brown"]
W = 0.89414, p-value = 0.0002696
```

**Shapiro-Wilk test for Other (not brown eyes) group:**

```

Shapiro-Wilk normality test

data:  sleep$Minutes[sleep$Eye_colour == "Other"]
W = 0.9426, p-value = 0.04234
```

Given we have `$p < 0.001$` and `$p = 0.0423$` for the Brown and Other groups respectively, both below 0.05, it appears the normality assumption has been violated.

However...

---
# Checking for normality

* Recall that the sample sizes for the Brown and Other groups are 51 and 40 respectively.

* Also recall that the underlying distribution does not have to be normally distributed to satisfy the assumption - it is the sample mean that should be normally distributed

* Given `$n > 30$` for both groups, we can therefore apply the Central Limit Theorem and conclude that the **normality assumption has been met**
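The Central Limit Theorem argument can be illustrated with a small simulation. The sketch below is an assumption-laden example rather than an analysis of the sleep data: it draws samples of size 40 from a strongly right-skewed exponential population (chosen purely for illustration) and shows that the sample means are nevertheless distributed roughly symmetrically around the population mean.

```r
set.seed(1001)

# One sample of size 40 from a right-skewed population (exponential, mean 1)
single_sample <- rexp(40, rate = 1)

# Sample means from 5000 repeated samples of size 40
sample_means <- replicate(5000, mean(rexp(40, rate = 1)))

# The raw data are skewed, but the sample means look approximately normal
par(mfrow = c(1, 2))
hist(single_sample, main = "One sample (n = 40)", xlab = "Value")
hist(sample_means, main = "5000 sample means (n = 40)", xlab = "Sample mean")

mean(sample_means)   # close to the population mean of 1
sd(sample_means)     # close to 1 / sqrt(40), about 0.158
```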

We are now ready to carry out the independent samples `$t$`-test.