STM1001 Topic 9 Lecture

class: middle
background-image: url(data:image/png;base64,#LTU_logo.jpg)
background-position: top left
background-size: 30%

# STM1001 [Topic 9](https://bookdown.org/a_shaker/STM1001_Topic_9/) Lecture
## Hypothesis Testing for One and Two Sample Proportions
### La Trobe University
This lecture complements the [Topic 9 readings](https://bookdown.org/a_shaker/STM1001_Topic_9/)

---

# Topic 9: Related Links

## Readings

[Topic 9 readings](https://bookdown.org/a_shaker/STM1001_Topic_9/)

## Notation

[Notation for Topic 9: Hypothesis testing for one and two sample proportions](https://bookdown.org/a_shaker/STM1001_Topic_0/notation-summary.html#topic-9-hypothesis-testing-for-one-and-two-sample-proportions)

---

# Topic 9: Hypothesis Testing for One and Two Sample Proportions

**Overview**

---

# Introduction

* In previous topics where we covered `$t$`-tests and One-way ANOVA, we were testing for differences in ***means***

* In today's topic, we will be testing for differences in ***proportions***

* For example:

    * *Is the proportion of left-handed people different from 10%?* 
    
--
    
    * *Has a particular intervention led to a significant difference in the proportion of premature births?* 
    
--

    * *Is the proportion of extroverts different from 50%?*

---
# One-sample test of proportions

* We will start by discussing the one-sample test of proportions

* As an example, suppose it has been claimed that 70% of university students prefer Apple (iOS) over Android phones

* To test the claim, consider the hypotheses

`$$H_0 : p = 0.7 \text{ versus } H_1 : p \neq 0.7,$$`

where:

* `$p$` denotes the population proportion of university students who prefer Apple (iOS) over Android phones

* `$H_0$` denotes the null hypothesis that the population proportion of university students who prefer Apple (iOS) over Android phones is equal to 0.7 (or as a percentage, 70%)

* `$H_1$` denotes the alternative hypothesis that the population proportion of university students who prefer Apple (iOS) over Android phones is different from 70%.

---
# One-sample test of proportions

* In more general terms, suppose we have a random sample of `$n$` observations, each of which has a certain characteristic with probability `$p$`. Also let `$x$` denote the number of observations in the sample that actually have that characteristic

* Equivalently, suppose we conduct `$n$` independent trials each with probability of success `$p$` and let `$x$` denote the number of successes in these `$n$` trials.

* Consider the hypotheses

`$$H_0 : p = p_0\text{ versus } H_1 : p \neq p_0\text{ (or }p<p_0\text{ or }p>p_0),$$`

where:

* `$p_0$` denotes the population proportion under the null hypothesis.

Then, provided `$n$` is not too small (this will be further discussed shortly), a commonly used statistical test is the one-sample proportion test, which is based on the estimate of `$p$`, denoted `$\hat{p} = x/n.$`

What is `$p_0$` in our example?

---
# One-sample test of proportions

* Suppose a survey was carried out where university students were asked whether they preferred Apple (iOS) or Android phones

* Supposing that of the `$n = 52$` respondents, `$x = 39$` said they preferred Apple (iOS) phones, we then have that

`$$\hat{p} = \frac{x}{n} = \frac{39}{52} = 0.75.$$`
--

Note that if we know the values of `$n$` and `$\hat{p}$`, we can use this information to calculate `$x$`, since `$x = n\hat{p}$`
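
---
# One-sample test of proportions

As a minimal sketch, the sample proportion from the survey can be computed in R as follows. The variable names `n`, `x`, `p_hat` and `x_recovered` are chosen here for illustration only.

```r
# Survey values from the Apple (iOS) vs. Android example
n <- 52   # number of respondents
x <- 39   # respondents who preferred Apple (iOS)

# Sample proportion: p_hat = x / n
p_hat <- x / n
p_hat

# If only n and p_hat are known, the count can be recovered as n * p_hat
x_recovered <- n * p_hat
x_recovered
```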

---
# One-sample test of proportions

* Next, we need to check the assumptions:

.content-box-blue[
.center[
**One-sample test of proportion condition:**
]
`$np \geq 5$` and `$n(1 - p) \geq 5$`.
]

* As we do not know the true value of `$p$`, we can use `$p_0 = 0.7$` in its place

* `$n$` is the number of people in our sample. We therefore have:

1. `$np_0 = 52\times 0.7 = 36.4$`

1. `$n(1 - p_0) = 52\times (1 - 0.7) = 52\times 0.3 = 15.6$`

Since both results are greater than 5, the condition has been met.
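
---
# One-sample test of proportions

The condition check can also be sketched in R. This assumes, as on the previous slide, that `$p_0 = 0.7$` stands in for the unknown `$p$`; the variable names are illustrative.

```r
n  <- 52    # sample size
p0 <- 0.7   # proportion under the null hypothesis, used in place of p

expected_successes <- n * p0         # n * p0
expected_failures  <- n * (1 - p0)   # n * (1 - p0)

# The condition holds only if both quantities are at least 5
condition_met <- expected_successes >= 5 && expected_failures >= 5
condition_met
```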

---
# One-sample test of proportions

If the condition is met, we have that

`$$\hat{P}\stackrel{\tiny \text{approx.}}{\sim} N\left(p,\frac{p(1 - p)}{n}\right),$$`

meaning that we can use the Normal distribution to carry out the hypothesis test since the estimated proportion is approximately normally distributed, due to the [Central Limit Theorem](https://bookdown.org/content/88ef9b7c-5833-4a70-84f2-93470957d1f9/3-3-clt-simulated-example-with-bernoulli-distributed-population.html). (Note: some statistical software packages apply a small 'continuity correction' to the estimates that provides slightly improved confidence intervals.)
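
---
# One-sample test of proportions

One way to carry out this test in R is with the built-in `prop.test()` function, which applies the continuity correction by default. The sketch below uses the survey values from our example; the variable names `x`, `n`, `p` and `result` are illustrative.

```r
x <- 39    # number of respondents who preferred Apple (iOS)
n <- 52    # number of respondents
p <- 0.7   # proportion under the null hypothesis (p_0)

# Two-sided one-sample test of proportions at the default 95% confidence level
result <- prop.test(x, n, p = p, alternative = "two.sided")
result

# Individual pieces of the result can be extracted if needed
result$statistic   # test statistic (X-squared)
result$p.value     # p-value
result$conf.int    # confidence interval for p
```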

---
# One-sample test of proportions output

```r
	1-sample proportions test with continuity correction

data:  x out of n, null probability p
X-squared = 0.40385, df = 1, p-value = 0.5251
alternative hypothesis: true p is not equal to 0.7
95 percent confidence interval:
 0.6076741 0.8552478
sample estimates:
   p 
0.75
```

---
# One-sample test of proportions output

```r
	1-sample proportions test with continuity correction

data:  x out of n, null probability p
 `X-squared = 0.40385`, df = 1, p-value = 0.5251
alternative hypothesis: true p is not equal to 0.7
95 percent confidence interval:
 0.6076741 0.8552478
sample estimates:
   p 
0.75
```

* The **test statistic** is equal to 0.40385

---
# One-sample test of proportions output

```r
	1-sample proportions test with continuity correction

data:  x out of n, null probability p
 `X-squared = 0.40385`, df = 1, `p-value = 0.5251`
alternative hypothesis: true p is not equal to 0.7
95 percent confidence interval:
 0.6076741 0.8552478
sample estimates:
   p 
0.75
```

* The **test statistic** is equal to 0.40385
* The **`$p$`-value** is equal to 0.5251. Since this is larger than `$\alpha = 0.05$`, we cannot reject `$H_0$`

---
# One-sample test of proportions output

```r
	1-sample proportions test with continuity correction

data:  x out of n, null probability p
 `X-squared = 0.40385`, df = 1, `p-value = 0.5251`
alternative hypothesis: true p is not equal to 0.7
 `95 percent confidence interval:`
 `0.6076741 0.8552478`
sample estimates:
   p 
0.75
```

* The **test statistic** is equal to 0.40385
* The **`$p$`-value** is equal to 0.5251. Since this is larger than `$\alpha = 0.05$`, we cannot reject `$H_0$`
* The **95% confidence interval** for the population proportion `$p$` is (0.6077, 0.8552), meaning that we are 95% confident that the true value of `$p$` lies within the interval (0.6077, 0.8552). Since `$p_0 = 0.7$` is included in this interval, we cannot reject `$H_0$` at the `$\alpha = 0.05$` level of significance.
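
---
# One-sample test of proportions output

To see where the test statistic and `$p$`-value come from, they can be reproduced by hand. This is a sketch assuming the continuity-corrected chi-squared form of the statistic with 1 degree of freedom, as indicated by the output header; the variable names are illustrative.

```r
n  <- 52    # sample size
x  <- 39    # observed number of successes
p0 <- 0.7   # proportion under the null hypothesis

# Continuity correction: 0.5, or the observed deviation if that is smaller
correction <- min(0.5, abs(x - n * p0))

# Continuity-corrected test statistic
x_squared <- (abs(x - n * p0) - correction)^2 / (n * p0 * (1 - p0))
x_squared

# Upper-tail probability of the chi-squared distribution with df = 1
p_value <- pchisq(x_squared, df = 1, lower.tail = FALSE)
p_value
```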

---

name: menti
class: middle
background-image: url(data:image/png;base64,#menti.jpg)
background-size: 115%

# Kahoot

## Go to [www.kahoot.it](https://www.kahoot.it) and use

## the code provided

---
# Two-sample test of proportions

* We can also compare two proportions from different (independent) populations

* For this, we can use the two-sample test of proportions

* For example, suppose we now wish to know whether Android vs Apple (iOS) preferences depend on whether or not you have brown eyes

* We will be comparing the proportions from two different (independent) populations: Brown eyes vs. not brown eyes

* Consider the hypotheses:

`$$H_0 : p_1 = p_2 \text{ versus } H_1 : p_1 \neq p_2,$$`

where:

* `$p_1$` denotes the population proportion of university students with brown eyes who prefer Apple (iOS) over Android
* `$p_2$` denotes the population proportion of university students who do not have brown eyes who prefer Apple (iOS) over Android

---
# Two-sample test of proportions

In terms of notation for a two-sample test of proportions, assume the following:

* `$n_1$` is the sample size from population (or group) 1
* `$n_2$` is the sample size from population (or group) 2
* `$x_1$` is the number of individuals in the sample from population (or group) 1 exhibiting the trait of interest 
* `$x_2$` is the number of individuals in the sample from population (or group) 2 exhibiting the trait of interest.

The estimated (or sample) proportions are

`$$\hat{p}_1 = \frac{x_1}{n_1} \text{ and } \hat{p}_2 = \frac{x_2}{n_2}.$$`
---
# Two-sample test of proportions

* Recall the 52 university students who responded regarding Apple (iOS) vs. Android preferences

* Suppose they also reported their eye colour, as follows:

* `$n_1 = 30$` students responded saying they have brown eyes. Of these, `$x_1 = 21$` students preferred Apple (iOS) over Android

* `$n_2 = 22$` students responded saying they do not have brown eyes. Of these, `$x_2 = 18$`  students preferred Apple (iOS) over Android

We therefore have that

* `$\hat{p}_1 = \displaystyle \frac{x_1}{n_1} = \frac{21}{30} = 0.7$`
* `$\hat{p}_2 = \displaystyle \frac{x_2}{n_2} = \frac{18}{22} \approx 0.82$`

Shortly, by carrying out the two-sample test of proportions, we will see whether or not the difference in proportions between the two groups is statistically significant.
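
---
# Two-sample test of proportions

As a minimal sketch, the two sample proportions can be computed in R from the counts above. The group labels `brown` and `not_brown` and the variable names are chosen here for illustration only.

```r
# Sample sizes and number preferring Apple (iOS) in each group
n <- c(brown = 30, not_brown = 22)
x <- c(brown = 21, not_brown = 18)

# Sample proportions for both groups at once
p_hat <- x / n
round(p_hat, 2)

# Observed difference between the two sample proportions
p_diff <- unname(p_hat["brown"] - p_hat["not_brown"])
p_diff
```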

---
# Two-sample test of proportions

* Next, we need to check the assumptions:

.content-box-blue[
.center[
**Two-sample test of proportion conditions:**
]
* `$n_1p_1 \geq 5$` and `$n_1(1 - p_1) \geq 5$`
* `$n_2p_2 \geq 5$` and `$n_2(1 - p_2) \geq 5$`.
]

* Since we do not know the true population values of `$p_1$` and `$p_2$`, we will instead use `$\hat{p}_1$` and `$\hat{p}_2$`:
--

* `$n_1\hat{p}_1 = 30\times 0.7 = 21$` 
--

* `$n_1(1 - \hat{p}_1) = 30\times (1 - 0.7) = 30\times (0.3) = 9$` 
--

* `$n_2\hat{p}_2 = 22\times 0.82 \approx 18$` 
--

* `$n_2(1 - \hat{p}_2) = 22\times (1 - 0.82) = 22\times (0.18) \approx 4$`

Since one of our four results is less than 5, the conditions have not been met.
--
 In practice, we could use a different test instead, which is beyond the scope of this subject.
--
 For our purposes today, we will carry out the test even though the conditions have not been met.
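
---
# Two-sample test of proportions

The four condition checks can also be sketched in R. This assumes, as above, that `$\hat{p}_1$` and `$\hat{p}_2$` stand in for the unknown `$p_1$` and `$p_2$`; the group labels and variable names are illustrative.

```r
# Sample sizes and number preferring Apple (iOS) in each group
n <- c(brown = 30, not_brown = 22)
x <- c(brown = 21, not_brown = 18)

p_hat <- x / n   # sample proportions used in place of p_1 and p_2

successes <- n * p_hat         # n_i * p_hat_i
failures  <- n * (1 - p_hat)   # n_i * (1 - p_hat_i)

# Show each quantity alongside whether it is at least 5
data.frame(successes, failures,
           successes_ok = successes >= 5,
           failures_ok  = failures >= 5)

# The conditions are met only if all four quantities are at least 5
conditions_met <- all(successes >= 5, failures >= 5)
conditions_met
```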