class: middle
background-image: url(data:image/png;base64,#LTU_logo.jpg)
background-position: top left
background-size: 30%

# STM1001 [Topic 9](https://bookdown.org/a_shaker/STM1001_Topic_9/) Lecture

## Hypothesis Testing for One and Two Sample Proportions

### La Trobe University

This lecture complements the [Topic 9 readings](https://bookdown.org/a_shaker/STM1001_Topic_9/)

---

# Topic 9: Related Links

## Readings

[Topic 9 readings](https://bookdown.org/a_shaker/STM1001_Topic_9/)

## Notation

[Notation for Topic 9: Hypothesis testing for one and two sample proportions](https://bookdown.org/a_shaker/STM1001_Topic_0/notation-summary.html#topic-9-hypothesis-testing-for-one-and-two-sample-proportions)

---

# Topic 9: Hypothesis Testing for One and Two Sample Proportions

**Overview**

<iframe src="https://bookdown.org/a_shaker/STM1001_Topic_9/" width="100%" height="400px" data-external="1"></iframe>

---

# Introduction

* In previous topics where we covered `\(t\)`-tests and One-way ANOVA, we were testing for differences in ***means***

--

* In today's topic, we will be testing for differences in ***proportions***

--

* For example:

  * *Is the proportion of left-handed people different from 10%?*

--

  * *Has a particular intervention led to a significant difference in the proportion of premature births?*

--

  * *Is the proportion of extroverts different from 50%?*

---

# One-sample test of proportions

* We will start by discussing the one-sample test of proportions

--

* As an example, suppose it has been claimed that 70% of university students prefer Apple (iOS) over Android phones

--

* To test the claim, consider the hypotheses
`$$H_0 : p = 0.7 \text{ versus } H_1 : p \neq 0.7,$$`
where:

  * `\(p\)` denotes the population proportion of university students who prefer Apple (iOS) over Android phones

--

  * `\(H_0\)` denotes the null hypothesis that the population proportion of university students who prefer Apple (iOS) over Android phones is equal to 0.7 (or as a percentage, 70%)

--

  * `\(H_1\)` denotes the alternative hypothesis that the population proportion of university students who prefer Apple (iOS) over Android phones is different from 0.7 (70%).

---

# One-sample test of proportions

* In more general terms, suppose we have a random sample of `\(n\)` observations, and we expect a proportion `\(p\)` of these observations to have a certain characteristic. Also let `\(x\)` denote the number of observations in the sample that actually have that characteristic

--

* Equivalently, suppose we conduct `\(n\)` independent trials, each with probability of success `\(p\)`, and let `\(x\)` denote the number of successes in these `\(n\)` trials.

--

* Consider the hypotheses
`$$H_0 : p = p_0\text{ versus } H_1 : p \neq p_0\text{ (or }p<p_0\text{ or }p>p_0),$$`
where:

  * `\(p_0\)` denotes the population proportion under the null hypothesis.

--

Then, provided `\(n\)` is not too small (this will be further discussed shortly), a commonly used statistical test is the one-sample proportion test, which is based on the estimate of `\(p\)`, denoted `\(\hat{p} = x/n.\)`

--

What is `\(p_0\)` in our example?

---

# One-sample test of proportions

* Suppose a survey was carried out where university students were asked whether they preferred Apple (iOS) or Android phones

--

* Supposing that of the `\(n = 52\)` respondents, `\(x = 39\)` said they preferred Apple (iOS) phones, we then have that
`$$\hat{p} = \frac{x}{n} = \frac{39}{52} = 0.75.$$`

--

Note that if we know the values of `\(n\)` and `\(\hat{p}\)`, we can use this information to calculate `\(x\)`, since `\(x = n\hat{p}\)`

---

# One-sample test of proportions

* Next, we need to check the assumptions:

--

.content-box-blue[
.center[
**One-sample test of proportion condition:**
]

`\(np \geq 5\)` and `\(n(1 - p) \geq 5\)`.
]

--

* As we do not know the true value of `\(p\)`, we can use `\(p_0 = 0.7\)` in its place
* `\(n\)` is the number of people in our sample.

We therefore have:

1. `\(np = 52\times 0.7 = 36.4\)`

--

1. `\(n(1 - p) = 52\times (1 - 0.7) = 52\times (0.3) = 15.6.\)`

--

Since both results are greater than 5, the condition has been met.

---

# One-sample test of proportions

If the condition is met, we have that
$$\hat{P}\stackrel{\tiny \text{approx.}}\sim N\left(p,\frac{p(1 - p)}{n}\right), $$
meaning that we can use the Normal distribution to carry out the hypothesis test since the estimated proportion is approximately normally distributed, due to the [Central Limit Theorem](https://bookdown.org/content/88ef9b7c-5833-4a70-84f2-93470957d1f9/3-3-clt-simulated-example-with-bernoulli-distributed-population.html).

(Note: some statistical software packages apply a small 'continuity correction' to the estimates that provides slightly improved confidence intervals.)

---

# One-sample test of proportions output

```r
        1-sample proportions test with continuity correction

data:  x out of n, null probability p
X-squared = 0.40385, df = 1, p-value = 0.5251
alternative hypothesis: true p is not equal to 0.7
95 percent confidence interval:
 0.6076741 0.8552478
sample estimates:
   p 
0.75 
```

---

# One-sample test of proportions output

```r
        1-sample proportions test with continuity correction

data:  x out of n, null probability p
`X-squared = 0.40385`, df = 1, p-value = 0.5251
alternative hypothesis: true p is not equal to 0.7
95 percent confidence interval:
 0.6076741 0.8552478
sample estimates:
   p 
0.75 
```

* The **test statistic** is equal to 0.40385

---

# One-sample test of proportions output

```r
        1-sample proportions test with continuity correction

data:  x out of n, null probability p
`X-squared = 0.40385`, df = 1, `p-value = 0.5251`
alternative hypothesis: true p is not equal to 0.7
95 percent confidence interval:
 0.6076741 0.8552478
sample estimates:
   p 
0.75 
```

* The **test statistic** is equal to 0.40385

* The **`\(p\)`-value** is equal to 0.5251. Since this is larger than `\(\alpha = 0.05\)`, we cannot reject `\(H_0\)`

---

# One-sample test of proportions output

```r
        1-sample proportions test with continuity correction

data:  x out of n, null probability p
`X-squared = 0.40385`, df = 1, `p-value = 0.5251`
alternative hypothesis: true p is not equal to 0.7
`95 percent confidence interval:`
`0.6076741 0.8552478`
sample estimates:
   p 
0.75 
```

* The **test statistic** is equal to 0.40385

* The **`\(p\)`-value** is equal to 0.5251. Since this is larger than `\(\alpha = 0.05\)`, we cannot reject `\(H_0\)`

* The **95% confidence interval** for the population proportion `\(p\)` is (0.6077, 0.8552), meaning that we are 95% confident that the true value of `\(p\)` lies within the interval (0.6077, 0.8552). Since `\(p_0 = 0.7\)` is included in this interval, we cannot reject `\(H_0\)` at the `\(\alpha = 0.05\)` level of significance.

---

# One-sample test of proportions output

```r
        1-sample proportions test with continuity correction

data:  x out of n, null probability p
`X-squared = 0.40385`, df = 1, `p-value = 0.5251`
alternative hypothesis: true p is not equal to 0.7
`95 percent confidence interval:`
`0.6076741 0.8552478`
`sample estimates:`
`p`
`0.75`
```

* The **test statistic** is equal to 0.40385

* The **`\(p\)`-value** is equal to 0.5251. Since this is larger than `\(\alpha = 0.05\)`, we cannot reject `\(H_0\)`

* The **95% confidence interval** for the population proportion `\(p\)` is (0.6077, 0.8552), meaning that we are 95% confident that the true value of `\(p\)` lies within the interval (0.6077, 0.8552). Since `\(p_0 = 0.7\)` is included in this interval, we cannot reject `\(H_0\)` at the `\(\alpha = 0.05\)` level of significance.

* The **sample proportion** is `\(\hat{p} = 0.75\)`.
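---

# One-sample test of proportions in R

The output above has the format produced by R's built-in `prop.test()` function. The following is a minimal sketch of how such a test could be run for the phone preference example, assuming the default settings (which include the continuity correction). The variable names `x`, `n`, `p0`, `condition_met` and `result` are our own choices for illustration.

```r
# Survey data from the example: 39 of 52 respondents preferred Apple (iOS)
x  <- 39    # number of respondents with the characteristic
n  <- 52    # sample size
p0 <- 0.7   # proportion under the null hypothesis

# Check the condition n * p0 >= 5 and n * (1 - p0) >= 5
condition_met <- (n * p0 >= 5) && (n * (1 - p0) >= 5)
condition_met

# Two-sided one-sample test of proportions
# (prop.test applies the continuity correction by default)
result <- prop.test(x = x, n = n, p = p0,
                    alternative = "two.sided", conf.level = 0.95)
result

# Individual components of the output
result$statistic   # test statistic (X-squared)
result$p.value     # p-value
result$conf.int    # 95% confidence interval for p
result$estimate    # sample proportion
```

Storing the result in an object makes it easy to pull out individual values, such as the `\(p\)`-value, rather than reading them off the printed output.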
---
name: menti
class: middle
background-image: url(data:image/png;base64,#menti.jpg)
background-size: 115%

# Kahoot

## Go to [www.kahoot.it](https://www.kahoot.it) and use
## the code provided

---

# Two-sample test of proportions

* We can also compare two proportions from different (independent) populations

--

* For this, we can use the two-sample test of proportions

--

* For example, suppose we now wish to know whether Android vs Apple (iOS) preferences depend on whether or not a student has brown eyes

--

* We will be comparing the proportions from two different (independent) populations: Brown eyes vs. not brown eyes

--

* Consider the hypotheses:
`$$H_0 : p_1 = p_2 \text{ versus } H_1 : p_1 \neq p_2,$$`
where:

  * `\(p_1\)` denotes the population proportion of university students with brown eyes who prefer Apple (iOS) over Android
  * `\(p_2\)` denotes the population proportion of university students who do not have brown eyes who prefer Apple (iOS) over Android

---

# Two-sample test of proportions

In terms of notation for a two-sample test of proportions, assume the following:

* `\(n_1\)` is the sample size from population (or group) 1
* `\(n_2\)` is the sample size from population (or group) 2
* `\(x_1\)` is the number of individuals in the sample from population (or group) 1 exhibiting the trait of interest
* `\(x_2\)` is the number of individuals in the sample from population (or group) 2 exhibiting the trait of interest.

--

The estimated (or sample) proportions are
`$$\hat{p}_1 = \frac{x_1}{n_1} \text{ and } \hat{p}_2 = \frac{x_2}{n_2}.$$`

---

# Two-sample test of proportions

* Recall the 52 university students who responded regarding Apple (iOS) vs. Android preferences

--

* Suppose they have also responded indicating their eye-colour as follows

--

  * `\(n_1 = 30\)` students responded saying they have brown eyes. Of these, `\(x_1 = 21\)` students preferred Apple (iOS) over Android

--

  * `\(n_2 = 22\)` students responded saying they do not have brown eyes. Of these, `\(x_2 = 18\)` students preferred Apple (iOS) over Android

--

We therefore have that

* `\(\hat{p}_1 = \displaystyle \frac{x_1}{n_1} = \frac{21}{30} = 0.7\)`
* `\(\hat{p}_2 = \displaystyle \frac{x_2}{n_2} = \frac{18}{22} \approx 0.82\)`

--

Shortly, by carrying out the two-sample test of proportions, we will see whether or not the difference in proportions between the two groups is statistically significant.

---

# Two-sample test of proportions

* Next, we need to check the assumptions:

.content-box-blue[
.center[
**Two-sample test of proportion conditions:**
]

* `\(n_1p_1 \geq 5\)` and `\(n_1(1 - p_1) \geq 5\)`
* `\(n_2p_2 \geq 5\)` and `\(n_2(1 - p_2) \geq 5\)`.
]

--

* Since we do not know the true population values of `\(p_1\)` and `\(p_2\)`, we will instead use `\(\hat{p}_1\)` and `\(\hat{p}_2\)`:

--

  * `\(n_1\hat{p}_1 = 30\times 0.7 = 21\)`

--

  * `\(n_1(1 - \hat{p}_1) = 30\times (1 - 0.7) = 30\times (0.3) = 9\)`

--

  * `\(n_2\hat{p}_2 = 22\times 0.82 \approx 18\)`

--

  * `\(n_2(1 - \hat{p}_2) = 22\times (1 - 0.82) = 22\times (0.18) \approx 4\)`

--

Since one of our four results is less than 5, the conditions have not been met.

--

In practice, we could use a different test instead, which is beyond the scope of this subject.

--

For our purposes today, we will carry out the test even though the conditions have not been met.
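---

# Two-sample test of proportions in R

The condition check above, and the test itself, can be sketched in R as follows. This is an illustration only: it assumes R's built-in `prop.test()` function with its default settings (which include the continuity correction), and the variable names `x1`, `n1`, `x2`, `n2`, `p1_hat`, `p2_hat`, `checks` and `result2` are our own choices.

```r
# Eye-colour example: group 1 = brown eyes, group 2 = not brown eyes
x1 <- 21; n1 <- 30
x2 <- 18; n2 <- 22

# Sample proportions
p1_hat <- x1 / n1
p2_hat <- x2 / n2

# The four quantities that should each be at least 5
checks <- c(n1 * p1_hat, n1 * (1 - p1_hat),
            n2 * p2_hat, n2 * (1 - p2_hat))
checks
all(checks >= 5)   # FALSE here, because n2 * (1 - p2_hat) is about 4

# Two-sided two-sample test of proportions
# (carried out even though the conditions have not been met)
result2 <- prop.test(x = c(x1, x2), n = c(n1, n2))
result2
```

Using the exact value `\(\hat{p}_2 = 18/22\)` rather than the rounded 0.82 gives the same conclusion for the condition check.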
---

# Two-sample test of proportions output

```r
        2-sample test for equality of proportions with continuity correction

data:  c(x1, x2) out of c(n1, n2)
X-squared = 0.4202, df = 1, p-value = 0.5168
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.3875008  0.1511371
sample estimates:
   prop 1    prop 2 
0.7000000 0.8181818 
```

---

# Two-sample test of proportions output

```r
        2-sample test for equality of proportions with continuity correction

data:  c(x1, x2) out of c(n1, n2)
`X-squared = 0.4202`, df = 1, p-value = 0.5168
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.3875008  0.1511371
sample estimates:
   prop 1    prop 2 
0.7000000 0.8181818 
```

* The **test statistic** is equal to 0.4202

---

# Two-sample test of proportions output

```r
        2-sample test for equality of proportions with continuity correction

data:  c(x1, x2) out of c(n1, n2)
`X-squared = 0.4202`, df = 1, `p-value = 0.5168`
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.3875008  0.1511371
sample estimates:
   prop 1    prop 2 
0.7000000 0.8181818 
```

* The **test statistic** is equal to 0.4202

* The **`\(p\)`-value** is equal to 0.5168. Since this is larger than `\(\alpha = 0.05\)`, we cannot reject `\(H_0\)`

---

# Two-sample test of proportions output

```r
        2-sample test for equality of proportions with continuity correction

data:  c(x1, x2) out of c(n1, n2)
`X-squared = 0.4202`, df = 1, `p-value = 0.5168`
alternative hypothesis: two.sided
`95 percent confidence interval:`
`-0.3875008 0.1511371`
sample estimates:
   prop 1    prop 2 
0.7000000 0.8181818 
```

* The **test statistic** is equal to 0.4202

* The **`\(p\)`-value** is equal to 0.5168. Since this is larger than `\(\alpha = 0.05\)`, we cannot reject `\(H_0\)`

* The **95% confidence interval** for the difference between `\(p_1\)` and `\(p_2\)` is (-0.3875, 0.1511), meaning that we are 95% confident that the difference in proportions between the two groups, `\(p_1 - p_2\)`, is within the interval (-0.3875, 0.1511). Since the interval includes 0, we cannot reject `\(H_0\)` at the `\(\alpha = 0.05\)` level of significance.

---

# Two-sample test of proportions output

```r
        2-sample test for equality of proportions with continuity correction

data:  c(x1, x2) out of c(n1, n2)
`X-squared = 0.4202`, df = 1, `p-value = 0.5168`
alternative hypothesis: two.sided
`95 percent confidence interval:`
`-0.3875008 0.1511371`
`sample estimates:`
`prop 1 prop 2`
`0.7000000 0.8181818`
```

* The **test statistic** is equal to 0.4202

* The **`\(p\)`-value** is equal to 0.5168. Since this is larger than `\(\alpha = 0.05\)`, we cannot reject `\(H_0\)`

* The **95% confidence interval** for the difference between `\(p_1\)` and `\(p_2\)` is (-0.3875, 0.1511), meaning that we are 95% confident that the difference in proportions between the two groups, `\(p_1 - p_2\)`, is within the interval (-0.3875, 0.1511). Since the interval includes 0, we cannot reject `\(H_0\)` at the `\(\alpha = 0.05\)` level of significance.

* The **sample proportions** are `\(\hat{p}_1 = 0.7\)` and `\(\hat{p}_2 = 0.8182\)`.

---
background-image: url(data:image/png;base64,#computerlab.jpg)
background-position: bottom
background-size: 75%
class: center

# See you in the computer labs!

---
class: middle

<font color = "grey">
These notes have been prepared by Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License <a href = "https://creativecommons.org/licenses/by-nc-nd/4.0/" target="_blank"> BY-NC-ND. </a>
</font>