---
title: "MAS 261 - Lecture 19"
subtitle: "Contingency Tables and Two Sample Tests of Proportions"
author: "Penelope Pooler Eisenbies"
date: last-modified
lightbox: true
toc: true
toc-depth: 3
toc-location: left
toc-title: "Table of Contents"
toc-expand: 1
format:
html:
code-line-numbers: true
code-fold: true
code-tools: true
execute:
echo: fenced
---
## Housekeeping
```{r setup, echo=FALSE, warning=F, message=F, include=F}
#| include: false
# this line specifies options for default options for all R Chunks
knitr::opts_chunk$set(echo=F)
# suppress scientific notation
options(scipen=100)
# install helper package that loads and installs other packages, if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
# install and load required packages
pacman::p_load(pacman,tidyverse, magrittr, olsrr, shadowtext, mapproj, knitr, kableExtra,
countrycode, usdata, maps, RColorBrewer, gridExtra, ggthemes, gt,
mosaicData, epiDisplay, vistributions, DiagrammeR, DiagrammeRsvg,
rsvg)
# verify packages
# p_loaded()
```
- Today's plan
- Comments and Questions about Previous Lecture from Engagement Questions
- Upcoming Dates
- A few minutes for R Questions 🪄
- Review Question - Two-sided test
- Review of One Sample Proportion Hypothesis Tests
- Contingency Tables
- Tests of Two proportions
- Format of Hypothesis Tests
## Upcoming Dates
- HW 7 is now posted and is due 11/5 (Grace period ends 11/6).
- Most questions are multiple choice, but do not just guess and keep trying.
- Test 2 is on November 11th and will include material up through Lecture 20 (HW 7)
- Lecture 21 - Intro to Portfolio Management will be on Final Exam, not on Test 2.
## R and RStudio
- In this course we will use R and RStudio to understand statistical concepts.
- You will access R and RStudio through **Posit Cloud**.
- Sign up for a [Free Posit Cloud Account](https://posit.cloud/plans/free){target="_blank"}
- I will post R/RStudio files on Posit Cloud that you can access in provided links.
- I will also provide demo videos that show how to access files and complete exercises.
- NOTE: The free Posit Cloud account is limited to 25 hours per month.
- For those who want to go further with R/RStudio:
- If you are interested in downloading R and RStudio to your own computer, I can guide you through the process.
- The software is completely free but it does have to be updated a couple times each year.
##
```{r eval=F, include=F}
# requires packages DiagrammeR, DiagrammeRsvg, rsvg
t_flow <- grViz("
digraph t_test_decision {
graph [layout = dot, rankdir = TB]
node [shape = ellipse, style = filled, fillcolor = lightgray]
Start [label = 'Start']
OneGroup [label = 'Is there only one group of data?']
OneSample [label = 'One-sample t-test', shape = box, fillcolor = lightblue]
TwoGroups [label = 'Are the two groups independent?']
TwoSample [label = 'Two-sample t-test', shape = box, fillcolor = lightblue]
Paired [label = 'Paired t-test', shape = box, fillcolor = lightblue]
OneTailQ1 [label = 'Is the question asking\nif the mean is greater or less\nthan a value?', shape = diamond]
OneTailQ2 [label = 'Is the question asking\nif one mean is greater or less\nthan the other?', shape = diamond]
OneTailQ3 [label = 'Is the question asking\nif the mean difference is\ngreater or less than 0?', shape = diamond]
OneTail1 [label = 'One-tailed test', shape = box, fillcolor = lightblue]
OneTail2 [label = 'One-tailed test', shape = box, fillcolor = lightblue]
OneTail3 [label = 'One-tailed test', shape = box, fillcolor = lightblue]
TwoTail1 [label = 'Two-tailed test', shape = box, fillcolor = lightblue]
TwoTail2 [label = 'Two-tailed test', shape = box, fillcolor = lightblue]
TwoTail3 [label = 'Two-tailed test', shape = box, fillcolor = lightblue]
Greater1 [label = 'Is it greater than?', shape = diamond]
Greater2 [label = 'Is it greater than?', shape = diamond]
Greater3 [label = 'Is it greater than?', shape = diamond]
Right1 [label = 'Right-tailed', shape = box, fillcolor = lightblue]
Left1 [label = 'Left-tailed', shape = box, fillcolor = lightblue]
Right2 [label = 'Right-tailed', shape = box, fillcolor = lightblue]
Left2 [label = 'Left-tailed', shape = box, fillcolor = lightblue]
Right3 [label = 'Right-tailed', shape = box, fillcolor = lightblue]
Left3 [label = 'Left-tailed', shape = box, fillcolor = lightblue]
// Connections
Start -> OneGroup
OneGroup -> OneSample [label = 'Yes']
OneGroup -> TwoGroups [label = 'No']
TwoGroups -> TwoSample [label = 'Yes']
TwoGroups -> Paired [label = 'No']
OneSample -> OneTailQ1
OneTailQ1 -> OneTail1 [label = 'Yes']
OneTailQ1 -> TwoTail1 [label = 'No']
OneTail1 -> Greater1
Greater1 -> Right1 [label = 'Yes']
Greater1 -> Left1 [label = 'No']
TwoSample -> OneTailQ2
OneTailQ2 -> OneTail2 [label = 'Yes']
OneTailQ2 -> TwoTail2 [label = 'No']
OneTail2 -> Greater2
Greater2 -> Right2 [label = 'Yes']
Greater2 -> Left2 [label = 'No']
Paired -> OneTailQ3
OneTailQ3 -> OneTail3 [label = 'Yes']
OneTailQ3 -> TwoTail3 [label = 'No']
OneTail3 -> Greater3
Greater3 -> Right3 [label = 'Yes']
Greater3 -> Left3 [label = 'No']
}
")
# Convert to SVG
svg <- export_svg(t_flow)
# Save as PNG
rsvg_png(charToRaw(svg), file = "img/t_test_flowchart.png")
```
:::: {.columns}
::: {.column width="25%"}
### t-test Flow Chart
- Requested by a Student
- Created in R using code generated by Copilot.
:::
::: {.column width="2%"}
:::
::: {.column width="73%"}

:::
::::
##
### Lecture 19 In-class Exercises - Q1
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
::::: columns
::: {.column width="50%"}
TRUE OR FALSE: When conducting a two-tailed two sample hypothesis test of means, we can only tell if two population means are significantly different, not which one is larger (or smaller).
- **Two-sided Two Sample Hypothesis Test**
- $H_{0}: \mu_{1} = \mu_{2}$
- $H_{A}: \mu_{1} \neq \mu_{2}$
:::
::: {.column width="50%"}
```{r}
knitr::include_graphics("img/hypotheses_graphic.jpg", dpi=300)
```
:::
:::::
##
### One Sample Hypothesis Tests of Proportions
::::: columns
::: {.column width="50%"}
Question of Interest polled by [YouGov](https://docs.cdn.yougov.com/etwjvohrxx/Daylight_Saving_Time_Toplines_Crosstabs.pdf):
987 adults in the US were asked:
**Would you like to see the changing of the clocks eliminated, so people no longer change their clocks twice per year?**
:::
::: {.column width="50%"}
```{r}
knitr::include_graphics("img/dst_fall_back.jpg")
```
:::
:::::
##
### Should We End Daylight Savings Time?
YouGov Polled 987 US adults
- 612 said YES, we should eliminate the practice of changing our clocks.
- 375 said NO or they were unsure. We group these two categories together.
::: fragment
If we test these data, what are the null and alternative hpotheses:
:::
- $H_{0}: P_{YES} = P_{NO}$ - No difference in proportion with these two opinions.
- $H_{A}: P_{YES} \neq P_{NO}$ - There is a difference in proportion with these two opinions.
::: fragment
Specify alpha ($\alpha$) as 0.05 unless we have a specific reason to choose a different alpha.
:::
##
### Should We End Daylight Savings Clock Changes?
::::: columns
::: {.column width="50%"}
```{r echo=T}
prop.test(612,987,correct=F)
```
:::
::: {.column width="50%"}
```{r}
Opinion <- c("Yes", "No/Not Sure")
Frequency <- c(612,375)
dl <- tibble(Opinion, Frequency)
(opinion_plot <- dl |>
ggplot() +
geom_bar(aes(x=Opinion, y=Frequency, fill=Opinion),
stat="identity", show.legend = F) +
scale_fill_manual(values=c("cornflowerblue","chartreuse3"))+
theme(legend.position = "none") +
theme_classic() +
labs(title="Should We Eliminate Daylight Savings Clock Changes") +
theme(plot.title = element_text(size = 20),
axis.title = element_text(size=18),
axis.text = element_text(size=15),
plot.caption = element_text(size = 10),
legend.text = element_text(size = 12),
legend.title = element_text(size = 15)))
```
:::
:::::
## One Sample Hypothesis Test Conclusion
Hypotheses being tested:
- $H_{0}: P_{YES} = 0.5$
- The Yes and Not Yes votes are roughly equal.
- $H_{A}: P_{YES} \neq 0.5$
- There is a difference in the proportion Yes and Not Yes votes.
- P-value from hypotheses test: \< 0.0001
- Conclusion: P-value is much less than 0.05 so we **REJECT** $H_{0}$.
- Interpretation: See Polling Question on next slide
##
### Lecture 19 In-class Exercises - Q2
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
Given our stated hypotheses and our p-value \< 0.0001
- $H_{0}: P_{YES} = 0.5$
- $H_{A}: P_{YES} \neq 0.5$
::: fragment
How do we interpret the outcome of this hypothesis test?
:::
##
### Contingency Tables to Examine Proportions in Multiple Categories
:::::: columns
::: {.column width="54%"}
Question: Are these disparities in opinions about daylight savings consistent among age groups?
- We can examine this question using tables, plots, and hypothesis tests.
- A Contingency Table is 2 x 2 or larger and allows us to subdivide count data by categories
- Commonly used in market research to understand opinions by category
- Example: How do Gen Z (18-29) and Millennial adults feel (30-44) about daylight savings?
:::
::: {.column width="2%"}
:::
::: {.column width="44%"}
```{r}
# Summarized counts
dl_poll <- matrix(c(99,98,129,105), ncol=2, byrow = T)
# label columns (col) and rows
colnames(dl_poll) <- c("Yes", "No/Not Sure")
rownames(dl_poll) <- c("Ages 18-29","Ages 30-44")
#create a table of these data in R
dl_table <- as.table(dl_poll)
#print version of table
kable(dl_table, align="lcc", caption="Should the USA Eliminate Daylight Savings Clock Changes")
```
:::
::::::
## Comparing Contingency Tables and Plots
Contingency tables and bar plots are two effective ways to examine these data
::::: columns
::: {.column width="50%"}
```{r}
# Summarized counts
dl_poll <- matrix(c(99,98,129,105), ncol=2, byrow = T)
# label columns (col) and rows
colnames(dl_poll) <- c("Yes", "No/Not Sure")
rownames(dl_poll) <- c("Ages 18-29","Ages 30-44")
#create a table of these data in R
dl_table <- as.table(dl_poll)
#print version of table
kable(dl_table, align="lcc", caption="Should the USA Eliminate Daylight Savings Clock Changes")
```
:::
::: {.column width="50%"}
```{r}
Frequency <- c(99,98,129,105)
Age_Group <- c(rep("Ages 18-29",2), rep("Ages 30-44",2))
Opinion <- rep(c("Yes", "No/Not Sure"),2)
dl_data2<- tibble(Age_Group,Opinion, Frequency)
(op_plot2 <- dl_data2 |>
ggplot() +
geom_bar(aes(x=Age_Group, y=Frequency, fill=Opinion),
stat="identity", position="dodge") +
scale_fill_manual(values=c("cornflowerblue","chartreuse3")) +
theme_classic() +
labs(title="Should We Eliminate Daylight Savings Clock Changes", x="") +
theme(plot.title = element_text(size = 20),
axis.title = element_text(size=18),
axis.text = element_text(size=15),
plot.caption = element_text(size = 10),
legend.text = element_text(size = 12),
legend.title = element_text(size = 15)))
```
:::
:::::
##
### Hypothesis Test Comparing Two Proportions
- Hypotheses being tested (**Usually a two-sided tests**):
- $H_{0}: P_{18-29} = P_{30-44}$
- There is no difference between these two age groups with respect to proportion that says yes.
- $H_{A}: P_{18-29} \neq P_{30-44}$
- There is a difference between these two age groups with respect to proportion that says yes.
::: fragment
```{r echo=T}
x <- c(99, 129) # yes votes in each age group (18-29 first)
n <- c(197, 234) # sample size in each age group (18-29 first)
prop.test(x,n, correct=F)
```
:::
##
### Two Proportion Hypothesis Test Conclusion and Interpretation
- Hypotheses being tested:
- $H_{0}: P_{18-29} = P_{30-44}$
- No difference between these age groups respect to proportion that says yes.
- $H_{A}: P_{18-29} \neq P_{30-44}$
- There is a difference between these age groups respect to proportion that says yes.
::: fragment
Questions we will answer:
- What is the p-value from this test?
- Do we Reject or Fail to Reject the Null Hypothesis?
- What do we conclude about the opinions of these two age groups?
:::
##
### Lecture 19 In-class Exercises - Q3-Q4
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
<br>
::: fragment
**Question 3:**
:::
- What is the p-value from this hypothesis test?
<br>
::: fragment
**Question 4:**
:::
- If we specify $\alpha = 0.05$, do we reject or fail to reject the null hypothesis, $H_{0}$?
##
### Lecture 19 In-class Exercises - Q5-Q6
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
<br>
::: fragment
**Question 5:**
:::
- What do we conclude about the opinions of these two age groups?
<br>
::: fragment
**Question 6:**
:::
- (Not on Poll Everywhere): What type of error might we have made?
##
### Do Gen-Zs and Millenials differ from Gen-Xers?
::::: columns
::: {.column width="50%"}
```{r}
# Summarized counts
dl_poll3 <- matrix(c(228,205,201,118), ncol=2, byrow = T)
# label columns (col) and rows
colnames(dl_poll3) <- c("Yes", "No/Not Sure")
rownames(dl_poll3) <- c("Ages 18-44","Ages 45-64")
#create a table of these data in R
dl_table3 <- as.table(dl_poll3)
#print version of table
kable(dl_table3, align="lcc", caption="Should the USA Eliminate Daylight Savings Clock Changes")
```
:::
::: {.column width="50%"}
```{r}
Frequency <- c(228,205,201,118)
Age_Group <- c(rep("Ages 18-44",2), rep("Ages 45-64",2))
Opinion <- rep(c("Yes", "No/Not Sure"),2)
dl_data2<- tibble(Age_Group,Opinion, Frequency)
(op_plot2 <- dl_data2 |>
ggplot() +
geom_bar(aes(x=Age_Group, y=Frequency, fill=Opinion),
stat="identity", position="dodge") +
scale_fill_manual(values=c("cornflowerblue","chartreuse3")) +
theme_classic() +
labs(title="Should We Eliminate Daylight Savings Clock Changes", x="")+
theme(plot.title = element_text(size = 20),
axis.title = element_text(size=18),
axis.text = element_text(size=15),
plot.caption = element_text(size = 10),
legend.text = element_text(size = 12),
legend.title = element_text(size = 15)))
```
:::
:::::
## Column and Row Percentages
::::: columns
::: {.column width="50%"}
**Original Data**
```{r}
# Summarized counts
dl_poll3 <- matrix(c(228,205,201,118), ncol=2, byrow = T)
# label columns (col) and rows
colnames(dl_poll3) <- c("Yes", "No/Not Sure")
rownames(dl_poll3) <- c("Ages 18-44","Ages 45-64")
#create a table of these data in R
dl_table3 <- as.table(dl_poll3)
#print version of table
kable(dl_table3, align="lcc", caption="Should the USA Eliminate Daylight Savings Clock Changes")
```
:::
::: {.column width="50%"}
**Row Percentages: Percentages of each age group that said 'Yes' or 'No'.**
```{r}
#print version of row percentages table
kable(prop.table(dl_table3, 1)*100, digits=2, align="lcc", caption = "Row Percentages")
```
**Column percentages: Percentages of Yes/No opinions in each age group.**
```{r}
#print version of column percentages table
kable(prop.table(dl_table3, 2)*100, digits=2, align="lcc", caption = "Column Percentages")
```
:::
:::::
##
### Hypothesis Test Comparing Two Proportions
- Hypotheses being tested:
- $H_{0}: P_{18-44} = P_{45-64}$
- No difference between these age groups respect to proportion that says yes.
- $H_{A}: P_{18-44} \neq P_{45-64}$
- There is a difference between these age groups respect to proportion that says yes.
::: fragment
```{r echo=T}
x <- c(228,201) # yes votes in each age group (18-44 first)
n <- c(433,319) # sample size in each age group (18-44 first)
prop.test(x,n, correct=F)
```
:::
##
### Lecture 19 In-class Exercises - Q7
[***Poll Everywhere***](https://pollev.com/penelopepoolereisenbies685){target="_blank"} - My User Name: **penelopepoolereisenbies685**
<br>
::: fragment
**Question 7:**
:::
- What do you conclude from this two sample two-sided hypothesis test of two proportions?
##
### Key Points from Today
- Protocol for conducting and interpreting hypothesis tests is same, regardless of how they are specified.
- This is true for quantitative data and for categorical proportion data
- For two sample tests of proportions, it is helpful to examine the data using contingency tables.
- By default, it is common for two sample tests of proportions to be conducted as two sided tests.
- These same methods can be used with larger contingency tables tat are interatively analyzed.
::: fragment
**To submit an Engagement Question or Comment about material from Lecture 19:** Submit it by midnight today (day of lecture).
:::