MAS 261 - Lecture 17

Language of Hypothesis Testing/One Sample t-tests

Author

Penelope Pooler Eisenbies

Published

October 22, 2024

Housekeeping

Comments and Questions about HW 5
A few minutes for R Questions 🪄
Review of the Concept of Testing a Hypothesis using Confidence Interval
Language of Hypothesis Testing
Goals and Framework for Testing Hypotheses
What we can say and what we can’t
One Sided vs. Two Sided Hypothesis

R and RStudio

In this course we will use R and RStudio to understand statistical concepts.
You will access R and RStudio through Posit Cloud.
- Sign up for a Free Posit Cloud Account
I will post R/RStudio files on Posit Cloud that you can access in provided links.
I will also provide demo videos that show how to access files and complete exercises.
NOTE: The free Posit Cloud account is limited to 25 hours per month.
- I demo how to download completed work so that you can use this allotment efficiently.
- For those who want to go further with R/RStudio:
  - I have added a new page to the MAS 261 website, Installing R and RStudio

Lecture 17 In-class Exercises - Q1

In Lecture 16, we looked at a survey about Electric Vehicles (EVs).

1025 people were surveyed
318 people said they were extremely, very, or somewhat likely to buy an EV.

Use the prop.test command to estimate the 90% confidence interval for the proportion of US Adults who are likely to buy an EV soon.
What is the lower bound of this confidence interval?
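One possible way to set up this calculation is sketched below. The counts come from the survey described above; the object names (`likely_buyers`, `n_surveyed`, `ev_test`, `ev_ci`) are illustrative choices, not names used elsewhere in the course materials.

```{r echo=T}
# Survey counts from the exercise
likely_buyers <- 318   # respondents at least somewhat likely to buy an EV
n_surveyed    <- 1025  # total respondents

# prop.test() reports a confidence interval for the population proportion;
# conf.level = 0.90 requests a 90% interval (the default is 0.95)
ev_test <- prop.test(x = likely_buyers, n = n_surveyed, conf.level = 0.90)

ev_ci <- ev_test$conf.int
ev_ci[1]  # lower bound of the interval
ev_ci[2]  # upper bound of the interval
```

The interval is stored in the `conf.int` element of the result, so the bounds can be pulled out directly instead of being read off the printed output.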

Testing Hypotheses Informally

In the last lecture we talked about what a hypothesis is and how we can test it using a confidence interval.

For example, a green energy skeptic says that less than a quarter of Adults are likely to buy an EV.

We can use a 95% confidence interval to test his claim.
If our 95% confidence interval is entirely above 25%, then we are 95% confident that the true proportion is inside our interval and therefore above 25%, which contradicts his claim.

What do we conclude based on the 95% confidence interval of these proportion data?
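The informal check described above can be sketched in a few lines. This assumes the same survey counts (318 out of 1025); `claimed_max` and `interval_above_claim` are hypothetical names introduced only for this illustration.

```{r echo=T}
# The skeptic claims the true proportion is below 25%
claimed_max <- 0.25

# 95% confidence interval for the proportion likely to buy an EV
ev_95 <- prop.test(x = 318, n = 1025, conf.level = 0.95)$conf.int
lower_bound <- ev_95[1]

# TRUE means the whole interval sits above 25%, which contradicts the claim
interval_above_claim <- lower_bound > claimed_max
interval_above_claim
```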

Formal Language of Hypothesis Testing

Testing hypotheses requires TWO Hypotheses
The two hypotheses are COMPLEMENTS and cover all possible values.
- In other words, EXACTLY one of the two hypotheses is true
- There is NO WAY both hypotheses can be true
- There is NO WAY both hypotheses can be false
For example, let’s examine the heights of human male characters in the Star Wars franchise.
I hypothesize that these characters, on average, are significantly taller than human males in the United States.
- How should I test this?

Setting up a Formal Hypothesis Test

Specifying TWO Hypotheses

The NULL Hypothesis, $H_{0}$, is the default. This is what we try to DISPROVE
The ALTERNATIVE Hypothesis, $H_{A}$, is what we hope to prove.
For our Star Wars Data, we hope to prove that the average height of Star Wars male humans is significantly taller than the population mean height of males in the United States.
- $\mu_{0} = 176$: Population mean of male heights in the United States
- $\overline{X}_{SW}$ is the sample mean of heights for Star Wars Characters
  - $\overline{X}_{SW}$ is an estimate of $\mu_{SW}$, the population mean of all Star Wars male heights
$H_{0}:$ Average heights of Star Wars males are less than or equal to the average heights of US males.
$H_{A}:$ Average heights of Star Wars males are greater than the average heights of US males.

Important Details about These Hypotheses

$H_{0}:$ Average heights of Star Wars males are less than or equal to the average height of all US males.

$H_{A}:$ Average heights of Star Wars males are greater than the average height of all US males.

Here are the same Hypotheses written in Formal Notation:

$H_{0}: \mu_{SW} \leq \mu_{0}$
$H_{A}: \mu_{SW} \gt \mu_{0}$
Notice that we specify what we are trying to prove as the ALTERNATIVE
- We can NEVER prove that the null hypothesis, $H_{0}$, is true.
- We assume the null hypothesis $H_{0}$ is true, and test if our data contradict this assumption.
- The null hypothesis is ALWAYS specified to include an equality: $\leq$, $\geq$, or $=$
- The alternative hypothesis is ALWAYS specified as a strict inequality: $\lt$, $\gt$, or $\neq$

Testing Our Hypothesis and Drawing a Conclusion

$H_{0}: \mu_{SW} \leq 176$

$H_{A}: \mu_{SW} \gt 176$

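```{r echo=T}
library(readr)  # provides read_csv()
sw_male_heights <- read_csv("data/StarWars_Human_Male_Heights.csv", show_col_types = FALSE)
# One-sided, one-sample t-test of H0: mu <= 176 against HA: mu > 176
t.test(sw_male_heights$height, mu = 176, alternative = "greater")
```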


    One Sample t-test

data:  sw_male_heights$height
t = 3.7177, df = 22, p-value = 0.0005989
alternative hypothesis: true mean is greater than 176
95 percent confidence interval:
 179.4159      Inf
sample estimates:
mean of x 
 182.3478

This is a ONE-SIDED test.
We reject the null hypothesis if our sample mean is significantly GREATER than 176.
The p-value indicates the probability of seeing sample data at least as extreme as ours if the null hypothesis is true.
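To see where the reported p-value and confidence bound come from, the sketch below recomputes them from the summary numbers printed in the output above, using the standard one-sample formula $t = (\overline{X} - \mu_{0}) / SE$. It does not read the data file; the variable names are illustrative, and the standard error is backed out from the printed (rounded) values, so results should match the output only up to rounding.

```{r echo=T}
# Values copied from the t.test output above
x_bar  <- 182.3478  # sample mean
mu_0   <- 176       # hypothesized mean under H0
t_stat <- 3.7177    # reported t statistic
df     <- 22        # degrees of freedom (n - 1, so n = 23)

# One-sided p-value: area to the right of t_stat under the t distribution
p_value <- pt(t_stat, df = df, lower.tail = FALSE)

# Standard error implied by t = (x_bar - mu_0) / SE
std_err <- (x_bar - mu_0) / t_stat

# Lower bound of the one-sided 95% confidence interval
ci_lower <- x_bar - qt(0.95, df = df) * std_err

p_value
ci_lower
```

Because the alternative is "greater", only the upper tail of the t distribution counts toward the p-value, and the confidence interval has only a lower bound.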

Lecture 17 In-class Exercises - Q2

Interpreting the p-value from the t.test


    One Sample t-test

data:  sw_male_heights$height
t = 3.7177, df = 22, p-value = 0.0005989
alternative hypothesis: true mean is greater than 176
95 percent confidence interval:
 179.4159      Inf
sample estimates:
mean of x 
 182.3478

If the mean height of males in the Star Wars world is 176 or less, then the chance of seeing a sample mean at least as large as ours is
- 0.0006 or 0.06%
The Confidence Interval ONLY shows the Lower Bound because this is a ONE-SIDED TEST
Based on this P-value what do we conclude?

General Guidelines for Interpreting P-values

In the Star Wars Example, the P-value is very small so the decision is clear-cut
In many cases we have to set a cutoff, also called an $\alpha$ level
- Yes this is related to $\alpha$ in Confidence Intervals - stay tuned
For hypothesis tests:
- $\alpha$ is the type 1 error rate, the probability of rejecting $H_{0}$ when it is actually true.
- We (the analyst) specify the $\alpha$ cutoff, typically but not always as 0.05
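The meaning of $\alpha$ as a type 1 error rate can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated from a normal population whose mean really is 176 (so $H_{0}$ is true), and the standard deviation of 8, the sample size of 23, the seed, and the number of simulated studies are all arbitrary illustrative choices.

```{r echo=T}
set.seed(261)     # arbitrary seed so the sketch is reproducible
alpha  <- 0.05    # cutoff chosen before looking at any data
n_sims <- 5000    # number of simulated studies
n      <- 23      # sample size per study (hypothetical)

# Each simulated sample comes from a population where H0 is TRUE (mean = 176)
p_values <- replicate(n_sims, {
  sim_heights <- rnorm(n, mean = 176, sd = 8)
  t.test(sim_heights, mu = 176, alternative = "greater")$p.value
})

# Proportion of studies that wrongly reject H0; this should be close to alpha
type1_rate <- mean(p_values < alpha)
type1_rate
```

Changing `alpha` to 0.01 or 0.10 and rerunning shows the false rejection rate moving with the cutoff.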

Four Possible Outcomes of a Hypothesis test

Choice of $\alpha$

Analysts most commonly set $\alpha$ at 0.05 for Hypothesis Tests, but SOMETIMES 0.01 or 0.10 are used.

Interpreting p-values sensibly

By far, the most typical $\alpha$ is 0.05.
Even with a cutoff, we should interpret the p-value along a spectrum
- A p-value of 0.049 is almost identical to 0.05001.
- It is wise to set an objective cutoff BEFORE we analyze the data,
  - BUT we also should put results in perspective.
In a standard situation when we use $\alpha=0.05$, I think of the evidence against the null hypothesis along this spectrum:
- 0.0 - 0.01 Extremely strong evidence against $H_{0}$
- 0.011 - 0.03 Strong evidence against $H_{0}$
- 0.031 - 0.049 Some evidence against $H_{0}$
- 0.05 - 0.07 Suggestive evidence against $H_{0}$
- 0.071 - 0.099 Minimal evidence against $H_{0}$
- 0.1 and above No evidence against $H_{0}$
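The spectrum above can be written as a small helper function. This is a hypothetical sketch: `evidence_label` is not part of any package, and because the listed ranges leave small gaps (for example between 0.01 and 0.011), the exact boundary handling below is an assumption.

```{r echo=T}
# Map p-values to the informal evidence categories listed above.
# Boundary handling (<= vs <) is an assumption where the ranges leave gaps.
evidence_label <- function(p) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  ifelse(p <= 0.01, "Extremely strong",
  ifelse(p <= 0.03, "Strong",
  ifelse(p <  0.05, "Some",
  ifelse(p <= 0.07, "Suggestive",
  ifelse(p <  0.10, "Minimal", "No evidence")))))
}

# Star Wars p-value, the two near-identical values mentioned above, and a large one
labels <- evidence_label(c(0.0005989, 0.049, 0.05001, 0.2))
labels
```

Note that 0.049 and 0.05001 land in adjacent categories even though they are nearly the same number, which is the reason to read p-values along a spectrum.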

Why choose a different $\alpha$ than 0.05

$\alpha$ is the probability that we falsely reject $H_{0}$ when it is TRUE.

Depending on the discipline, we might want to set a different cutoff

In some disciplines, you want to minimize the chance of making a mistake.
- For example in a road safety or drug approval study where you want to be sure of your conclusion before going public.

In other disciplines, you might be willing to take a riskier, more exploratory approach to testing a hypothesis.
- For example, in the initial stages of exploratory scientific research, you may relax your criteria (use a larger $\alpha$) in an initial pilot study to determine if further study is warranted.

A Two-Sided Hypothesis Test Example

We have data from two Coca-cola plants.

Each plant is required to fill each can with 12 ounces of soda.
If, on average, they are underfilling OR overfilling the cans, the plant will have to shut down and recalibrate the machinery.
The default is that each plant is working fine. That’s our null hypothesis, $H_{0}$.

Lecture 17 In-class Exercises - Q3

How do we state the null and alternative hypotheses we are testing based on the question we want to answer?

A. $H_{0}: \mu \leq 12$ vs. $H_{A}: \mu \gt 12$

B. $H_{0}: \mu = 12$ vs. $H_{A}: \mu \neq 12$

C. $H_{0}: \mu \geq 12$ vs. $H_{A}: \mu \lt 12$

NOTE: Translating a question into testable hypotheses is often challenging and takes some practice.

Lecture 17 In-class Exercises - Q4

Run the t-test command for a two-tailed test with $\alpha = 0.05$ (default options) for both Plant 1 and Plant 2.

Which plant would need to shut down if we set $\alpha = 0.05$?

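```{r eval = F, echo=T}
library(readr)  # provides read_csv()
coke <- read_csv("data/Coca-cola.csv", show_col_types = FALSE)

# Two-sided tests of H0: mu = 12 (defaults: alternative = "two.sided", conf.level = 0.95)
t.test(coke$Plant_1, mu = 12)

t.test(coke$Plant_2, mu = 12)
```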

Complete Conclusions for Plant 1

Complete Conclusions for Plant 2

t-test P-values and Confidence Intervals

How are they related?

If the same $\alpha$ is used for a hypothesis test and a confidence interval, results will agree.

P-value $< \alpha$, we reject $H_{0}$ and conclude that $\mu \neq \mu_{0}$
(1 - $\alpha$)x100% Confidence Interval does not include $\mu_{0}$

P-value $\geq \alpha$, we DON’T reject $H_{0}$ and conclude that there is no evidence to contradict $\mu = \mu_{0}$
(1 - $\alpha$)x100% Confidence Interval includes $\mu_{0}$
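The agreement between the two-sided test and the confidence interval can be checked directly. The sketch below uses simulated fill volumes, NOT the course's Coca-cola data; the sample size, mean, standard deviation, and seed are arbitrary assumptions chosen only for illustration.

```{r echo=T}
set.seed(17)   # arbitrary seed so the sketch is reproducible
alpha <- 0.05
mu_0  <- 12

# Hypothetical fill volumes (ounces) for 30 cans
fills <- rnorm(30, mean = 12.05, sd = 0.1)

# Two-sided t-test with a confidence level that matches alpha
res <- t.test(fills, mu = mu_0, conf.level = 1 - alpha)

reject_h0   <- res$p.value < alpha
ci_excludes <- mu_0 < res$conf.int[1] || mu_0 > res$conf.int[2]

# For a two-sided t-test these two logical values always match
c(reject_h0 = reject_h0, ci_excludes_mu0 = ci_excludes)
```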

Lecture 17 In-class Exercises - Q5

Shutting down a plant is very expensive and the owner feels these tests should have been done using $\alpha = 0.01$.

Would either plant have to shut down if $\alpha$ was set to 0.01 for this question?

NOTE that the P-value does not change but we change the Confidence Level to match our new $\alpha = 0.01$.

```{r eval = F, echo=T}
t.test(coke$Plant_1, mu=12, conf.level = .99)

t.test(coke$Plant_2, mu=12, conf.level = .99)
```

NOTE: It is unethical to change $\alpha$ AFTER looking at the data, but we will do it here to better understand the effect of these choices.
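To see why the p-value stays the same while the interval changes, here is a minimal sketch using simulated fill volumes (hypothetical data, not the course's Coca-cola file; the seed, sample size, mean, and standard deviation are arbitrary assumptions).

```{r echo=T}
set.seed(17)                                # arbitrary seed
fills <- rnorm(30, mean = 12.05, sd = 0.1)  # hypothetical fill volumes (ounces)

res_95 <- t.test(fills, mu = 12)                     # conf.level defaults to 0.95
res_99 <- t.test(fills, mu = 12, conf.level = 0.99)  # matches alpha = 0.01

# The p-value depends only on the data and mu, not on conf.level
same_p_value <- isTRUE(all.equal(res_95$p.value, res_99$p.value))

# The 99% interval is wider, so it is harder for it to exclude 12
width_95 <- diff(res_95$conf.int)
width_99 <- diff(res_99$conf.int)

same_p_value
c(width_95 = width_95, width_99 = width_99)
```

Only the cutoff that the p-value is compared against changes when $\alpha$ changes; the wider interval is the confidence interval's way of expressing that stricter cutoff.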

Key Points from Today

The Language of Hypothesis Testing takes a little time to get used to.
The challenge is to interpret the question posed and translate that into testable hypotheses.
The hypotheses are set up so that the alternative hypothesis, $H_{A}$, is what you are trying to prove.
The null hypothesis, $H_{0}$, always includes an equality.
The alternative hypothesis, $H_{A}$, always includes a strict inequality.
Today we used the t.test command to find our p-values and confidence intervals.
Next week: using vdist_t_prob to visualize the p-values, a review of the t calculation, and two sample tests.

To submit an Engagement Question or Comment about material from Lecture 17: Submit it by midnight today (day of lecture).

