2023-10-31
Today’s plan 📋
Comments and Questions about Previous Lecture from Engagement Questions
Upcoming Dates
A few minutes for R Questions 🪄
Review Question - One sample Hypothesis Test
One-sided vs Two-sided Tests
Two Sample Hypothesis tests
t-test comparing Two Independent groups
Paired t-tests
Introduction of HW 6
Review: You have two options to facilitate your introduction to R and RStudio:
If you are comfortable with coding: Start with Option 1, but still sign up for a Posit Cloud account.
If you are nervous about coding: Choose Option 2.
For both options: I can help with download/install issues during office hours.
What I do: I maintain a Posit Cloud account for helping students but I do most of my work on my laptop.
NOTE: We will use R and RStudio in class during MOST lectures
Mid-Semester Progress Reports will be submitted this week
Will not include attendance and in-class polling
Not everyone receives a report (no news is medium to good news).
HW 6 is now posted and is due 11/1 (Grace period ends 11/3)
This assignment seems long but it’s not.
It consists of just three hypothesis tests with questions about each test.
Most questions are multiple choice, but do not just keep guessing until you hit the right answer.
HW 7 will be posted next week (Lectures 18 - 20)
Test 2 is on November 14th and will include material up through Lecture 20
US adults were asked: How do you feel about restrictions on sales of new gas-powered vehicles?
1025 people were surveyed.
215 people said they were supportive.
Can researchers honestly use this statement:
Hypotheses Tested:
Use the prop.test command to estimate the 95% confidence interval for the proportion of US adults who are supportive of restrictions.
Use correct=F because the sample size is large, and alternative="greater" for a right-sided test; with alternative="greater", prop.test reports a one-sided lower confidence bound rather than a two-sided interval.
In a standard situation when we use \(\alpha=0.05\), I think of the evidence against the null hypothesis along this spectrum:
0.0 - 0.01 Very strong evidence against \(H_{0}\)
0.011 - 0.03 Strong evidence against \(H_{0}\)
0.031 - 0.049 Some evidence against \(H_{0}\)
0.05 - 0.07 Suggestive evidence against \(H_{0}\)
0.071 - 0.099 Minimal evidence against \(H_{0}\)
0.1 and above No evidence against \(H_{0}\)
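Returning to the survey question, here is a minimal R sketch of the prop.test calls. It uses only the counts on the slide (215 supportive out of 1025 surveyed). The null proportion being tested is not shown in these notes, so it is deliberately left out. With correct = FALSE, the confidence limits that prop.test reports do not depend on it. The object names ci_two and ci_greater are illustrative.

```r
# Counts from the survey slide: 215 of 1025 US adults were supportive
x <- 215
n <- 1025

# Two-sided 95% confidence interval for the population proportion.
# correct = FALSE turns off the continuity correction (large sample).
ci_two <- prop.test(x, n, correct = FALSE, conf.level = 0.95)
ci_two$estimate   # sample proportion, 215/1025
ci_two$conf.int   # lower and upper 95% limits

# One-sided version: alternative = "greater" gives only a lower
# 95% confidence bound (the upper limit is reported as 1).
# Assumption: no null value p is supplied, so R uses its default
# p = 0.5 for the test statistic; ignore that p-value here and add
# p = <null proportion> when testing a specific claim.
ci_greater <- prop.test(x, n, correct = FALSE, alternative = "greater")
ci_greater$conf.int
```

The one-sided lower bound sits above the two-sided lower limit. It puts the full 5% in one tail instead of 2.5%.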
Hypothesis tests can be structured differently, BUT the framework for testing is the same.
Specify hypotheses that match the data and the question of interest.
Specify \(\alpha\).
Conduct the hypothesis test using software (the t.test and prop.test commands in R).
Compare the p-value to \(\alpha\) and/or examine the confidence interval or confidence bounds.
P-value \(\lt \alpha\): Reject \(H_{0}\).
P-value \(\geq \alpha\): Do not reject \(H_{0}\).
Interpret the results in terms of the data and the question asked.
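The comparison step of this framework can be sketched in R. decide() and evidence() are hypothetical helper names, not base R functions. The first applies the reject / do-not-reject rule. The second encodes the evidence spectrum described earlier. That spectrum leaves small gaps between ranges (for example, between 0.01 and 0.011). As an assumption, this sketch splits the ranges at 0.01, 0.03, 0.05, 0.07 and 0.1. Under that split, a p-value of exactly 0.05 counts as "suggestive" and agrees with the do-not-reject rule.

```r
# Hypothetical helper (not a base R function): the decision rule
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "Reject H0" else "Do not reject H0"
}

# Hypothetical helper: the evidence spectrum described earlier.
# Assumption: ranges are split at 0.01, 0.03, 0.05, 0.07 and 0.1,
# so p = 0.05 is "suggestive" (not rejected) and p = 0.1 is "no evidence".
evidence <- function(p_value) {
  if (p_value <= 0.01) {
    "Very strong evidence against H0"
  } else if (p_value <= 0.03) {
    "Strong evidence against H0"
  } else if (p_value < 0.05) {
    "Some evidence against H0"
  } else if (p_value <= 0.07) {
    "Suggestive evidence against H0"
  } else if (p_value < 0.1) {
    "Minimal evidence against H0"
  } else {
    "No evidence against H0"
  }
}

# The first two p-values appear in this lecture; the last two are made up
p_values <- c(0.0004092, 0.0002046, 0.06, 0.2)
sapply(p_values, decide)
sapply(p_values, evidence)
```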
Question: Is there a significant difference between the city miles per gallon (MPG) of Italian and German high-end cars?
The question as written assumes no prior knowledge.
If we don’t know which group mean might be larger, we use a two-sided test.
The null (default) hypothesis is that the two group means are equal.
The alternative hypothesis is the complement: the two group means are unequal.
Hypotheses:
\(H_{0}: \mu_{ITA} = \mu_{GER}\)
\(H_{A}: \mu_{ITA} \neq \mu_{GER}\)
Question: Is there a significant difference between the city fuel efficiency of Italian and German high-end cars?
Conduct the t.test to answer this question
In the t.test command, alternative="two.sided" and conf.level=.95 are the defaults and are not required.
Question 2: What is the p-value of this hypothesis test?
Round answer to 4 decimal places.
Question 3: Based on this p-value which we compare to \(\alpha = 0.05\) do we reject or fail to reject the null hypothesis?
Question 4: Are the cars from these two countries significantly different with respect to City MPG? If so, which country’s cars have a higher average City MPG?
Question 5: What type of error might we have made?
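The t.test call for this comparison is not reproduced in these notes. Below is a minimal sketch of how the defaults work. It uses made-up MPG values: ita_demo and ger_demo are hypothetical, not the course's itly_cty and gmny_cty data. Spelling out alternative="two.sided" and conf.level=.95 gives exactly the same test as leaving them off. var.equal = FALSE is also a default, which is why R runs the Welch version of the test.

```r
# Made-up city MPG values for illustration only (NOT the course data)
ita_demo <- c(12, 13, 14, 13, 15, 12, 14)
ger_demo <- c(17, 18, 16, 19, 18, 17, 20)

# Every argument spelled out ...
full <- t.test(ita_demo, ger_demo, alternative = "two.sided",
               conf.level = 0.95, var.equal = FALSE)
# ... versus relying on the defaults
short <- t.test(ita_demo, ger_demo)

full$p.value     # same value as short$p.value
short$p.value
short$method     # names the Welch two-sample test
```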
The P-value from our original two-sided test is 0.0004092
Let’s rerun the test above as a left-sided test and examine the change in the results.
New Hypotheses:
\(H_{0}: \mu_{ITA} \geq \mu_{GER}\)
\(H_{A}: \mu_{ITA} < \mu_{GER}\)
Welch Two Sample t-test
data: itly_cty and gmny_cty
t = -4.1147, df = 23.461, p-value = 0.0002046
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -2.612611
sample estimates:
mean of x mean of y
13.400 17.875
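The output is labelled Welch Two Sample t-test because t.test does not assume equal variances by default (var.equal = FALSE). For reference, the standard Welch statistic and its Welch-Satterthwaite degrees of freedom are sketched below. The df formula is why R reports a fractional value (df = 23.461). The last line back-solves the standard error from the printed means and t statistic.

```latex
\[
t = \frac{\bar{x}_{ITA} - \bar{x}_{GER}}{SE}, \qquad
SE = \sqrt{\frac{s_{ITA}^{2}}{n_{ITA}} + \frac{s_{GER}^{2}}{n_{GER}}}
\]
\[
df \approx \frac{\left(\frac{s_{ITA}^{2}}{n_{ITA}} + \frac{s_{GER}^{2}}{n_{GER}}\right)^{2}}
{\frac{\left(s_{ITA}^{2}/n_{ITA}\right)^{2}}{n_{ITA} - 1} + \frac{\left(s_{GER}^{2}/n_{GER}\right)^{2}}{n_{GER} - 1}}
\]
\[
SE = \frac{13.400 - 17.875}{-4.1147} = \frac{-4.475}{-4.1147} \approx 1.088 \text{ MPG}
\]
```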
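One-sided p-value = \(\frac{1}{2}\) of the two-sided p-value, when the sample difference falls in the direction of \(H_{A}\): here, 0.0004092 / 2 = 0.0002046.
If we know we are only interested in one tail, the p-value is cut in half.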
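Here is a sketch of the halving rule in R, again with made-up data (ita_demo and ger_demo are hypothetical values, not the course data). The Italian sample mean is lower, so the observed difference points in the direction of the left-sided alternative. As a result, the left-sided p-value is exactly half of the two-sided one. The right-sided test points the other way, so it gets the complementary p-value instead.

```r
# Made-up city MPG values for illustration only (NOT the course data)
ita_demo <- c(12, 13, 14, 13, 15, 12, 14)
ger_demo <- c(17, 18, 16, 19, 18, 17, 20)

two_sided <- t.test(ita_demo, ger_demo, alternative = "two.sided")
left      <- t.test(ita_demo, ger_demo, alternative = "less")
right     <- t.test(ita_demo, ger_demo, alternative = "greater")

# t is negative (Italian mean lower), matching the left-sided H_A,
# so the left-sided p-value is half of the two-sided p-value
left$p.value / two_sided$p.value     # 0.5

# The right-sided test points the "wrong" way: its p-value is
# 1 minus the left-sided p-value, so it will not reject H0 here
right$p.value + left$p.value         # 1, up to floating-point rounding
```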
Some disciplines ONLY use two-sided tests so that rejecting \(H_{0}\) is more difficult.
Other disciplines ONLY use one-sided tests, because the direction of the test is obvious.
It is fairly well known that city gas mileage is ALWAYS lower than highway mileage.
We can verify that this is true for these European cars using a paired t-test.
In a paired t-test, you look at the same individuals (cars, people, sites, etc.) under two sets of conditions.
A common use of paired t-tests is to collect data on humans before and after some activity, such as a scary movie or a period of exercise.
The key criterion for using a paired t-test is that the two sets of data are NOT from independent groups.
A paired t-test is a ONE SAMPLE T-TEST of the DIFFERENCES
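Written out, the paired test works on the per-car differences. Define \(d_i\) as highway minus city MPG for car \(i\). The right-sided hypotheses about \(\mu_{HWY}\) and \(\mu_{CTY}\) then become statements about the mean difference \(\mu_{d}\). The numbers in the last line come from the paired t-test output for these cars (df = 30, mean difference 6.870968, t = 16.913) and are rounded. \(t_{0.05,\,30} \approx 1.697\) is the upper 5% critical value of the t distribution with 30 df. The result agrees with the printed lower bound of 6.181461 up to rounding.

```latex
\[
d_i = \text{hwy}_i - \text{cty}_i, \qquad
H_{0}: \mu_{d} \leq 0, \qquad H_{A}: \mu_{d} > 0
\]
\[
t = \frac{\bar{d} - 0}{s_{d}/\sqrt{n}}, \qquad df = n - 1
\]
\[
df = 30 \Rightarrow n = 31, \qquad
\frac{s_{d}}{\sqrt{n}} = \frac{6.870968}{16.913} \approx 0.406, \qquad
\bar{d} - t_{0.05,\,30}\,\frac{s_{d}}{\sqrt{n}} \approx 6.871 - 1.697(0.406) \approx 6.18
\]
```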
Do these European cars get better mileage on the highway?
Specify hypotheses for a right-sided test (Hypotheses have same format as two sample t-tests):
\(H_{0}: \mu_{HWY} \leq \mu_{CTY}\)
\(H_{A}: \mu_{HWY} \gt \mu_{CTY}\)
Specify \(\alpha = 0.05\)
Conduct paired t-test of these data:
cty <- euro_cars$mpg_c   # save variables to short names
hwy <- euro_cars$mpg_h
t.test(hwy, cty, alternative="greater", conf.level=.95, paired=TRUE)   # right-sided paired t-test
Paired t-test
data: hwy and cty
t = 16.913, df = 30, p-value < 0.00000000000000022
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
6.181461 Inf
sample estimates:
mean difference
6.870968
A paired t-test is ONLY appropriate when the same individuals are measured twice under different conditions.
To do a paired t-test, the software creates a new variable: the difference between mpg_h and mpg_c for each car.
The software (R) then does a one-sample t-test on those differences.
Compare the paired t-test output above with the one-sample t-test on the differences (stored in mpg_diff) below:
One Sample t-test
data: mpg_diff
t = 16.913, df = 30, p-value < 0.00000000000000022
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
6.181461 Inf
sample estimates:
mean of x
6.870968
Note that the paired t-test output DOES NOT give the mean for each group.
Instead, the paired t-test output shows the mean difference.
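Here is a sketch with made-up highway and city values for five cars that shows why the paired output and the one-sample output match. hwy_demo and cty_demo are hypothetical, not the euro_cars data. The sketch also shows that the mean of the differences equals the difference of the two means, which is the single number the paired output reports.

```r
# Made-up highway and city MPG for five cars (NOT the euro_cars data)
hwy_demo <- c(24, 27, 22, 30, 26)
cty_demo <- c(17, 20, 16, 22, 19)

# Paired, right-sided test (H_A: highway mean > city mean)
paired <- t.test(hwy_demo, cty_demo, alternative = "greater", paired = TRUE)

# The same thing by hand: build the differences, then a one-sample test
diffs    <- hwy_demo - cty_demo
one_samp <- t.test(diffs, alternative = "greater")

paired$statistic    # same t as one_samp$statistic
one_samp$statistic
paired$p.value      # same p-value as one_samp$p.value
one_samp$p.value

# Mean of the differences = difference of the means
mean(diffs)                          # 7
mean(hwy_demo) - mean(cty_demo)      # 7, up to floating-point rounding
```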
Question 6: What is the average difference in mileage between city driving and highway driving for all of these vehicles?
Round to two decimal places.
For the remainder of class we will look at HW 6.
I will also make some short videos, but using R will not be the primary challenge for this assignment.
This assignment includes many multiple-choice questions that require an understanding of:
The protocol for conducting and interpreting hypothesis tests is the same, regardless of how they are specified.
For all hypothesis tests, the hypotheses can be specified three ways: two-sided, left-sided, or right-sided.
In addition to one-sample tests, there are two types of two-sample t-tests: t-tests comparing two independent groups, and paired t-tests.
To submit an Engagement Question or Comment about material from Lecture 18: Submit by midnight today (day of lecture). Click on the link next to the ❓ under Lecture 18.