Two Sample t-Tests and Confidence Intervals
2024-10-23
Comments and Questions about Previous Lecture from Engagement Questions
Upcoming Dates
A few minutes for R Questions 🪄
Review Question - One sample Hypothesis Test
One-sided vs Two-sided Tests
Two Sample Hypothesis tests
t-test comparing Two Independent groups
Paired t-tests
Introduction of HW 6
Mid-Semester Progress Reports.
HW 6 is now posted and is due 10/30 (Grace period ends 10/31).
This assignment seems long but it’s not.
It consists of just three hypothesis tests with questions about each test.
Most questions are multiple choice, but do not just guess and keep trying.
Demo videos will be posted by tomorrow or Saturday at the latest.
HW 7 will be posted next week (Lectures 19 - 21)
Test 2 is on November 12th and will include material up through Lecture 21 (HW 7)
Lecture 22 - Intro to Portfolio Management will be on Final Exam, not on Test 2.
In this course we will use R and RStudio to understand statistical concepts.
You will access R and RStudio through Posit Cloud.
I will post R/RStudio files on Posit Cloud that you can access in provided links.
I will also provide demo videos that show how to access files and complete exercises.
NOTE: The free Posit Cloud account is limited to 25 hours per month.
I demo how to download completed work so that you can use this allotment efficiently.
For those who want to go further with R/RStudio:
US Adults were asked: How do you feel about restrictions on sales of new gas-powered vehicles?
1025 people were surveyed and 215 people said they were supportive of restrictions.
Is this true?: Significantly more than 20% of US Adults are supportive of restrictions.
Hypotheses Tested:
\(H_{0}: p \leq 0.20\)
\(H_{A}: p > 0.20\)
Use the prop.test command to estimate the 95% confidence interval for the proportion of US Adults who are supportive of restrictions.
Remember to include correct=F.
This is a one-sided test, so we include the option alternative="greater".
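A minimal sketch of the call described above, using the survey counts from this question (215 supportive out of 1025). The object names are illustrative choices, not taken from the lecture files.

```r
# Survey counts from the review question
supportive <- 215
surveyed   <- 1025

# One-sided test of whether the true proportion exceeds 0.20,
# with the continuity correction turned off (correct = FALSE)
restrict_test <- prop.test(x = supportive, n = surveyed, p = 0.20,
                           alternative = "greater", correct = FALSE)

restrict_test$estimate   # sample proportion, 215/1025
restrict_test$p.value    # compare to alpha = 0.05
restrict_test$conf.int   # one-sided test: a lower confidence bound, upper limit is 1
```

Because the test is one-sided, the interval that R reports is a 95% lower confidence bound rather than a two-sided interval.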
In a standard situation when we use \(\alpha=0.05\), I think of the evidence against the null hypothesis along this spectrum:
0.0 - 0.01 Very strong evidence against \(H_{0}\)
0.011 - 0.03 Strong evidence against \(H_{0}\)
0.031 - 0.049 Some evidence against \(H_{0}\)
0.05 - 0.07 Suggestive evidence against \(H_{0}\)
0.071 - 0.099 Minimal evidence against \(H_{0}\)
0.1 and above No evidence against \(H_{0}\)
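The spectrum above can be sketched as a small lookup function. This is a hypothetical helper, not part of the course files. The cut points 0.01, 0.03, 0.05, 0.07, and 0.10 come from the list, but how a p-value that falls between two listed ranges (for example 0.0105) is classified is an assumption made here.

```r
# Hypothetical helper: maps a p-value to the evidence labels listed above.
# Assumption: cut points at 0.01, 0.03, 0.05, 0.07, and 0.10; values in the
# small gaps between listed ranges are classified by these cut points.
evidence_label <- function(p_value) {
  if (p_value <= 0.01) {
    "Very strong evidence against H0"
  } else if (p_value <= 0.03) {
    "Strong evidence against H0"
  } else if (p_value < 0.05) {
    "Some evidence against H0"
  } else if (p_value <= 0.07) {
    "Suggestive evidence against H0"
  } else if (p_value < 0.1) {
    "Minimal evidence against H0"
  } else {
    "No evidence against H0"
  }
}

evidence_label(0.02)   # falls in the 0.011 - 0.03 range
evidence_label(0.25)   # falls in the 0.1 and above range
```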
Hypothesis Tests can be structured differently BUT framework for testing is the same.
Specify hypotheses that match data and question of interest.
Specify \(\alpha\).
Conduct hypothesis test using software.
t.test and prop.test commands in R
Compare p-value to \(\alpha\) and/or examine confidence interval or confidence bounds.
P-value \(\lt \alpha\): Reject \(H_{0}\).
P-value \(\geq \alpha\): Do not reject \(H_{0}\).
Interpret results in terms of data and question asked.
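The comparison step in the framework above can be written as a one-line rule. The function name `decide` is a hypothetical helper for illustration only.

```r
# Hypothetical helper (not from the lecture files): applies the decision rule
# "reject H0 when the p-value is below alpha, otherwise do not reject H0"
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "Reject H0" else "Do not reject H0"
}

decide(0.0004)   # well below alpha = 0.05
decide(0.05)     # p-value equal to alpha counts as "do not reject"
```

Note that a p-value exactly equal to \(\alpha\) falls on the "do not reject" side, matching the \(\geq\) in the rule.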
Question: Is there a significant difference between the city miles per gallon (MPG) of Italian and German high end cars?
Question as written assumes no prior knowledge.
If we don’t know which group mean might be larger, we use a two-sided test.
Null (default) hypothesis is that the two group means are equal
Alternative hypothesis is the complement, the two group means are unequal
Hypotheses:
\(H_{0}: \mu_{ITA} = \mu_{GER}\)
\(H_{A}: \mu_{ITA} \neq \mu_{GER}\)
Question: Is there a significant difference between the city fuel efficiency of Italian and German high end cars?
Conduct the t.test to answer this question.
In the t.test command below, alternative="two.sided" and conf.level=.95 are the defaults and not required.
library(readr) # provides read_csv
euro_cars <- read_csv("data/Euro_Cars.csv", show_col_types = F) # import data
itly_cty <- euro_cars$mpg_c[euro_cars$ctry_origin=="Italy"] # shorten variable names
gmny_cty <- euro_cars$mpg_c[euro_cars$ctry_origin=="Germany"]
t.test(itly_cty, gmny_cty, alternative="two.sided", conf.level=.95) # conduct test
Welch Two Sample t-test
data: itly_cty and gmny_cty
t = -4.1147, df = 23.461, p-value = 0.0004092
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.722328 -2.227672
sample estimates:
mean of x mean of y
13.400 17.875
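To see how the pieces of this output fit together, the confidence interval can be rebuilt from the reported means, t statistic, and degrees of freedom. This is a sketch that only uses the rounded values printed above, so the result should agree with R's interval up to rounding. The variable names are illustrative.

```r
# Values copied from the Welch t-test output above
mean_itly <- 13.400
mean_gmny <- 17.875
t_stat    <- -4.1147
df_welch  <- 23.461

diff_means <- mean_itly - mean_gmny      # estimated difference (Italy - Germany)
std_error  <- diff_means / t_stat        # because t = difference / standard error
t_crit     <- qt(0.975, df = df_welch)   # two-sided 95% critical value

# Estimate plus or minus the margin of error
ci_95 <- diff_means + c(-1, 1) * t_crit * std_error
ci_95   # compare with the reported interval: -6.722328 -2.227672
```

Both limits are negative, so zero is not in the interval, which is consistent with the small p-value.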
Question 2: What is the p-value of this hypothesis test?
Round answer to 4 decimal places.
Question 3: Based on this p-value, which we compare to \(\alpha = 0.05\), do we reject or fail to reject the null hypothesis?
Question 4: Are the cars from these two countries significantly different with respect to City MPG? If so, which country’s cars have a higher average City MPG?
Question 5: What type of error might we have made?
The p-value from our original two-sided test is 0.0004092.
Let's rerun the test above as a left-sided test and examine the change in the results.
New Hypotheses:
\(H_{0}: \mu_{ITA} \geq \mu_{GER}\)
\(H_{A}: \mu_{ITA} < \mu_{GER}\)
Welch Two Sample t-test
data: itly_cty and gmny_cty
t = -4.1147, df = 23.461, p-value = 0.0002046
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -2.612611
sample estimates:
mean of x mean of y
13.400 17.875
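When the sample difference falls in the direction of \(H_{A}\), as it does here, the one-sided p-value = \(\frac{1}{2}\) of the two-sided p-value.
If we decide before looking at the data that we are only interested in one tail, the p-value is cut in half.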
Some disciplines ONLY use two-sided tests so that rejecting \(H_{0}\) is more difficult.
Other disciplines ONLY use one-sided tests, because direction of the test is obvious.
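The relationship between the one-sided and two-sided p-values can be checked directly from the t distribution. This sketch uses only the rounded t statistic and degrees of freedom printed in the outputs above. The right-sided p-value is included as an illustration of what happens when the tested direction does not match the data.

```r
# Values copied from the Welch t-test output
t_stat   <- -4.1147
df_welch <- 23.461

p_left      <- pt(t_stat, df = df_welch)                      # HA: mu_ITA < mu_GER
p_two_sided <- 2 * pt(-abs(t_stat), df = df_welch)            # HA: mu_ITA != mu_GER
p_right     <- pt(t_stat, df = df_welch, lower.tail = FALSE)  # HA: mu_ITA > mu_GER

c(left = p_left, two_sided = p_two_sided, right = p_right)
```

The left-sided p-value is half of the two-sided one because the observed difference is negative. A right-sided test on the same data would give a p-value close to 1, so the "cut in half" shortcut only applies when the data fall in the direction of the alternative.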
It is fairly well known that, for conventional gas-powered cars, city gas mileage is almost ALWAYS lower than highway mileage.
We can verify that this is true for these European cars using a paired t-test.
In a paired t-test, you look at the same individuals (cars, people, sites, etc) under two sets of conditions.
A common use of paired t-tests is to collect data on humans before and after some activity such as a scary movie or period of exercise.
The key criterion for using a paired t-test is that the two sets of data are NOT from independent groups.
A paired t-test is a ONE SAMPLE T-TEST of the DIFFERENCES
Do these European cars get better mileage on the highway?
Specify hypotheses for a right-sided test (Hypotheses have same format as two sample t-tests):
\(H_{0}: \mu_{HWY} \leq \mu_{CTY}\)
\(H_{A}: \mu_{HWY} \gt \mu_{CTY}\)
Specify \(\alpha = 0.05\)
Conduct paired t-test of these data:
cty <- euro_cars$mpg_c # save variables to short names
hwy <- euro_cars$mpg_h
t.test(hwy, cty, alternative="greater", conf.level=.95, paired=T)
Paired t-test
data: hwy and cty
t = 16.913, df = 30, p-value < 0.00000000000000022
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
6.181461 Inf
sample estimates:
mean difference
6.870968
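The lower confidence bound in this output can be reproduced from the reported mean difference, t statistic, and degrees of freedom. This is a sketch based on the rounded values printed above, so it should match R's bound only up to rounding. The variable names are illustrative.

```r
# Values copied from the paired t-test output above
mean_diff <- 6.870968
t_stat    <- 16.913
df_paired <- 30          # n - 1, so the data contain 31 cars

std_error   <- mean_diff / t_stat                 # t = mean difference / standard error
lower_bound <- mean_diff - qt(0.95, df = df_paired) * std_error

lower_bound   # compare with the reported bound: 6.181461
```

A right-sided test uses the 0.95 quantile (not 0.975) because all of the 5% error is placed in one tail.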
A paired t-test is ONLY appropriate when the same individuals are measured twice under different conditions.
In order to do a paired t-test, the software creates a new variable, the difference between mpg_h and mpg_c for each car.
Software (R) does a one sample t-test on those differences.
Below are the paired t-test and the one sample t-test on the differences
One Sample t-test
data: mpg_diff
t = 16.913, df = 30, p-value < 0.00000000000000022
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
6.181461 Inf
sample estimates:
mean of x
6.870968
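The equivalence between the two tests can be checked on any paired data. The sketch below uses a small set of made-up mileage values (hypothetical numbers, NOT the Euro_Cars data) so that it runs without the course data file.

```r
# Hypothetical mileage values for five cars (NOT the Euro_Cars data)
cty_demo <- c(14, 17, 12, 20, 16)
hwy_demo <- c(21, 24, 18, 27, 22)

# Paired t-test on the two columns
paired_res <- t.test(hwy_demo, cty_demo, alternative = "greater", paired = TRUE)

# One sample t-test on the per-car differences
diff_demo      <- hwy_demo - cty_demo
one_sample_res <- t.test(diff_demo, mu = 0, alternative = "greater")

# Both comparisons should return TRUE
all.equal(unname(paired_res$statistic), unname(one_sample_res$statistic))
all.equal(paired_res$p.value, one_sample_res$p.value)
```

With the course data, the same idea would presumably use the difference between euro_cars$mpg_h and euro_cars$mpg_c, which is what the mpg_diff variable in the output above appears to hold.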
Note that the paired t-test output DOES NOT give the mean for each group.
Instead, the paired t-test output shows the mean difference.
Question 6: What is the average difference in mileage between city driving and highway driving for all of these vehicles?
Round to two decimal places.
For the remainder of class we will look at HW 6.
I will also make some short videos but using R will not be the primary challenge for this assignment.
This assignment includes many multiple choice questions that require an understanding of
The protocol for conducting and interpreting hypothesis tests is the same, regardless of how they are specified.
For all hypothesis tests, the hypotheses can be specified three ways:
In addition to one sample tests, there are two types of two sample t-tests:
To submit an Engagement Question or Comment about material from Lecture 18: Submit it by midnight today (day of lecture).