Housekeeping

  • Comments and Questions about Previous Lecture from Engagement Questions

  • Upcoming Dates

  • A few minutes for R Questions 🪄

  • Review Question - One sample Hypothesis Test

  • One-sided vs Two-sided Tests

    • Two Sample Hypothesis tests

      • t-test comparing Two Independent groups

      • Paired t-tests

  • Introduction of HW 6

Upcoming Dates

  • Mid-Semester Progress Reports.

    • Will not include attendance and in-class polling
  • HW 6 is now posted and is due 10/30 (Grace period ends 10/31).

    • This assignment seems long but it’s not.

    • It consists of just three hypothesis tests with questions about each test.

    • Most questions are multiple choice, but do not just guess and keep trying.

    • Demo videos will be posted by tomorrow or Saturday at the latest.

  • HW 7 will be posted next week (Lectures 19 - 21)

  • Test 2 is on November 12th and will include material up through Lecture 21 (HW 7)

  • Lecture 22 - Intro to Portfolio Management will be on Final Exam, not on Test 2.

R and RStudio

  • In this course we will use R and RStudio to understand statistical concepts.

  • You will access R and RStudio through Posit Cloud.

  • I will post R/RStudio files on Posit Cloud that you can access in provided links.

  • I will also provide demo videos that show how to access files and complete exercises.

  • NOTE: The free Posit Cloud account is limited to 25 hours per month.

    • I demo how to download completed work so that you can use this allotment efficiently.

    • For those who want to go further with R/RStudio:

💥 Lecture 18 In-class Exercises - Q1 💥

US Adults were asked: How do you feel about restrictions on sales of new gas-powered vehicles?

  • 1025 people were surveyed and 215 people said they were supportive of restrictions.

  • Is this true?: Significantly more than 20% of US Adults are supportive of restrictions.

  • Hypotheses Tested:

    • \(H_{0}: P \leq 0.2\)
    • \(H_{A}: P \gt 0.2\)
  • Use the prop.test command to estimate the 95% conf. interval for the proportion of US Adults who are supportive of restrictions.

  • Remember to include correct=F.

  • This is a one-sided test, so we include the option alternative="greater".
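
A minimal sketch of the call described above, using the counts from this slide (215 supportive out of 1025) and the null value 0.2. The object name `q1_test` is only illustrative and is not part of the course files.

```r
# One-sided, one-sample proportion test for Q1 (sketch)
q1_test <- prop.test(x = 215, n = 1025,        # successes and sample size from the slide
                     p = 0.2,                  # null value from H0
                     alternative = "greater",  # right-sided test
                     conf.level = 0.95,
                     correct = FALSE)          # no continuity correction

q1_test$estimate   # sample proportion, 215/1025
q1_test$p.value    # compare this to alpha
q1_test$conf.int   # for a one-sided test: a lower bound and 1
```

Because the test is one-sided, `conf.int` holds a lower confidence bound paired with 1 rather than a two-sided interval.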

Review of Types of One-Sided Hypothesis Tests

Reminder of How to Interpret P-values

  • In a standard situation when we use \(\alpha=0.05\), I think of the evidence against the null hypothesis along this spectrum:

    • 0.0 - 0.01 Very strong evidence against \(H_{0}\)

    • 0.011 - 0.03 Strong evidence against \(H_{0}\)

    • 0.031 - 0.049 Some evidence against \(H_{0}\)

    • 0.05 - 0.07 Suggestive evidence against \(H_{0}\)

    • 0.071 - 0.099 Minimal evidence against \(H_{0}\)

    • 0.1 and above No evidence against \(H_{0}\)
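
The spectrum above can be sketched as a small lookup function. `describe_evidence` is a hypothetical helper, not a built-in R function, and it assumes that values falling between the listed ranges (for example 0.0105) belong to the next, weaker category.

```r
# Hypothetical helper: map p-values to the evidence labels listed above.
describe_evidence <- function(p) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  label <- rep("No evidence", length(p))        # 0.1 and above
  label[p <  0.10] <- "Minimal evidence"        # above 0.07, below 0.1
  label[p <= 0.07] <- "Suggestive evidence"     # 0.05 to 0.07
  label[p <  0.05] <- "Some evidence"           # above 0.03, below 0.05
  label[p <= 0.03] <- "Strong evidence"         # above 0.01, up to 0.03
  label[p <= 0.01] <- "Very strong evidence"    # 0 to 0.01
  paste(label, "against H0")
}

describe_evidence(c(0.0004, 0.02, 0.04, 0.05, 0.08, 0.25))
```

Each assignment overwrites the previous one for smaller p-values, so the last matching rule determines the label.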

Hypothesis Test Framework for all Hypothesis Tests

  • Hypothesis tests can be structured differently, but the framework for testing is the same.

    1. Specify hypotheses that match data and question of interest.

    2. Specify \(\alpha\).

    3. Conduct hypothesis test using software.

      • e.g. t.test and prop.test commands in R
    4. Compare p-value to \(\alpha\) and/or examine confidence interval or confidence bounds.

      • P-value \(\lt \alpha\): Reject \(H_{0}\).

      • P-value \(\geq \alpha\): Do not reject \(H_{0}\).

    5. Interpret results in terms of data and question asked.

      • What do we conclude and what type of error might we have made?
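
The five steps above can be sketched in R as follows. This uses the built-in `mtcars` data as a stand-in (it is not the course data), and the names `auto_mpg`, `manual_mpg`, `res`, and `decision` are illustrative.

```r
# Step 1: hypotheses. H0: mean MPG is equal for automatic (am == 0)
#         and manual (am == 1) cars; HA: the means differ.
auto_mpg   <- mtcars$mpg[mtcars$am == 0]
manual_mpg <- mtcars$mpg[mtcars$am == 1]

# Step 2: specify alpha
alpha <- 0.05

# Step 3: conduct the test with software
res <- t.test(auto_mpg, manual_mpg,
              alternative = "two.sided", conf.level = 1 - alpha)

# Step 4: compare the p-value to alpha
decision <- if (res$p.value < alpha) "Reject H0" else "Do not reject H0"

# Step 5: interpret. Rejecting H0 risks a Type I error;
#         not rejecting H0 risks a Type II error.
cat(decision, "\n")
res$conf.int   # interval for the difference in means
```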

Two sample Hypothesis Test Example

Question: Is there a significant difference between the city miles per gallon (MPG) of Italian and German high end cars?

  1. Setting up hypotheses
  • Question as written assumes no prior knowledge.

  • If we don’t know which group mean might be larger, we use a two-sided test.

  • Null (default) hypothesis is that the two group means are equal

  • Alternative hypothesis is the complement, the two group means are unequal

  • Hypotheses:

    • \(H_{0}: \mu_{ITA} = \mu_{GER}\)

    • \(H_{A}: \mu_{ITA} \neq \mu_{GER}\)

  2. Specifying \(\alpha\), the cutoff value. We use \(\alpha = 0.05\) unless we have a reason not to.

Two sample Hypothesis Test Example

Question: Is there a significant difference between the city fuel efficiency of Italian and German high end cars?

  3. Conduct the t.test to answer this question

    • Note that in t.test command below alternative="two.sided" and conf.level=.95 are the defaults and not required.
    • The defaults are included here so you will know how to change them.
library(readr)                                                       # provides read_csv
euro_cars <- read_csv("data/Euro_Cars.csv", show_col_types = F)      # import data
itly_cty <- euro_cars$mpg_c[euro_cars$ctry_origin=="Italy"]          # shorten variable names
gmny_cty <- euro_cars$mpg_c[euro_cars$ctry_origin=="Germany"]
t.test(itly_cty, gmny_cty, alternative="two.sided", conf.level=.95)  # conduct test

    Welch Two Sample t-test

data:  itly_cty and gmny_cty
t = -4.1147, df = 23.461, p-value = 0.0004092
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.722328 -2.227672
sample estimates:
mean of x mean of y 
   13.400    17.875 
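
As a hedged illustration of where the numbers in this kind of output come from, the sketch below rebuilds the Welch t statistic, degrees of freedom, p-value, and confidence interval by hand and compares them with `t.test`. The vectors `x` and `y` are made-up values for illustration only; they are NOT the Euro_Cars data.

```r
# Hypothetical MPG values (made up, not the course data)
x <- c(12, 14, 13, 15, 11, 16, 13)
y <- c(18, 17, 20, 16, 19, 18, 17, 21)

vx <- var(x) / length(x)               # squared standard error of mean(x)
vy <- var(y) / length(y)               # squared standard error of mean(y)
se <- sqrt(vx + vy)                    # standard error of the difference

t_stat <- (mean(x) - mean(y)) / se     # Welch t statistic
# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df_w   <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_two  <- 2 * pt(-abs(t_stat), df_w)   # two-sided p-value
ci     <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_w) * se

res <- t.test(x, y)                    # software version for comparison
c(manual = t_stat, software = unname(res$statistic))
c(manual = p_two,  software = res$p.value)
```

The same pieces can be pulled from any `t.test` result with `$statistic`, `$parameter`, `$p.value`, and `$conf.int`.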

💥 Lecture 18 In-class Exercises - Q2-Q3 💥

  4. Compare p-value to \(\alpha\) and/or examine confidence interval or confidence bounds.
  • Use the output from the t-test to answer these questions:


Question 2: What is the p-value of this hypothesis test?

Round answer to 4 decimal places.


Question 3: Based on this p-value which we compare to \(\alpha = 0.05\) do we reject or fail to reject the null hypothesis?

💥 Lecture 18 In-class Exercises - Q4-Q5 💥

  5. Interpret results in terms of data and question asked.


Question 4: Are the cars from these two countries significantly different with respect to City MPG? If so, which country’s cars have a higher average City MPG?


Question 5: What type of error might we have made?

Relationship between 1-sided and 2-sided Tests

The P-value from our original two-sided test is 0.0004092

Let's rerun the test above as a left-sided test and examine the change in the results.

  • New Hypotheses:

    • \(H_{0}: \mu_{ITA} \geq \mu_{GER}\)

    • \(H_{A}: \mu_{ITA} < \mu_{GER}\)

t.test(itly_cty, gmny_cty, 
       alternative="less", conf.level=.95) 

    Welch Two Sample t-test

data:  itly_cty and gmny_cty
t = -4.1147, df = 23.461, p-value = 0.0002046
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -2.612611
sample estimates:
mean of x mean of y 
   13.400    17.875 
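  • One-sided p-value = \(\frac{1}{2}\) of two-sided p-value when the observed difference is in the direction of \(H_{A}\) (here, 0.0004092 / 2 = 0.0002046).

  • If we know in advance that we are only interested in one tail, the p-value is cut in half.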

  • Some disciplines ONLY use two-sided tests so that rejecting \(H_{0}\) is more difficult.

  • Other disciplines ONLY use one-sided tests, because direction of the test is obvious.
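
A short sketch of the halving relationship, using the built-in `mtcars` data as a stand-in for the course data (object names are illustrative). It also shows what happens when the one-sided alternative points the wrong way.

```r
auto_mpg   <- mtcars$mpg[mtcars$am == 0]   # lower sample mean
manual_mpg <- mtcars$mpg[mtcars$am == 1]   # higher sample mean

two_sided <- t.test(auto_mpg, manual_mpg, alternative = "two.sided")
left      <- t.test(auto_mpg, manual_mpg, alternative = "less")
right     <- t.test(auto_mpg, manual_mpg, alternative = "greater")

two_sided$p.value / 2   # half of the two-sided p-value ...
left$p.value            # ... matches the left-sided p-value
right$p.value           # wrong direction: equals 1 minus the left-sided p-value
```

The halving only applies when the sample difference falls on the side named in the alternative hypothesis; testing the opposite tail gives a p-value near 1.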

A Paired t-test Example

  • It is fairly well known that city gas mileage is ALWAYS lower than highway mileage.

  • We can verify that this is true for these European cars using a paired t-test.

  • In a paired t-test, you look at the same individuals (cars, people, sites, etc) under two sets of conditions.

  • A common use of paired t-tests is to collect data on humans before and after some activity such as a scary movie or period of exercise.

  • The key criterion for using a paired t-test is that the two sets of data are NOT from independent groups.

  • A paired t-test is a ONE SAMPLE T-TEST of the DIFFERENCES
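
To illustrate why pairing matters, here is a hedged sketch using R's built-in `sleep` data (the same 10 subjects measured under two drugs) as a stand-in example. It runs the same two vectors through a paired test and, for contrast, through a test that wrongly treats them as independent groups.

```r
# Built-in sleep data: rows are ordered by subject ID within each group
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

paired_res   <- t.test(drug2, drug1, paired = TRUE)  # uses within-subject differences
unpaired_res <- t.test(drug2, drug1)                 # Welch test, ignores the pairing

paired_res$p.value     # paired analysis
unpaired_res$p.value   # independent-groups analysis of the same numbers
```

For these data the paired p-value is the smaller of the two, because differencing removes the subject-to-subject variation that the independent-groups test has to treat as noise.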

Comparing City and Highway MPG

Do these European cars get better mileage on the highway?

  1. Specify hypotheses for a right-sided test (Hypotheses have same format as two sample t-tests):

    • \(H_{0}: \mu_{HWY} \leq \mu_{CTY}\)

    • \(H_{A}: \mu_{HWY} \gt \mu_{CTY}\)

  2. Specify \(\alpha = 0.05\)

  3. Conduct paired t-test of these data:

cty <- euro_cars$mpg_c    # save variables to short names
hwy <- euro_cars$mpg_h
t.test(hwy, cty, alternative="greater", conf.level=.95, paired=T) 

    Paired t-test

data:  hwy and cty
t = 16.913, df = 30, p-value < 0.00000000000000022
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
 6.181461      Inf
sample estimates:
mean difference 
       6.870968 

A Closer Look at the Paired t-test

  • A paired t-test is ONLY appropriate when the same individuals are measured twice under different conditions.

  • In order to do a paired t-test, the software creates a new variable,

    • the difference between mpg_h and mpg_c for each car

    • Software (R) does a one sample t-test on those differences.

  • Below are the paired t-test and the one sample t-test on the differences

t.test(hwy, cty, alternative="greater", 
       conf.level=.95, paired=T) 

    Paired t-test

data:  hwy and cty
t = 16.913, df = 30, p-value < 0.00000000000000022
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
 6.181461      Inf
sample estimates:
mean difference 
       6.870968 
mpg_diff <- euro_cars$mpg_h-euro_cars$mpg_c
t.test(mpg_diff, alternative="greater")

    One Sample t-test

data:  mpg_diff
t = 16.913, df = 30, p-value < 0.00000000000000022
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 6.181461      Inf
sample estimates:
mean of x 
 6.870968 
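
The quantities in both outputs can also be reproduced by hand from the differences: \(t = \bar{d} / (s_d / \sqrt{n})\) with \(n - 1\) degrees of freedom. The sketch below assumes small made-up vectors (`cty_demo`, `hwy_demo`), which are NOT the Euro_Cars data.

```r
# Hypothetical city/highway MPG for five cars (made up for illustration)
cty_demo <- c(14, 16, 12, 18, 15)
hwy_demo <- c(21, 22, 20, 24, 23)

d  <- hwy_demo - cty_demo                 # per-car differences
n  <- length(d)
se <- sd(d) / sqrt(n)                     # standard error of the mean difference

t_stat  <- mean(d) / se                                # one-sample t on the differences
p_right <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # right-sided p-value
lower   <- mean(d) - qt(0.95, df = n - 1) * se         # 95% lower confidence bound

res <- t.test(hwy_demo, cty_demo, alternative = "greater", paired = TRUE)
c(manual = t_stat, software = unname(res$statistic))
c(manual = lower,  software = res$conf.int[1])
```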

💥 Lecture 18 In-class Exercises - Q6 💥

Note that the paired t-test output DOES NOT give the mean for each group.

Instead, the paired t-test output shows the mean difference.


Question 6: What is the average difference in mileage between city driving and highway driving for all of these vehicles?

Round to two decimal places.

Introduction to HW 6

  • For the remainder of class we will look at HW 6.

  • I will also make some short videos, but using R will not be the primary challenge for this assignment.

  • This assignment includes many multiple choice questions that require an understanding of

    • what each hypothesis test is testing.
    • how to correctly interpret the results.