The following problems are copied from the Chapter 20 exercises of Introduction to Modern Statistics, First Edition, by Mine Çetinkaya-Rundel and Johanna Hardin (https://openintro-ims.netlify.app/inference-two-means.html).

The following is a modified version of Question 18 from the book.

Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in a given year. The data for this problem were gathered by the EPA in 2021 and are stored in the epa2021 data set, which is part of the openintro package.

  1. (2 pts.) Look at the help page for the epa2021 data set.
  1. What is the name of the variable for highway mileage?

ANSWER: hwy_mpg

  1. What is the name of the variable for transmission type?

ANSWER: transmission

The epa2021 data set contains more information than we need for this homework set. I have created a new data set called epa2021_sample that contains a random sample of cars from the original epa2021 data set.

IMPORTANT: Use the epa2021_sample data set to answer the questions below.

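# Packages this code relies on (assumed setup).
library(tidyverse)   # dplyr verbs, ggplot2, and the %>% pipe
library(openintro)   # epa2021 data set
library(mosaic)      # favstats()

# Fix the seed so the random sample is reproducible.
set.seed(112822)

# Randomly sample 40 cars with an automatic transmission.
epa2021_sample <- epa2021 %>% 
  filter(transmission %in% c("A")) %>% 
  sample_n(size=40)

# Randomly sample 30 cars with a manual transmission, append the automatic
# sample, and relabel the transmission codes "A" and "M".
epa2021_sample <- epa2021 %>% 
  filter(transmission %in% c("M")) %>% 
  sample_n(size=30) %>% 
  add_row(epa2021_sample) %>% 
  mutate(transmission = factor(transmission, labels=c("Automatic", "Manual")))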
  1. (2 pts.) Calculate summary statistics for highway mileage separated by transmission type.
favstats(epa2021_sample$hwy_mpg~epa2021_sample$transmission)
##   epa2021_sample$transmission min Q1 median    Q3 max     mean       sd  n
## 1                   Automatic  16 20   22.5 27.00  31 23.45000 4.012481 40
## 2                      Manual  21 26   27.5 33.75  40 29.46667 5.380253 30
##   missing
## 1       0
## 2       0
  1. (2 pts.) Construct the following plots to display the distribution of highway mileage separated by transmission type.
  1. Side-by-side histograms. Use a binwidth of 3.
epa2021_sample %>%  
  ggplot(aes(x = hwy_mpg, y = after_stat(density)) ) + 
  geom_histogram(col = "lightgray", fill = "navy", binwidth = 3) + 
  geom_density() +
  facet_grid(. ~ transmission) + 
  theme_bw()

  1. Side-by-side boxplots.
epa2021_sample %>%  
  ggplot( aes(y = transmission, x = hwy_mpg )) + 
  geom_boxplot(fill = "navy") + 
  labs(x="Highway Mileage", title="Highway Mileage based on Transmission Type") + 
  theme_bw()

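Suppose you wanted to test whether the average highway mileage for manual transmission vehicles is higher than the average highway mileage for automatic transmission vehicles. Let \(\mu_A\) and \(\mu_M\) denote the true average highway mileage for automatic and manual transmission vehicles, respectively.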

\(H_0: \mu_A = \mu_M\)

\(H_a: \mu_A < \mu_M\)

  1. (1 pt.) Would it be appropriate to use a randomization distribution to conduct the hypothesis test stated above? Explain your answer.

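ANSWER: Yes. A randomization test does not depend on the shape of the mileage distributions or on large sample sizes. It only requires that the observations be independent, which is reasonable here because the cars were randomly sampled.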
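
As an illustration of how such a randomization test could be carried out, here is a minimal sketch in base R. The function name `randomization_p_value` and the small `mpg`/`group` vectors are hypothetical and made up for the example; they are not the epa2021_sample values. The idea is to shuffle the transmission labels many times and see how often the shuffled difference in means (Manual minus Automatic) is at least as large as the observed one.

```r
# Randomization test for H_a: mu_A < mu_M (manual mean is higher).
# Hypothetical helper; `values` is numeric, `groups` holds the labels
# "Automatic" and "Manual".
randomization_p_value <- function(values, groups, reps = 5000) {
  groups <- as.character(groups)
  # Observed difference in sample means: Manual minus Automatic.
  observed <- mean(values[groups == "Manual"]) -
    mean(values[groups == "Automatic"])
  # Under H_0 the labels are exchangeable, so shuffle them and recompute.
  shuffled_diffs <- replicate(reps, {
    shuffled <- sample(groups)
    mean(values[shuffled == "Manual"]) - mean(values[shuffled == "Automatic"])
  })
  # One-sided p-value: share of shuffles at least as extreme as observed.
  mean(shuffled_diffs >= observed)
}

# Made-up toy data, used only to show the call (NOT the homework sample).
set.seed(1)
mpg   <- c(20, 22, 23, 25, 27, 26, 28, 30, 33, 35)
group <- rep(c("Automatic", "Manual"), each = 5)
p_value <- randomization_p_value(mpg, group, reps = 2000)
```

For the homework data, the same function could be called as `randomization_p_value(epa2021_sample$hwy_mpg, epa2021_sample$transmission)`.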

  1. (1 pt.) Would it be appropriate to use the formula method (t distribution) to conduct the hypothesis test stated above? Explain your answer.

ANSWER: Yes, because both sample sizes are at least 30 (40 automatic and 30 manual).
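
For reference, the formula method can be sketched directly from the summary statistics reported in the favstats() output. This is only an illustration under stated assumptions: the variable names are made up for the example, and the degrees of freedom use the conservative choice min(n1 - 1, n2 - 1) rather than the Welch approximation that software typically applies.

```r
# Summary statistics copied from the favstats() output.
mean_auto <- 23.45;    sd_auto <- 4.012481; n_auto <- 40
mean_man  <- 29.46667; sd_man  <- 5.380253; n_man  <- 30

# Standard error of the difference in sample means (unpooled variances).
se_diff <- sqrt(sd_auto^2 / n_auto + sd_man^2 / n_man)

# Test statistic for H_0: mu_A = mu_M against H_a: mu_A < mu_M.
t_stat <- (mean_auto - mean_man) / se_diff

# Assumption: conservative degrees of freedom, min(n1 - 1, n2 - 1).
df_conservative <- min(n_auto - 1, n_man - 1)

# Lower-tail p-value, because H_a says the automatic mean is smaller.
p_value <- pt(t_stat, df = df_conservative)
```

With the raw data available, `t.test(hwy_mpg ~ transmission, data = epa2021_sample, alternative = "less")` runs the Welch version of the same test. `alternative = "less"` matches \(H_a\) because Automatic is the first factor level.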

  1. (2 pts.) Suppose you conducted the hypothesis test and you got a p-value of 0.0032. State the conclusion of your test in context.

ANSWER: We reject the null hypothesis because the p-value of 0.0032 is less than 0.05. The data provide convincing evidence that the true average highway mileage for cars with a manual transmission is higher than that for cars with an automatic transmission.

Date and time completed: Fri Dec 2 09:34:57 2022