For this quiz, you are going to use the mpg (miles per gallon) dataset. This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models that had a new release every year between 1999 and 2008; this was used as a proxy for the popularity of the car.

The code below loads the dataset and gives an overview of its variables:

# Load the package
library(tidyverse)

# Import data
data(mpg, package="ggplot2")

# Print the first 6 rows
head(mpg)
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa~
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa~
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa~
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa~
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa~
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa~
# Get a sense of the dataset
glimpse(mpg)
## Rows: 234
## Columns: 11
## $ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi"...
## $ model        <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro"...
## $ displ        <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0,...
## $ year         <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, ...
## $ cyl          <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, ...
## $ trans        <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "a...
## $ drv          <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4",...
## $ cty          <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17...
## $ hwy          <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25...
## $ fl           <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p",...
## $ class        <chr> "compact", "compact", "compact", "compact", "compact",...
summary(mpg)
##  manufacturer          model               displ            year     
##  Length:234         Length:234         Min.   :1.600   Min.   :1999  
##  Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999  
##  Mode  :character   Mode  :character   Median :3.300   Median :2004  
##                                        Mean   :3.472   Mean   :2004  
##                                        3rd Qu.:4.600   3rd Qu.:2008  
##                                        Max.   :7.000   Max.   :2008  
##       cyl           trans               drv                 cty       
##  Min.   :4.000   Length:234         Length:234         Min.   : 9.00  
##  1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00  
##  Median :6.000   Mode  :character   Mode  :character   Median :17.00  
##  Mean   :5.889                                         Mean   :16.86  
##  3rd Qu.:8.000                                         3rd Qu.:19.00  
##  Max.   :8.000                                         Max.   :35.00  
##       hwy             fl               class          
##  Min.   :12.00   Length:234         Length:234        
##  1st Qu.:18.00   Class :character   Class :character  
##  Median :24.00   Mode  :character   Mode  :character  
##  Mean   :23.44                                        
##  3rd Qu.:27.00                                        
##  Max.   :44.00

Q1-Q6 You believe that the typical car today gets 25 miles per gallon (mpg) on the highway. You have a sample of 234 cars. Test this hypothesis by answering Q1-Q6.

Q1 Calculate the standard error of the mean highway mpg.

SEM <- sd(mpg$hwy) / sqrt(nrow(mpg))  # standard error = sd / sqrt(n), with n = 234 cars
SEM
## [1] 0.3892672
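The standard error above follows the usual formula, where s is the sample standard deviation of `hwy` (about 5.954643, which is what `sd(mpg$hwy)` returns) and n = 234. A 95% interval for the mean then adds and subtracts a critical value times this standard error; the sketch below uses the normal approximation, where the critical value is about 1.96.

```latex
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{5.954643}{\sqrt{234}} \approx 0.3892672,
\qquad
\bar{x} \pm z_{0.975}\,\mathrm{SE}(\bar{x}), \quad z_{0.975} \approx 1.96
```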

Q2 Calculate the sample mean.

sample_mean <- mean(mpg$hwy)
sample_mean
## [1] 23.44017

Q3 Calculate the upper bound of the 95% confidence interval.

# Margin of error = 1.96 * SEM (normal approximation; qt(0.975, df = 233) is nearly identical)
upperCI <- sample_mean + 1.96 * SEM
upperCI
## [1] 24.20313

Q4 Calculate the lower bound of the 95% confidence interval.

lowerCI <- sample_mean - 1.96 * SEM
lowerCI
## [1] 22.67721

Q5 Calculate the 95% confidence interval.

c(lowerCI, upperCI)
## [1] 22.67721 24.20313

Q6 What is your conclusion regarding the hypothesis?

We are 95% confident that the mean highway fuel economy lies between 22.67721 and 24.20313 mpg. Because 25 mpg falls outside this interval, we reject the hypothesis that the typical car gets 25 mpg on the highway (at the 5% significance level); the data suggest the mean is lower, at about 23.4 mpg.
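As a cross-check on Q3-Q6, a 95% interval for a mean can also be built with the t critical value `qt(0.975, df = n - 1)` instead of a fixed multiplier; with 234 cars the two give almost the same result. The sketch below wraps this in a helper; `ci_mean` is a hypothetical name, not part of any package.

```r
# Hypothetical helper: two-sided confidence interval for a mean using the t distribution
ci_mean <- function(x, level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                            # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)    # e.g. qt(0.975, n - 1) for 95%
  c(lower = mean(x) - t_crit * se,
    upper = mean(x) + t_crit * se)
}
```

For example, `ci_mean(mpg$hwy)` should match the interval printed by base R's `t.test(mpg$hwy, mu = 25)`, which additionally reports the t statistic (roughly -4.0 here) and the p-value for the hypothesis that the mean is 25 mpg.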

Q7 Now that you understand highway mpg, you want to test another hypothesis. You believe that the typical car today gets at least 16 miles per gallon in the city. To test this hypothesis, repeat Q1-Q6. Discuss your conclusion using the 95% confidence interval.

Hint: Insert the code below.

SEM <- sd(mpg$cty) / sqrt(nrow(mpg))  # standard error of the mean city mpg
SEM
## [1] 0.2782199
sample_mean <- mean(mpg$cty)
sample_mean
## [1] 16.85897
upperCI <- sample_mean + 1.96 * SEM
upperCI
## [1] 17.40429
lowerCI <- sample_mean - 1.96 * SEM
lowerCI
## [1] 16.31366
c(lowerCI, upperCI)
## [1] 16.31366 17.40429

We are 95% confident that the mean city fuel economy lies between 16.31366 and 17.40429 mpg. Because the entire interval lies above 16 mpg, the data support the hypothesis that the typical car gets at least 16 mpg in the city.
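Because the Q7 hypothesis is one-sided ("at least 16 mpg"), a one-sided lower confidence bound matches it more directly than a two-sided interval: if the 95% lower bound exceeds 16, the data give evidence that the mean is above 16. A minimal sketch, assuming the t distribution with n - 1 degrees of freedom; `lower_bound_mean` and `supports_at_least` are hypothetical names.

```r
# Hypothetical helper: one-sided lower confidence bound for a mean
lower_bound_mean <- function(x, level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                  # standard error of the mean
  mean(x) - qt(level, df = n - 1) * se   # one-sided t critical value
}

# TRUE when the lower bound lies above mu0, i.e. evidence that the mean exceeds mu0
supports_at_least <- function(x, mu0, level = 0.95) {
  lower_bound_mean(x, level) > mu0
}
```

For the quiz data, `lower_bound_mean(mpg$cty)` gives the bound for city mpg, and base R's `t.test(mpg$cty, mu = 16, alternative = "greater")` runs the equivalent one-sided test.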

Q8 Hide the messages and warnings, but display the code and its results on the webpage.

Hint: Use the message, warning, echo, and results chunk options. Refer to the R Markdown Reference Guide.
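One way to answer Q8 is to set the chunk options once in a setup chunk near the top of the .Rmd file, as sketched below; the same options can instead go in an individual chunk header, e.g. `{r, message=FALSE, warning=FALSE, echo=TRUE, results='markup'}`. `results = "markup"` is knitr's default and shows printed output, whereas `results = "hide"` would suppress it.

```r
# Global chunk options for the document (assumes knitr is installed, as it is for R Markdown)
knitr::opts_chunk$set(
  echo = TRUE,          # display the code
  results = "markup",   # display printed results (knitr's default)
  message = FALSE,      # hide messages, such as the tidyverse startup banner
  warning = FALSE       # hide warnings
)
```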

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.