For this quiz, you are going to use the mpg (miles per gallon) dataset. This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models that had a new release every year between 1999 and 2008; this was used as a proxy for the popularity of the car.
The dataset has the following variables:
- manufacturer: manufacturer name
- model: model name
- displ: engine displacement, in litres
- year: year of manufacture
- cyl: number of cylinders
- trans: type of transmission
- drv: the type of drive train, where f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
- cty: city miles per gallon
- hwy: highway miles per gallon
- fl: fuel type
- class: “type” of car

# Load the package
library(tidyverse)
# Import data
data(mpg, package="ggplot2")
# Print the first 6 rows
head(mpg)
## # A tibble: 6 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa~
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa~
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p compa~
## 4 audi a4 2 2008 4 auto(av) f 21 30 p compa~
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa~
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa~
# Get a sense of the dataset
glimpse(mpg)
## Rows: 234
## Columns: 11
## $ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi"...
## $ model <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro"...
## $ displ <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0,...
## $ year <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, ...
## $ cyl <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, ...
## $ trans <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "a...
## $ drv <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4",...
## $ cty <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17...
## $ hwy <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25...
## $ fl <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p",...
## $ class <chr> "compact", "compact", "compact", "compact", "compact",...
summary(mpg)
## manufacturer model displ year
## Length:234 Length:234 Min. :1.600 Min. :1999
## Class :character Class :character 1st Qu.:2.400 1st Qu.:1999
## Mode :character Mode :character Median :3.300 Median :2004
## Mean :3.472 Mean :2004
## 3rd Qu.:4.600 3rd Qu.:2008
## Max. :7.000 Max. :2008
## cyl trans drv cty
## Min. :4.000 Length:234 Length:234 Min. : 9.00
## 1st Qu.:4.000 Class :character Class :character 1st Qu.:14.00
## Median :6.000 Mode :character Mode :character Median :17.00
## Mean :5.889 Mean :16.86
## 3rd Qu.:8.000 3rd Qu.:19.00
## Max. :8.000 Max. :35.00
## hwy fl class
## Min. :12.00 Length:234 Length:234
## 1st Qu.:18.00 Class :character Class :character
## Median :24.00 Mode :character Mode :character
## Mean :23.44
## 3rd Qu.:27.00
## Max. :44.00
# Standard error of the mean: sample SD divided by the square root of n (n = 234 rows)
SEM <- sd(mpg$hwy) / sqrt(length(mpg$hwy))
SEM
## [1] 0.3892672
The standard error of the mean highway fuel economy is 0.389 mpg.
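The standard error computed above is the sample standard deviation divided by the square root of the sample size. Written out for the highway column (a worked restatement of the R line, with $s$ standing for `sd(mpg$hwy)`):

$$
\mathrm{SEM} = \frac{s}{\sqrt{n}} = \frac{s}{\sqrt{234}} \approx 0.389 \text{ mpg}
$$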
sample_mean <- mean(mpg$hwy)
sample_mean
## [1] 23.44017
The mean highway fuel economy is 23.44 mpg.
# 1.96 is the z critical value for a two-sided 95% confidence interval
upperCI <- sample_mean + 1.96 * SEM
upperCI
## [1] 24.20313
The upper bound of the 95% confidence interval is 24.20 mpg.
lowerCI <- sample_mean - 1.96 * SEM
lowerCI
## [1] 22.67721
The lower bound of the 95% confidence interval is 22.68 mpg.
c(lowerCI, upperCI)
## [1] 22.67721 24.20313
The 95% confidence interval for the mean highway fuel economy is 22.68 to 24.20 mpg.
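The interval is the sample mean plus or minus 1.96 standard errors, where 1.96 is approximately the 0.975 quantile of the standard normal distribution (`qnorm(0.975)` in R). Plugging in the rounded values from above:

$$
\bar{x} \pm 1.96 \times \mathrm{SEM} = 23.44 \pm 1.96 \times 0.389 = 23.44 \pm 0.763 \;\Rightarrow\; [22.68,\ 24.20] \text{ mpg}
$$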
The data support the hypothesis. We are 95% confident that the mean highway fuel economy lies between 22.68 and 24.20 miles per gallon. Since the hypothesis states that the typical car drives at least 21 miles per gallon on the highway, and the entire interval lies above 21 mpg, the hypothesis is consistent with the data.
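The same conclusion can be checked with a formal test. The sketch below swaps in a one-sided one-sample t-test from base R's `t.test()` (not part of the original quiz code), testing whether the mean highway fuel economy exceeds 21 mpg. It also computes the t-based 95% interval, which should come out slightly wider than the z-based one above because it uses the t critical value with 233 degrees of freedom instead of 1.96.

```r
# Assumes ggplot2 is installed; it ships the mpg dataset.
data(mpg, package = "ggplot2")

# One-sided test of H0: mu <= 21 against H1: mu > 21,
# where mu is the mean highway fuel economy in mpg.
hwy_test <- t.test(mpg$hwy, mu = 21, alternative = "greater")
print(hwy_test$statistic)  # t statistic
print(hwy_test$p.value)    # a small p-value is evidence that mu > 21

# Two-sided 95% CI based on the t distribution rather than z = 1.96
hwy_ci <- t.test(mpg$hwy)$conf.int
print(hwy_ci)
```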
Hint: Insert the code below.
# Standard error of the mean city fuel economy (n = 234 rows)
SEM <- sd(mpg$cty) / sqrt(length(mpg$cty))
SEM
## [1] 0.2782199
sample_mean <- mean(mpg$cty)
sample_mean
## [1] 16.85897
upperCI <- sample_mean + 1.96 * SEM
upperCI
## [1] 17.40429
lowerCI <- sample_mean - 1.96 * SEM
lowerCI
## [1] 16.31366
c(lowerCI, upperCI)
## [1] 16.31366 17.40429
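The highway and city calculations repeat the same steps. A minimal sketch of a reusable helper, assuming the same normal approximation with z = 1.96; `mean_ci` is a hypothetical name, not a function from the tidyverse:

```r
# Assumes ggplot2 is installed; it ships the mpg dataset.
data(mpg, package = "ggplot2")

# Hypothetical helper: normal-approximation confidence interval for a mean.
# x: numeric vector; z: critical value (1.96 for a two-sided 95% CI).
mean_ci <- function(x, z = 1.96) {
  x <- x[!is.na(x)]               # drop missing values before counting n
  sem <- sd(x) / sqrt(length(x))  # standard error of the mean
  m <- mean(x)
  c(lower = m - z * sem, mean = m, upper = m + z * sem)
}

print(mean_ci(mpg$hwy))
print(mean_ci(mpg$cty))
```

It should reproduce the bounds printed above for both columns.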
This hypothesis is not supported. We are 95% confident that the mean city fuel economy lies between 16.31 and 17.40 miles per gallon. Since this interval includes values below 16.5 miles per gallon, the data do not show that the typical car reaches 16.5 miles per gallon in the city.
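A one-sided t-test makes the city conclusion more precise. This sketch assumes the hypothesis is that the typical car gets at least 16.5 mpg in the city, and tests H0: mean <= 16.5 against H1: mean > 16.5 with base R's `t.test()`. From the values above, the t statistic is about (16.86 - 16.5) / 0.278, roughly 1.29, so the one-sided p-value should come out above 0.05: no significant evidence that the mean exceeds 16.5 mpg.

```r
# Assumes ggplot2 is installed; it ships the mpg dataset.
data(mpg, package = "ggplot2")

# Assumption: the hypothesis is "mean city mpg is at least 16.5".
cty_test <- t.test(mpg$cty, mu = 16.5, alternative = "greater")
print(cty_test$statistic)  # t statistic
print(cty_test$p.value)    # above 0.05 -> fail to reject H0 at the 5% level
```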
Hint: Use the message, echo, and results chunk options. Refer to the R Markdown Reference Guide.
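For reference, the options named in the hint can be set for the whole document from a setup chunk with `knitr::opts_chunk$set()`, and individual chunks can override them in their own headers (for example `{r, echo=FALSE}`). The values below are illustrative assumptions, not necessarily the ones the quiz expects:

```r
# Assumes the knitr package is installed (it powers R Markdown rendering).
knitr::opts_chunk$set(
  echo = TRUE,        # show the R code in the rendered output
  message = FALSE,    # hide messages, such as those printed by library(tidyverse)
  results = "markup"  # show printed results normally; "hide" suppresses them
)
```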