For this quiz, you are going to use the mpg (miles per gallon) dataset. This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008; this was used as a proxy for the popularity of the car.
The dataset has the following variables:
manufacturer: manufacturer name
model: model name
displ: engine displacement, in litres
year: year of manufacture
cyl: number of cylinders
trans: type of transmission
drv: the type of drive train, where f = front-wheel drive, r = rear wheel drive, 4 = 4wd
cty: city miles per gallon
hwy: highway miles per gallon
fl: fuel type
class: “type” of car
# Load the package
library(tidyverse)
## -- Attaching packages -------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ----------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# Import data
data(mpg, package="ggplot2")
# Print the first 6 rows
head(mpg)
## # A tibble: 6 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa~
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa~
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p compa~
## 4 audi a4 2 2008 4 auto(av) f 21 30 p compa~
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa~
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa~
# Get a sense of the dataset
glimpse(mpg)
## Rows: 234
## Columns: 11
## $ manufacturer <chr> "audi", "audi", "audi", "audi", "audi", "audi", "audi"...
## $ model <chr> "a4", "a4", "a4", "a4", "a4", "a4", "a4", "a4 quattro"...
## $ displ <dbl> 1.8, 1.8, 2.0, 2.0, 2.8, 2.8, 3.1, 1.8, 1.8, 2.0, 2.0,...
## $ year <int> 1999, 1999, 2008, 2008, 1999, 1999, 2008, 1999, 1999, ...
## $ cyl <int> 4, 4, 4, 4, 6, 6, 6, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 8, ...
## $ trans <chr> "auto(l5)", "manual(m5)", "manual(m6)", "auto(av)", "a...
## $ drv <chr> "f", "f", "f", "f", "f", "f", "f", "4", "4", "4", "4",...
## $ cty <int> 18, 21, 20, 21, 16, 18, 18, 18, 16, 20, 19, 15, 17, 17...
## $ hwy <int> 29, 29, 31, 30, 26, 26, 27, 26, 25, 28, 27, 25, 25, 25...
## $ fl <chr> "p", "p", "p", "p", "p", "p", "p", "p", "p", "p", "p",...
## $ class <chr> "compact", "compact", "compact", "compact", "compact",...
summary(mpg)
## manufacturer model displ year
## Length:234 Length:234 Min. :1.600 Min. :1999
## Class :character Class :character 1st Qu.:2.400 1st Qu.:1999
## Mode :character Mode :character Median :3.300 Median :2004
## Mean :3.472 Mean :2004
## 3rd Qu.:4.600 3rd Qu.:2008
## Max. :7.000 Max. :2008
## cyl trans drv cty
## Min. :4.000 Length:234 Length:234 Min. : 9.00
## 1st Qu.:4.000 Class :character Class :character 1st Qu.:14.00
## Median :6.000 Mode :character Mode :character Median :17.00
## Mean :5.889 Mean :16.86
## 3rd Qu.:8.000 3rd Qu.:19.00
## Max. :8.000 Max. :35.00
## hwy fl class
## Min. :12.00 Length:234 Length:234
## 1st Qu.:18.00 Class :character Class :character
## Median :24.00 Mode :character Mode :character
## Mean :23.44
## 3rd Qu.:27.00
## Max. :44.00
# Standard error of the mean for highway MPG (nrow(mpg) is 234)
SEM <- sd(mpg$hwy) / sqrt(nrow(mpg))
SEM
## [1] 0.3892672
sample_mean <- mean(mpg$hwy)
sample_mean
## [1] 23.44017
upperCI <- sample_mean + 1.96 * SEM
upperCI
## [1] 24.20313
lowerCI <- sample_mean - 1.96 * SEM
lowerCI
## [1] 22.67721
c(lowerCI, upperCI)
## [1] 22.67721 24.20313
Based on the results, the hypothesis is supported. It stated that the typical MPG for cars today was at least 21 MPG, and the 95% confidence interval for the mean highway MPG runs from 22.68 MPG to 24.20 MPG, so even the lower bound is above 21 MPG.
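The interval above is built in separate steps (standard error, mean, then mean plus or minus 1.96 standard errors). As a minimal sketch, the same arithmetic can be wrapped in one function. The name `mean_ci` and its `z` argument are hypothetical and not part of the quiz; the demo vector is just the six `hwy` values printed by `head(mpg)`.

```r
# Hypothetical helper (not part of the original quiz): normal-approximation
# confidence interval for a mean, using the same formula as the steps above.
mean_ci <- function(x, z = 1.96) {
  x <- x[!is.na(x)]                 # drop missing values before counting
  sem <- sd(x) / sqrt(length(x))    # standard error of the mean
  c(lower = mean(x) - z * sem, upper = mean(x) + z * sem)
}

# The six highway values shown by head(mpg), used only as a small demo
hwy_head <- c(29, 29, 31, 30, 26, 26)
mean_ci(hwy_head)
```

With the data loaded, `mean_ci(mpg$hwy)` should reproduce the interval computed step by step, since it applies the same formula to all 234 rows.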
Hint: Insert the code below.
# Standard error of the mean for city MPG (nrow(mpg) is 234)
SEM <- sd(mpg$cty) / sqrt(nrow(mpg))
SEM
## [1] 0.2782199
sample_mean <- mean(mpg$cty)
sample_mean
## [1] 16.85897
upperCI <- sample_mean + 1.96 * SEM
lowerCI <- sample_mean - 1.96 * SEM
c(lowerCI, upperCI)
## [1] 16.31366 17.40429
The hypothesis is close but not fully supported: it states an average of at least 16.5 MPG, but the lower bound of the 95% confidence interval is 16.31 MPG, about 0.2 MPG below that.
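Both answers use the same decision rule: an "at least" claim is treated as supported only when the whole interval sits at or above the claimed minimum. A short sketch of that rule follows. The helper name `supports_at_least` is hypothetical, and the interval bounds are copied from the printed output rather than recomputed.

```r
# Interval bounds copied from the printed results in this document
hwy_ci <- c(lower = 22.67721, upper = 24.20313)
cty_ci <- c(lower = 16.31366, upper = 17.40429)

# Hypothetical helper: is the lower bound at or above the claimed minimum?
supports_at_least <- function(ci, claimed_min) {
  ci[["lower"]] >= claimed_min
}

supports_at_least(hwy_ci, 21)    # TRUE: 22.68 is above 21
supports_at_least(cty_ci, 16.5)  # FALSE: 16.31 is below 16.5

# Shortfall for the city claim, in MPG
16.5 - cty_ci[["lower"]]
```

This is a deliberately strict reading of the hypothesis. The city sample mean (16.86 MPG) is itself above 16.5 MPG, and only the lower end of the interval falls short.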
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
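One plausible reading of this hint, sketched under the assumption that the options are set once for the whole document in a setup chunk: `knitr::opts_chunk$set()` applies chunk options globally. The particular values below are illustrative choices, not something the quiz prescribes.

```r
# Assumed to sit in a setup chunk near the top of the .Rmd file.
# echo    = TRUE      keeps the R code visible in the knitted output
# message = FALSE     hides messages such as the tidyverse attach banner
# results = "markup"  prints results as regular output ("hide" would suppress them)
knitr::opts_chunk$set(echo = TRUE, message = FALSE, results = "markup")
```

The same options can also be set on a single chunk in its header, for example `{r, message=FALSE}`, when only one chunk (such as the one calling `library(tidyverse)`) needs different behaviour.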