Purpose

I am going to perform a simple hypothesis test to see whether city miles per gallon differs between regular and premium fuel types.

Data Preparation

First, we load dplyr, which we will use to manipulate the data.

library(dplyr)


I am using the mpg data set that ships with the ggplot2 package. Using the head function, we can see the first 6 rows.

mpg <- ggplot2::mpg
head(mpg)
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans  drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr>  <chr> <int> <int> <chr> <chr>
## 1 audi         a4      1.8  1999     4 auto(~ f        18    29 p     comp~
## 2 audi         a4      1.8  1999     4 manua~ f        21    29 p     comp~
## 3 audi         a4      2    2008     4 manua~ f        20    31 p     comp~
## 4 audi         a4      2    2008     4 auto(~ f        21    30 p     comp~
## 5 audi         a4      2.8  1999     6 auto(~ f        16    26 p     comp~
## 6 audi         a4      2.8  1999     6 manua~ f        18    26 p     comp~


I am selecting only the necessary columns (fl and cty) and filtering the rows so that only regular ("r") and premium ("p") fuel types remain.

test <- mpg %>%
  select(fl, cty) %>%
  filter(fl %in% c("r", "p"))


Now, we can check the outcome of the data preparation below. The table function shows how many records there are for each fuel type.

head(test)
## # A tibble: 6 x 2
##   fl      cty
##   <chr> <int>
## 1 p        18
## 2 p        21
## 3 p        20
## 4 p        21
## 5 p        16
## 6 p        18
table(test$fl)
## 
##   p   r 
##  52 168
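
Before running the test, it can help to look at the size, mean, and standard deviation of each group. The following is a minimal sketch that rebuilds the test data frame so that it runs on its own; the names group_summary, mean_cty, and sd_cty are my own choices for illustration.

```r
library(dplyr)

# Rebuild the prepared data: fuel type and city mpg, regular and premium only
test <- ggplot2::mpg %>%
  select(fl, cty) %>%
  filter(fl %in% c("r", "p"))

# One row per fuel type with group size, mean, and standard deviation of cty
group_summary <- test %>%
  group_by(fl) %>%
  summarise(
    n        = n(),
    mean_cty = mean(cty),
    sd_cty   = sd(cty)
  )

group_summary
```

If the two standard deviations are very different, the equal-variance assumption of the pooled t-test deserves a closer look.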

t-Test

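Now, here is the t-test. It compares the mean of cty between the two fuel types and, because var.equal = TRUE, assumes that both groups have the same variance.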

t.test(cty ~ fl, data = test, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  cty by fl
## t = 1.0662, df = 218, p-value = 0.2875
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5322946  1.7868733
## sample estimates:
## mean in group p mean in group r 
##        17.36538        16.73810
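
To make the output above less of a black box, the same numbers can be reproduced by hand with the pooled-variance formula. This is a minimal sketch, assuming the same test data frame as above (rebuilt here so the block runs on its own); the variable names such as pooled_var and t_stat are my own.

```r
library(dplyr)

# Rebuild the prepared data so the block is self-contained
test <- ggplot2::mpg %>%
  select(fl, cty) %>%
  filter(fl %in% c("r", "p"))

# City mpg for each fuel type
p <- test$cty[test$fl == "p"]
r <- test$cty[test$fl == "r"]
n_p <- length(p)
n_r <- length(r)

# Pooled variance: the two sample variances weighted by their degrees of freedom
df <- n_p + n_r - 2
pooled_var <- ((n_p - 1) * var(p) + (n_r - 1) * var(r)) / df

# Standard error of the difference in means under the equal-variance assumption
se <- sqrt(pooled_var * (1 / n_p + 1 / n_r))

# t statistic (premium minus regular) and two-sided p-value
mean_diff <- mean(p) - mean(r)
t_stat <- mean_diff / se
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# 95 percent confidence interval for the difference in means
ci <- mean_diff + c(-1, 1) * qt(0.975, df = df) * se

c(t = t_stat, df = df, p_value = p_value)
ci
```

These values should agree with the t, df, p-value, and confidence interval printed by t.test above.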

From the test, we cannot conclude that there is a difference in city miles per gallon between regular and premium fuel types.

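The p-value (0.2875) is higher than 0.05, so we fail to reject the null hypothesis of no difference at the 5% level. The p-value is not the probability that the result is due to chance. It means that, if there were truly no difference between the two fuel types, we would see a difference in sample means at least as large as this one about 28.75% of the time.

The 95% confidence interval (-0.53 to 1.79) also contains 0, so the observed difference of about 0.63 city miles per gallon is consistent with chance variation alone. This does not show that the two fuel types are the same, only that this data does not provide evidence of a difference.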
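
The test above sets var.equal = TRUE, which assumes that both fuel types have the same variance in cty. As a quick sensitivity check, one could also run Welch's t-test, which is what t.test does by default and which drops that assumption. This is only a sketch of an alternative, not part of the analysis above; the object name welch is my own.

```r
library(dplyr)

# Rebuild the prepared data so the block is self-contained
test <- ggplot2::mpg %>%
  select(fl, cty) %>%
  filter(fl %in% c("r", "p"))

# Welch's t-test: var.equal defaults to FALSE, so the variances are not pooled
welch <- t.test(cty ~ fl, data = test)

welch
```

If the Welch result leads to the same conclusion as the pooled test, the conclusion does not hinge on the equal-variance assumption.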