I am going to perform a simple hypothesis test to see if there is a difference in mean city miles per gallon between cars that use regular fuel and cars that use premium fuel.
First, we load dplyr to manipulate the data.
library(dplyr)
I am using the mpg dataset that ships with the ggplot2 package. Using the head function, we can see the first 6 rows.
mpg <- ggplot2::mpg
head(mpg)
## # A tibble: 6 x 11
##   manufacturer model displ  year   cyl trans  drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr>  <chr> <int> <int> <chr> <chr>
## 1 audi         a4      1.8  1999     4 auto(~ f        18    29 p     comp~
## 2 audi         a4      1.8  1999     4 manua~ f        21    29 p     comp~
## 3 audi         a4      2    2008     4 manua~ f        20    31 p     comp~
## 4 audi         a4      2    2008     4 auto(~ f        21    30 p     comp~
## 5 audi         a4      2.8  1999     6 auto(~ f        16    26 p     comp~
## 6 audi         a4      2.8  1999     6 manua~ f        18    26 p     comp~
I am selecting only the necessary columns (fl for fuel type and cty for city miles per gallon) and filtering the rows so that only the regular ("r") and premium ("p") fuel types remain.
test <- mpg %>%
  select(fl, cty) %>%
  filter(fl %in% c("r", "p"))
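To see what the filter above drops, here is a small sketch that counts the rows for every fuel type in the full data before filtering. The name fuel_counts is just an illustrative choice of mine and is not used elsewhere in this post.

```r
library(dplyr)

# Count rows per fuel type in the full mpg data (before any filtering).
fuel_counts <- ggplot2::mpg %>%
  count(fl)

fuel_counts
```

Any fuel code other than "p" and "r" that shows up in this table is removed by the filter step.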
Now, we can check the outcome of the data preparation below. The table function shows how many records there are for each fuel type.
head(test)
## # A tibble: 6 x 2
##   fl      cty
##   <chr> <int>
## 1 p        18
## 2 p        21
## 3 p        20
## 4 p        21
## 5 p        16
## 6 p        18
table(test$fl)
##
## p r
## 52 168
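Besides the counts, it can help to look at the mean and spread of cty in each group before testing. The following is a minimal sketch, not part of the original analysis; group_summary, mean_cty and sd_cty are names I made up for illustration.

```r
library(dplyr)

# Rebuild the filtered data so this block runs on its own.
test <- ggplot2::mpg %>%
  select(fl, cty) %>%
  filter(fl %in% c("r", "p"))

# Per-group sample size, mean and standard deviation of city mpg.
group_summary <- test %>%
  group_by(fl) %>%
  summarise(n = n(), mean_cty = mean(cty), sd_cty = sd(cty))

group_summary
```

If the two standard deviations look very different, the equal-variance assumption of a pooled t-test is worth questioning.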
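Now, here is the t-test. Setting var.equal = TRUE runs the classic Student's two-sample t-test, which assumes that both groups have the same variance.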
t.test(cty ~ fl, data = test, var.equal = TRUE)
##
## Two Sample t-test
##
## data: cty by fl
## t = 1.0662, df = 218, p-value = 0.2875
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.5322946 1.7868733
## sample estimates:
## mean in group p mean in group r
## 17.36538 16.73810
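To make the numbers in this output less of a black box, here is a sketch that recomputes the pooled two-sample t statistic by hand in base R. The variable names (p, r, pooled_var, se, t_stat, p_value) are my own; the formulas are the standard ones for the equal-variance t-test.

```r
# Rebuild the filtered data so this block runs on its own.
test <- subset(ggplot2::mpg, fl %in% c("p", "r"), select = c(fl, cty))

p <- test$cty[test$fl == "p"]  # city mpg, premium fuel
r <- test$cty[test$fl == "r"]  # city mpg, regular fuel
n_p <- length(p)
n_r <- length(r)

# Pooled variance: weighted average of the two sample variances.
pooled_var <- ((n_p - 1) * var(p) + (n_r - 1) * var(r)) / (n_p + n_r - 2)

# Standard error of the difference in means.
se <- sqrt(pooled_var * (1 / n_p + 1 / n_r))

# t statistic, degrees of freedom and two-sided p-value.
t_stat <- (mean(p) - mean(r)) / se
df <- n_p + n_r - 2
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

c(t = t_stat, df = df, p.value = p_value)
```

These values should agree with the t, df and p-value lines printed by t.test above.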
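From the test, we cannot say that there is a difference in mean city miles per gallon between regular and premium fuel types.
The p-value (0.2875) is higher than 0.05, so we fail to reject the null hypothesis. It tells us that, if there were truly no difference between the two fuel types, we would see a difference in sample means at least this large about 28.75% of the time. It is not the probability that the null hypothesis is true.
In other words, a gap of this size between the two group means (17.37 vs. 16.74) is quite plausible under sampling variation alone, which is also why the 95% confidence interval includes 0. This does not prove that the two means are equal; the data simply do not give enough evidence of a difference.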
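One way to get a feel for what the p-value means is a permutation check: shuffle the fuel labels many times, so that any link between fuel type and cty is broken, and see how often the shuffled difference in means is at least as large as the observed one. This is a different technique from the t-test above and only a rough sketch; the seed, the number of shuffles (n_perm) and all variable names are my own assumptions.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Rebuild the filtered data so this block runs on its own.
test <- subset(ggplot2::mpg, fl %in% c("p", "r"), select = c(fl, cty))

# Observed difference in mean city mpg (premium minus regular).
obs_diff <- mean(test$cty[test$fl == "p"]) - mean(test$cty[test$fl == "r"])

# Shuffle the labels and recompute the difference each time.
n_perm <- 5000
perm_diffs <- replicate(n_perm, {
  shuffled <- sample(test$fl)
  mean(test$cty[shuffled == "p"]) - mean(test$cty[shuffled == "r"])
})

# Share of shuffles with a difference at least as extreme as the observed one.
perm_p <- mean(abs(perm_diffs) >= abs(obs_diff))
perm_p
```

The resulting proportion plays the same role as the two-sided p-value from the t-test and is expected to land in the same general range, though not at exactly the same number.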