knitr::opts_chunk$set(echo = TRUE)
This assignment analyzes gas mileage data for US and Japanese cars. The goal is to determine whether the mean gas mileage of US cars is less than that of Japanese cars.
Below is the data analyzed.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
data <- read.csv("C:/Users/cephusm/Downloads/US_Japanese_Cars.csv")
rmarkdown::paged_table(data)
Below are the Normal Probability Plots of the US and Japanese Cars.
#US Cars
usc <- data$USCars
#US Normality
qqnorm(usc, main="US Cars", ylab="MPG")
qqline(usc, col="red")
#Japanese Cars
jc <- data$JapaneseCars
#Japanese Normality
qqnorm(jc, main="Japanese Cars", ylab="MPG")
qqline(jc, col="blue")
Looking at the plots above, one can conclude that both data sets are approximately normal, although the US plot departs slightly from the line at the tails, which suggests mild skew.
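A numeric check can complement the visual read of the Q-Q plots. The sketch below runs the Shapiro-Wilk test from base R on two simulated stand-in samples. `us_demo` and `jp_demo` are hypothetical placeholders, not the assignment's data; the same call would apply to `usc` and `jc`.

```r
# Hypothetical stand-in samples (NOT the assignment's data); swap in usc and jc.
set.seed(42)
us_demo <- rnorm(35, mean = 16, sd = 3)
# Assumption: the shorter column is padded with NA, as a CSV with unequal columns would be.
jp_demo <- c(rnorm(28, mean = 27, sd = 3), rep(NA, 7))

# Shapiro-Wilk test: the null hypothesis is that the sample comes from a normal distribution.
# shapiro.test() drops missing values before testing.
us_sw <- shapiro.test(us_demo)
jp_sw <- shapiro.test(jp_demo)

# A large p-value means no evidence against normality at the chosen level.
print(c(US = us_sw$p.value, Japanese = jp_sw$p.value))
```

With samples this small the test has limited power, so it is best read alongside the plots rather than in place of them.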
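boxplot(usc, jc, names = c("US", "Japanese"), main = "US vs. Japanese Variance", ylab = "MPG")
uscq <- IQR(usc)
jcq <- IQR(jc, na.rm = TRUE)
#Absolute difference between the two IQRs (named so it does not mask stats::var)
iqr_diff <- abs(uscq - jcq)
idata <- data.frame(uscq, jcq, iqr_diff)
colnames(idata) <- c("US IQR", "J IQR", "IQR Difference")
rmarkdown::paged_table(idata)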
The box plots and interquartile ranges (IQRs) of the US and Japanese (J) cars show that the two groups differ noticeably in spread. The IQR of the US cars is larger than that of the Japanese cars, and the absolute difference between the two IQRs quantifies the gap.
Because the spreads differ, the data were log-transformed to stabilize the variance.
#US
lusc <- log(usc)
qqnorm(lusc, main="Log US Cars", ylab="log(MPG)")
qqline(lusc, col="red")
#Japanese
ljc <- log(jc)
qqnorm(ljc, main="Log Japanese Cars", ylab="log(MPG)")
qqline(ljc, col="blue")
The log-transformed data still look approximately normal, and in fact the points follow the reference line more closely than before.
boxplot(lusc, ljc, names = c("US", "Japanese"), main = "Log US vs. Japanese Variance", ylab = "log(MPG)")
luscq <- IQR(lusc)
ljcq <- IQR(ljc, na.rm = TRUE)
#Absolute difference between the two log-scale IQRs
liqr_diff <- abs(luscq - ljcq)
ldata <- data.frame(luscq, ljcq, liqr_diff)
colnames(ldata) <- c("Log US IQR", "Log J IQR", "Log IQR Difference")
rmarkdown::paged_table(ldata)
Taking the log of the data greatly reduced the difference in spread between the two groups. The interquartile ranges (IQRs) are now almost identical: the difference between them went from 3 to ~0.0046 (the two figures are on different scales, so the comparison is only indicative).
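Before assuming equal variances in the t-test, the IQR comparison can be backed by a formal test. A minimal sketch, assuming simulated stand-in samples: `us_demo` and `jp_demo` are hypothetical, and their sizes and spread are illustrative choices, not the assignment's data. `var.test()` from base R runs an F test of the hypothesis that two variances are equal.

```r
# Hypothetical stand-in samples (NOT the assignment's data); swap in usc and jc.
# Sample sizes and sd are assumptions made only for illustration.
set.seed(42)
us_demo <- exp(rnorm(35, mean = 2.74, sd = 0.2))
jp_demo <- exp(rnorm(28, mean = 3.27, sd = 0.2))

# F test for equality of variances on the log scale.
# The null hypothesis is that the ratio of the two variances equals 1.
f_log <- var.test(log(us_demo), log(jp_demo))

print(f_log$estimate)  # ratio of the two sample variances
print(f_log$p.value)   # evidence against equal variances
```

A ratio near 1 with a large p-value would be consistent with `var.equal = TRUE`. Otherwise Welch's test, which is the `t.test` default (`var.equal = FALSE`), is the usual fallback. The F test is itself sensitive to non-normality, so it is a supplement to the plots, not a replacement.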
Once the data were approximately normal with roughly constant variance, a two-sample t-test was performed to test the null hypothesis of equal means against the alternative and to compare the sample means of the US and Japanese cars.
t.test(lusc,ljc,var.equal = TRUE)
##
## Two Sample t-test
##
## data: lusc and ljc
## t = -9.4828, df = 61, p-value = 1.306e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.6417062 -0.4182053
## sample estimates:
## mean of x mean of y
## 2.741001 3.270957
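The p-value (1.306e-13) is far below the 0.05 level of significance. Therefore we can reject the null hypothesis of equal means in favor of the alternative hypothesis. The difference is statistically significant, and because the t statistic is negative, the US sample mean is the lower of the two, which matches the direction the assignment set out to test.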
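The stated goal is one-sided (US mean less than Japanese mean), while the call above uses the two-sided default. As a rough sketch, the one-sided p-value can be recovered from the reported `t` and `df` alone; the variable names `t_stat` and `dof` are introduced here for illustration. Rerunning the test with `alternative = "less"` should give the same figure.

```r
# Values copied from the t-test output above.
t_stat <- -9.4828
dof <- 61

# One-sided p-value for H1: mean(log US) < mean(log Japanese).
# This is the lower-tail probability of the t distribution at the observed statistic.
p_one_sided <- pt(t_stat, df = dof)

# Two-sided p-value, for comparison with the reported 1.306e-13.
p_two_sided <- 2 * pt(-abs(t_stat), df = dof)

print(c(one_sided = p_one_sided, two_sided = p_two_sided))

# Equivalent call on the original vectors (not run here, since it needs the CSV):
# t.test(lusc, ljc, alternative = "less", var.equal = TRUE)
```

Since the statistic is negative, the one-sided p-value is exactly half the two-sided one, so the conclusion at the 0.05 level is unchanged.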
After log-transforming the data to stabilize the variance and using the t-test to establish statistical significance, a conclusion can be drawn about the means of the US and Japanese gas mileage data. The mean gas mileage of the US cars is less than that of the Japanese cars. Sooooooooo maybe buy a Japanese car to save on gas? lol 😆
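Because the test was run on log-transformed data, the reported means and confidence interval are on the log scale. As a rough sketch, the values from the t-test output can be back-transformed with `exp()` to read them in MPG; the variable names below are introduced here for illustration.

```r
# Values copied from the t-test output above (all on the log scale).
log_mean_us <- 2.741001
log_mean_jp <- 3.270957
ci_log_diff <- c(-0.6417062, -0.4182053)  # 95% CI for mean(log US) - mean(log Japanese)

# exp() of a mean of logs is a geometric mean, expressed in MPG.
geo_mean_us <- exp(log_mean_us)
geo_mean_jp <- exp(log_mean_jp)

# exp() of a difference of log means is a ratio of geometric means.
ratio <- exp(log_mean_us - log_mean_jp)
ratio_ci <- exp(ci_log_diff)

print(round(c(US = geo_mean_us, Japanese = geo_mean_jp), 1))
print(round(c(ratio = ratio, lower = ratio_ci[1], upper = ratio_ci[2]), 2))
```

These are geometric means, roughly 15.5 MPG for the US cars and 26.3 MPG for the Japanese cars, so they will generally sit a little below the arithmetic means of the raw data.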