df<- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
USCars<-df[,1]
JapaneseCars<-df[1:28,2]
qqnorm(USCars, main = "Normal probability plot for US Cars", ylab = "US Cars", col = "blue")
qqnorm(JapaneseCars, main = "Normal probability plot for Japanese Cars", ylab = "Japanese Cars", col = "red")
Comment - The MPG of Japanese cars appears closer to normally distributed than the MPG of US cars.
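A normal probability plot is easier to judge with a reference line, and a Shapiro-Wilk test gives a rough numerical check to go with it. The sketch below is only an illustration: `us_mpg` and `jp_mpg` are hypothetical names, and the values are simulated stand-ins, not the data in the CSV file.

```r
# Simulated stand-in samples (hypothetical; NOT the values from the CSV file)
set.seed(1)
us_mpg <- rlnorm(35, meanlog = 2.7, sdlog = 0.3)  # right-skewed sample
jp_mpg <- rnorm(28, mean = 26, sd = 4)            # roughly normal sample

# Normal probability plot with a reference line through the quartiles
qqnorm(us_mpg, main = "Normal probability plot (simulated US sample)")
qqline(us_mpg)

# Shapiro-Wilk test: a small p-value is evidence against normality
sw_us <- shapiro.test(us_mpg)
sw_jp <- shapiro.test(jp_mpg)
c(US = sw_us$p.value, Japan = sw_jp$p.value)
```

Points that bend away from the line in the tails suggest skewness or heavy tails. With samples this small, the plot and the test should be read together rather than relying on either alone.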
boxplot(USCars,JapaneseCars, names = c("US Cars", "Japanese Cars"), main = "Box plot of US Cars vs Japanese Cars")
Comment - The box plot shows a large difference in variance between the US Cars and the Japanese Cars. We therefore take the log of both samples to make the variances approximately equal, which the pooled two-sample t-test assumes.
UCars <- log(USCars)
JCars<- log(JapaneseCars)
qqnorm(UCars, main = "Normal probability plot for log(US Cars)", ylab = "log(US Cars)", col = "blue")
qqnorm(JCars, main = "Normal probability plot for log(Japanese Cars)", ylab = "log(Japanese Cars)", col = "red")
boxplot(UCars, JCars, names = c("log(US Cars)", "log(Japanese Cars)"), main = "Box plot of log(US Cars) vs log(Japanese Cars)")
Comment - From the box plot illustration, we can conclude that the variances are approximately equal after log transformation.
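The visual comparison of variances can be backed up with numbers. The sketch below computes the ratio of sample variances before and after the log transformation and runs an F test on the log scale. The data are simulated stand-ins (`us_mpg` and `jp_mpg` are hypothetical names, not the CSV values), and the F test itself assumes the log values are roughly normal.

```r
# Simulated stand-in samples (hypothetical; NOT the values from the CSV file)
set.seed(2)
us_mpg <- rlnorm(35, meanlog = 2.7, sdlog = 0.3)
jp_mpg <- rlnorm(28, meanlog = 3.3, sdlog = 0.3)

# Ratio of sample variances on the raw scale and on the log scale;
# a ratio near 1 supports the equal-variance assumption
ratio_raw <- var(us_mpg) / var(jp_mpg)
ratio_log <- var(log(us_mpg)) / var(log(jp_mpg))
c(raw = ratio_raw, log = ratio_log)

# F test of equal variances on the log scale
f_log <- var.test(log(us_mpg), log(jp_mpg))
f_log$p.value
```

A large p-value from `var.test` does not prove the variances are equal; it only means the data give no strong evidence against it.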
mean(UCars)
## [1] 2.741001
mean(JCars)
## [1] 3.270957
Comment - Sample average of the log of the MPG of US Cars = 2.741001
Sample average of the log of the MPG of Japanese Cars = 3.270957
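Averages of log values are hard to read in MPG units. Applying `exp()` to a mean of logs gives the geometric mean on the original scale, so the two averages reported above can be back-transformed as in this short sketch (the variable names are illustrative only).

```r
# Sample averages of log(MPG) as reported above
mean_log_us <- 2.741001
mean_log_jp <- 3.270957

# exp() of a mean of logs is the geometric mean on the original scale
geo_us <- exp(mean_log_us)
geo_jp <- exp(mean_log_jp)

# A difference of log means is the log of a ratio of geometric means
ratio <- exp(mean_log_us - mean_log_jp)
round(c(US = geo_us, Japan = geo_jp, ratio = ratio), 2)
```

This is why a test on the log scale is, strictly speaking, a statement about geometric means (or ratios) of MPG rather than arithmetic means.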
t.test(UCars, JCars, var.equal = TRUE, alternative = "less")
##
## Two Sample t-test
##
## data: UCars and JCars
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.4366143
## sample estimates:
## mean of x mean of y
## 2.741001 3.270957
Comment - The p-value (6.528e-14) is far below the 0.05 significance level that corresponds to the 95 percent confidence interval above, so we reject \(H_0\).
Because the test was run on the log scale, we conclude that the mean log MPG of cars manufactured in the US is less than that of cars manufactured in Japan.
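To show where the t statistic, the degrees of freedom, and the one-sided p-value come from, the sketch below computes the pooled two-sample test by hand and compares it with `t.test`. The helper `pooled_t_less` is a hypothetical name introduced here, and the samples are simulated stand-ins with the same sizes (35 and 28) implied by df = 61, not the CSV values.

```r
# Pooled two-sample t test by hand for H_a: mean(x) - mean(y) < 0
# (pooled_t_less is a hypothetical helper, not part of any package)
pooled_t_less <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  df <- nx + ny - 2
  # Pooled variance: weighted average of the two sample variances
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / df
  t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
  # Lower-tail probability, because the alternative is "less"
  list(t = t_stat, df = df, p.value = pt(t_stat, df))
}

# Simulated stand-in log samples (hypothetical; NOT the CSV values)
set.seed(3)
x <- log(rlnorm(35, meanlog = 2.7, sdlog = 0.25))
y <- log(rlnorm(28, meanlog = 3.3, sdlog = 0.25))

manual  <- pooled_t_less(x, y)
builtin <- t.test(x, y, var.equal = TRUE, alternative = "less")

# The hand computation should agree with t.test
all.equal(manual$t, unname(builtin$statistic))
all.equal(manual$p.value, builtin$p.value)
```

The pooled variance is only appropriate when the two variances are approximately equal, which is the reason for the log transformation earlier.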
\(H_a: \mu_1 - \mu_2 \neq 0\)
df<- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
USCars<-df[,1]
JapaneseCars<-df[1:28,2]
qqnorm(USCars, main = "Normal probability plot for US Cars", ylab = "US Cars", col = "blue")
qqnorm(JapaneseCars, main = "Normal probability plot for Japanese Cars", ylab = "Japanese Cars", col = "red")
boxplot(USCars,JapaneseCars, names = c("US Cars", "Japanese Cars"), main = "Box plot of US Cars vs Japanese Cars")
UCars <- log(USCars)
JCars<- log(JapaneseCars)
qqnorm(UCars, main = "Normal probability plot for log(US Cars)", ylab = "log(US Cars)", col = "blue")
qqnorm(JCars, main = "Normal probability plot for log(Japanese Cars)", ylab = "log(Japanese Cars)", col = "red")
boxplot(UCars, JCars, names = c("log(US Cars)", "log(Japanese Cars)"), main = "Box plot of log(US Cars) vs log(Japanese Cars)")
mean(UCars)
mean(JCars)
t.test(UCars, JCars, var.equal = TRUE, alternative = "less")