dat<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
colnames(dat)<-c("Uscars","JapaneseCars")
head(dat)
## Uscars JapaneseCars
## 1 18 24
## 2 15 27
## 3 18 27
## 4 16 25
## 5 17 31
## 6 15 35
The US sample contains 35 cars and the Japanese sample contains 28 cars.
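The sample sizes are not visible in the `head()` output, so it may help to count them directly. The following is a minimal sketch, assuming the shorter column is padded with `NA` when the CSV is read (an assumption, not something the output above confirms). The data frame `toy` and its values are made up for illustration.

```r
# Hypothetical data frame standing in for `dat`; the values are made up.
# Assumption: the shorter column is padded with NA, as read.csv does for blank numeric cells.
toy <- data.frame(Uscars = c(18, 15, 18, 16, 17),
                  JapaneseCars = c(24, 27, 27, NA, NA))

# Count the non-missing observations in each column.
n_obs <- colSums(!is.na(toy))
print(n_obs)
```

Applying the same `colSums(!is.na(...))` call to `dat` should give the size of each sample.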
#QUESTION 1:
qqnorm(dat$JapaneseCars, col="Red", main="Normal probability plot of mpg for Japanese cars")
qqline(dat$JapaneseCars)
The Normal probability plot for MPG of Japanese cars appears to be on a fairly straight line.
qqnorm(dat$Uscars, col="Blue", main="Normal probability plot of mpg for US cars")
qqline(dat$Uscars)
The Normal probability plot for MPG of US Cars appears to be on a fairly straight line.
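"Fairly straight" is a visual judgment. One rough way to put a number on it is the correlation between the theoretical normal quantiles and the ordered sample values, which `qqnorm` returns when `plot.it = FALSE`. This is only a sketch: `qq_cor` is a hypothetical helper, not part of the original analysis, and the MPG values are made up.

```r
# Hypothetical helper (not part of the original analysis): correlation between
# the theoretical normal quantiles and the sample values of a Q-Q plot.
qq_cor <- function(x) {
  x <- x[!is.na(x)]                  # drop missing values
  qq <- qqnorm(x, plot.it = FALSE)   # list with x = theoretical, y = sample quantiles
  cor(qq$x, qq$y)
}

# Made-up MPG values for illustration only.
mpg_toy <- c(18, 15, 18, 16, 17, 15, 14, 21, 19, 16)
r <- qq_cor(mpg_toy)
print(r)  # values near 1 indicate points close to a straight line
```

Calling `qq_cor(dat$Uscars)` and `qq_cor(dat$JapaneseCars)` would give one such number per sample. It complements the plots rather than replacing them.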
#QUESTION 2:
boxplot(dat$JapaneseCars,dat$Uscars, names=c("Japanese Cars", "US Cars") , main="MPG Comparison of Japanese Cars with US Cars")
Based on the boxplot, the variances of the two samples do not appear to be equal. Potential outliers are visible only in the US cars sample.
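The visual impression from the boxplot can be cross-checked numerically. The sketch below compares the sample variances and lists the points that `boxplot()` would draw as outliers, using `boxplot.stats()`. The vectors `us_toy` and `jp_toy` are made-up values for illustration, not the car data.

```r
# Made-up samples for illustration; with the real data, use dat$Uscars and dat$JapaneseCars.
us_toy <- c(18, 15, 18, 16, 17, 15, 14, 16, 17, 30)
jp_toy <- c(24, 27, 27, 25, 31, 33, 24, 19, 28, 23)

# Sample variances and their ratio; a ratio far from 1 suggests unequal spread.
var_us <- var(us_toy, na.rm = TRUE)
var_jp <- var(jp_toy, na.rm = TRUE)
print(c(us = var_us, japanese = var_jp, ratio = var_jp / var_us))

# Points beyond 1.5 times the hinge spread, which boxplot() draws as outliers.
# boxplot.stats() ignores missing values in its input.
out_us <- boxplot.stats(us_toy)$out
out_jp <- boxplot.stats(jp_toy)$out
print(out_us)
print(out_jp)
```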
#QUESTION 3:
dat2<-log(dat)
qqnorm(dat2$JapaneseCars, col="Red", main="Normal probability plot of log mpg for Japanese cars")
qqline(dat2$JapaneseCars)
After the log transformation, the Normal probability plot for MPG of Japanese cars follows a straight line more closely.
qqnorm(dat2$Uscars, col="Blue", main="Normal probability plot of log mpg for US cars")
qqline(dat2$Uscars)
After the log transformation, the Normal probability plot for MPG of US cars follows a straight line more closely.
boxplot(dat2$JapaneseCars,dat2$Uscars, names=c("Japanese Cars", "US Cars") , main="Log MPG Comparison of Japanese Cars with US Cars")
Based on the boxplot of the log-transformed MPG for both samples, the variances of US cars and Japanese cars are now reasonably close. The points in both normal probability plots also fall closer to a straight line than they did before the log transformation.
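A log transformation tends to equalize variances when the spread of a sample grows with its level. The sketch below illustrates this with constructed values in which one sample is exactly twice the other. These numbers are made up to show the mechanism and are not the car data.

```r
# Constructed values: the second sample is exactly twice the first,
# so its spread grows with its level (illustration only, not the car data).
a <- c(10, 12, 15, 18, 20)
b <- 2 * a

ratio_raw <- var(b) / var(a)            # spread differs on the raw scale
ratio_log <- var(log(b)) / var(log(a))  # log(b) = log(2) + log(a), so the spread matches
print(c(raw = ratio_raw, log = ratio_log))
```

With the real data, the same ratio could be computed from `dat` and `dat2` (using `na.rm = TRUE` if a column contains missing values) to check how much closer the variances are after the transformation.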
#QUESTION 4:
t.test(dat2$Uscars,dat2$JapaneseCars, var.equal = TRUE,alternative = "less")
##
## Two Sample t-test
##
## data: dat2$Uscars and dat2$JapaneseCars
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.4366143
## sample estimates:
## mean of x mean of y
## 2.741001 3.270957
Let u1 be the mean log MPG of Japanese cars and u2 the mean log MPG of US cars. The null hypothesis is H0: u1 = u2 and the alternative hypothesis is Ha: u1 > u2 (equivalently u2 < u1, which is what alternative = "less" tests when the US sample is passed first).
Since the p-value (6.528e-14) is far below our significance level of 0.05, the mean log MPG of US cars is significantly lower than the mean log MPG of Japanese cars.
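The t statistic, degrees of freedom, and one-sided p-value reported by `t.test` with `var.equal = TRUE` can be reproduced by hand from the pooled variance. The degrees of freedom are n1 + n2 - 2, which for the samples here is 35 + 28 - 2 = 61. The sketch below uses made-up log-scale samples `x` and `y`, so its numbers will not match the output above.

```r
# Made-up log-scale samples for illustration only.
x <- log(c(18, 15, 18, 16, 17, 15))
y <- log(c(24, 27, 27, 25, 31, 35, 28))

n1 <- length(x)
n2 <- length(y)

# Pooled variance: average of the two sample variances, weighted by degrees of freedom.
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# Pooled two-sample t statistic and its degrees of freedom.
t_manual <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df <- n1 + n2 - 2

# Lower-tail p-value, matching alternative = "less".
p_manual <- pt(t_manual, df)

# Compare with the built-in test.
fit <- t.test(x, y, var.equal = TRUE, alternative = "less")
print(c(t_manual = t_manual, t_builtin = unname(fit$statistic)))
print(c(p_manual = p_manual, p_builtin = fit$p.value))
```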
The sample mean of the log MPG is 2.741001 for US cars and 3.270957 for Japanese cars.
Conclusion: based on the p-value above, we reject the null hypothesis and conclude that the mean MPG (on the log scale) of cars manufactured in the US is lower than that of cars manufactured in Japan.
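Because the test was run on log MPG, the sample means are on the log scale. One way to read them in MPG units is to back-transform with `exp()`, which gives geometric means rather than arithmetic means. The sketch below uses only the two sample means printed in the `t.test` output. The variable names are illustrative.

```r
# Sample means of log MPG, copied from the t.test output.
mean_log_us <- 2.741001
mean_log_jp <- 3.270957

# exp() of a mean of logs is a geometric mean on the original MPG scale.
gm_us <- exp(mean_log_us)
gm_jp <- exp(mean_log_jp)

# The difference in log means back-transforms to a ratio of geometric means.
ratio <- exp(mean_log_us - mean_log_jp)
print(c(gm_us = gm_us, gm_jp = gm_jp, ratio = ratio))
```

The ratio comes out at roughly 0.59, meaning the geometric mean MPG of the US sample is a little under 60% of that of the Japanese sample.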