1 Assignment

An environmental group would like to test the hypothesis that the mean mpg of cars manufactured in the US is less than that of those manufactured in Japan. Towards this end, they sampled n1=35 US and n2=28 Japanese cars, which were tested for mpg fuel efficiency. (As a caveat, assume that this is a random sample from a large population of US and Japanese cars, not a complete census.) The data are reported in the following csv file

https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv

1.      Does the mpg of both US cars and Japanese cars appear to be Normally distributed (use NPPs)?

2.      Does the variance appear to be constant (use side-by-side boxplots)?

3.      Transform the data using a log transform and repeat questions 1 and 2.  Comment on the differences between the plots. Use the transformed data for the remaining questions

4.      State the null and alternative hypothesis and test using a 0.05 level of significance.

a.      What are the sample averages for the log of the mpg of US and Japanese cars?

b.      State your conclusions

Submit both an Rmd file and a link to your html file in Blackboard.

1.1 - Question 1

1.      Does the mpg of both US cars and Japanese cars appear to be Normally distributed (use NPPs)?

Based on the normal probability plots (NPPs) for US cars and Japanese cars, the mpg of neither group appears to be normally distributed.

#This assigns the data to datset1
datset1<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
#This draws the normal probability plot (NPP) for US Cars
qqnorm(datset1$USCars, main = "US Cars")
#qqline() only adds the reference line to the existing plot, so it takes no title
qqline(datset1$USCars)

#This draws the normal probability plot (NPP) for Japanese Cars
qqnorm(datset1$JapaneseCars, main= "Japanese Cars")
qqline(datset1$JapaneseCars)
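
To compare the two NPPs directly, it can help to draw them side by side in one figure. The following is a minimal sketch, not the code used for this report: the vectors `us_mpg` and `jp_mpg` are simulated stand-ins (hypothetical values, not the assignment data), and the panel layout via `par(mfrow = ...)` is just one possible choice.

```r
# Simulated stand-in data (hypothetical, NOT the values in the csv file)
set.seed(1)
us_mpg <- rlnorm(35, meanlog = log(16), sdlog = 0.35)
jp_mpg <- rlnorm(28, meanlog = log(27), sdlog = 0.25)

# Put both normal probability plots in a single 1 x 2 figure
old_par <- par(mfrow = c(1, 2))
qq_us <- qqnorm(us_mpg, main = "US Cars (simulated)")
qqline(us_mpg)   # reference line through the first and third quartiles
qq_jp <- qqnorm(jp_mpg, main = "Japanese Cars (simulated)")
qqline(jp_mpg)
par(old_par)     # restore the previous plotting layout
```

With the real data, `us_mpg` and `jp_mpg` would be replaced by `datset1$USCars` and `datset1$JapaneseCars`. Points that bend away from the reference line at the ends are the usual sign of skewness.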

1.2 - Question 2

2.      Does the variance appear to be constant (use side-by-side boxplots)?

The variance does not appear to be constant: the boxes in the side-by-side boxplots differ in height, so the spread of the two groups is not the same.

#names= labels the two boxes so the groups can be told apart
boxplot(datset1$USCars,datset1$JapaneseCars, names=c("US Cars","Japanese Cars"), main="Boxplot of Data")
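
The visual impression from the boxplots can be backed up with a couple of numbers per group. This is a sketch under assumptions: the data frame `cars` is simulated (hypothetical values, not the assignment data), and the shorter column is padded with `NA` on the assumption that the csv file stores the two unequal-length samples that way.

```r
# Simulated stand-in data frame (hypothetical, NOT the values in the csv file)
set.seed(1)
cars <- data.frame(
  USCars       = rlnorm(35, meanlog = log(16), sdlog = 0.35),
  JapaneseCars = c(rlnorm(28, meanlog = log(27), sdlog = 0.25), rep(NA, 7))
)

# Sample variance and interquartile range per group, ignoring the NA padding
spread <- sapply(cars, function(x) {
  c(var = var(x, na.rm = TRUE), IQR = IQR(x, na.rm = TRUE))
})
print(spread)

# The box height in a boxplot is the IQR, so the plot and the table agree
boxplot(cars, names = c("US Cars", "Japanese Cars"),
        ylab = "mpg", main = "Boxplot of Data (simulated)")
```

A large ratio between the two variances (or the two IQRs) points the same way as boxes of clearly different height.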

1.3 - Question 3

3.      Transform the data using a log transform and repeat questions 1 and 2.  Comment on the differences between the plots. Use the transformed data for the remaining questions

After the log transform, both samples look closer to normally distributed than before, and the boxes in the boxplots are now similar in height, so the variance appears roughly constant.

#This applies a base-10 log transform to every column
datatransform<-log10(datset1)
#This is the code for the US cars - Transformed
qqnorm(datatransform$USCars, main = "US CARS - Log Transform")
qqline(datatransform$USCars)

#This is the code for the Japanese cars - Transformed
qqnorm(datatransform$JapaneseCars, main = "Japanese cars - Log Transform")
qqline(datatransform$JapaneseCars)

#This is the code for the boxplots of US Cars vs Japanese cars - Transformed
boxplot(datatransform$USCars,datatransform$JapaneseCars, names=c("US Cars","Japanese Cars"), main= "Boxplot of transformed Data")
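
The transform above uses `log10()`. A natural log would lead to the same test result, because changing the base only rescales every value by the constant 1/ln(10), and the pooled t statistic does not change under rescaling. The sketch below illustrates this on simulated data (hypothetical values, not the assignment data).

```r
# Simulated stand-in data (hypothetical, NOT the values in the csv file)
set.seed(1)
us_mpg <- rlnorm(35, meanlog = log(16), sdlog = 0.35)
jp_mpg <- rlnorm(28, meanlog = log(27), sdlog = 0.25)

# Pooled two-sample t-test on base-10 logs and on natural logs
t_log10 <- t.test(log10(us_mpg), log10(jp_mpg), var.equal = TRUE)
t_ln    <- t.test(log(us_mpg),   log(jp_mpg),   var.equal = TRUE)

# The t statistics agree up to floating-point error
same_t <- all.equal(unname(t_log10$statistic), unname(t_ln$statistic))

# The sample means differ only by the constant factor ln(10)
same_mean <- all.equal(mean(log10(us_mpg)) * log(10), mean(log(us_mpg)))
print(c(same_t = same_t, same_mean = same_mean))
```

Only the reported sample averages depend on the base, so the base should be stated whenever those averages are quoted.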

1.4 - Question 4

4.      State the null and alternative hypothesis and test using a 0.05 level of significance.

H0: the mean log mpg of US cars is equal to the mean log mpg of Japanese cars (mu_US = mu_Japan).
H1: the mean log mpg of US cars is less than the mean log mpg of Japanese cars (mu_US < mu_Japan).

a.      What are the sample averages for the log of the mpg of US and Japanese cars?

A - mean of log10(mpg) of US Cars = 1.190402 / mean of log10(mpg) of Japanese cars = 1.420559

b.      State your conclusions

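The p-value of 1.306e-13 is far below the 0.05 level of significance, so we reject the null hypothesis that the two means are equal. Because the t statistic is negative (t = -9.4828), the evidence points in the direction of the environmental group's claim: the mean log mpg of US cars is lower than that of Japanese cars. Note that t.test() as called here reports a two-sided p-value; since t is negative, the one-sided p-value for "US less than Japan" is half of it, which leads to the same conclusion.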

t.test(datatransform$USCars, datatransform$JapaneseCars,var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  datatransform$USCars and datatransform$JapaneseCars
## t = -9.4828, df = 61, p-value = 1.306e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2786895 -0.1816243
## sample estimates:
## mean of x mean of y 
##  1.190402  1.420559
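
Since the hypothesis of interest is directional (US less than Japan), the test can also be run one-sided. The sketch below uses simulated data (hypothetical values, not the assignment data) to show the `alternative = "less"` argument of `t.test()` and to check the result against the pooled t statistic computed by hand; the names `log_us`, `log_jp`, `sp2` and `t_manual` are introduced here for illustration only.

```r
# Simulated stand-in data on the log10 scale (hypothetical, NOT the csv values)
set.seed(1)
log_us <- log10(rlnorm(35, meanlog = log(16), sdlog = 0.35))
log_jp <- log10(rlnorm(28, meanlog = log(27), sdlog = 0.25))

# Two-sided test (as in the report) and one-sided test for H1: mean(US) < mean(Japan)
two_sided <- t.test(log_us, log_jp, var.equal = TRUE)
one_sided <- t.test(log_us, log_jp, var.equal = TRUE, alternative = "less")

# Pooled-variance t statistic computed by hand
n1  <- length(log_us)
n2  <- length(log_jp)
sp2 <- ((n1 - 1) * var(log_us) + (n2 - 1) * var(log_jp)) / (n1 + n2 - 2)
t_manual <- (mean(log_us) - mean(log_jp)) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_manual <- pt(t_manual, df = n1 + n2 - 2)   # lower-tail probability

print(c(t = t_manual, p_one_sided = p_manual, p_two_sided = two_sided$p.value))
```

When the t statistic is negative, the one-sided p-value for "less" equals half of the two-sided p-value, so a two-sided result that is already far below 0.05 stays significant in the one-sided test.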

2 Complete R Code

datset1<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
qqnorm(datset1$USCars, main = "US CARS")
qqline(datset1$USCars)

qqnorm(datset1$JapaneseCars, main= "Japanese cars")
qqline(datset1$JapaneseCars)
#Q1 - Somewhat normally distributed


#Q2 - side-by-side boxplots to compare the spread of the two groups
boxplot(datset1$USCars,datset1$JapaneseCars, main="Boxplot of Data")
#Q2 - The variance does not appear to be constant

#Q3 - Transforming Data
datatransform<-log10(datset1)
qqnorm(datatransform$USCars, main = "US CARS - Log Transform")
qqline(datatransform$USCars)

qqnorm(datatransform$JapaneseCars, main= "Japanese cars - Log Transform")
qqline(datatransform$JapaneseCars)

#not a large difference in variance after the log transform
boxplot(datatransform$USCars,datatransform$JapaneseCars,main= "Boxplot of transformed Data")

#Q4 
#A - mean of x=1.190402 /  mean of y=1.420559
t.test(datatransform$USCars, datatransform$JapaneseCars,var.equal=TRUE)
#p-value = 1.306e-13 < .05, so we reject the null hypothesis of equal means
#mean of x=1.190402 /  mean of y=1.420559