Background

An environmental group would like to test the hypothesis that the mean mpg of cars manufactured in the US is less than that of those manufactured in Japan. To this end, they sampled n1=35 US and n2=28 Japanese cars, which were tested for mpg fuel efficiency. (As a caveat, assume that this is a random sample from a large population of US and Japanese cars, not a complete census.) The data are reported in the following CSV file:

https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv

# Load Data from provided URL
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv"
dat <- read.csv(url)
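Before plotting, it may help to confirm what was actually loaded. The sketch below assumes the file has one column per group, named `USCars` and `JapaneseCars` as used in the rest of this report, and that the shorter column is padded with missing values (an assumption, since the two sample sizes differ).

```r
# Reload the data so this block runs on its own.
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv"
dat <- read.csv(url)

# Column names and types; the analysis assumes USCars and JapaneseCars exist.
str(dat)

# Number of non-missing observations in each column.
n_obs <- colSums(!is.na(dat))
print(n_obs)
```

If the counts do not match n1=35 and n2=28, the column padding assumption above should be revisited before continuing.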

Question #1

Does the mpg of both US cars and Japanese cars appear to be Normally distributed (use NPPs)?

We can draw a normal Q-Q plot for the US-made and the Japanese-made cars separately and check whether each appears to be normally distributed.

qqnorm(dat$USCars, main="US Cars Mileage Per Gallon (MPG)")
qqline(dat$USCars, col="blue")

qqnorm(dat$JapaneseCars, main="Japanese Cars Mileage Per Gallon (MPG)")
qqline(dat$JapaneseCars, col="green")

Answer:

From the first of the two plots, the US cars' mpg does not seem to be normally distributed: the points deviate noticeably from the reference line near the tails. From the second plot, the Japanese cars' mpg does seem to be reasonably normally distributed.
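As a numeric complement to the visual check, a Shapiro-Wilk test could be run on each column. This is only a sketch and not part of the assigned method (the question asks for NPPs). The helper name `sw_p` is hypothetical, and missing values are dropped on the assumption that the shorter column is padded with `NA`.

```r
# Reload the data so this block runs on its own.
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv"
dat <- read.csv(url)

# Hypothetical helper: Shapiro-Wilk p-value for the non-missing values of x.
sw_p <- function(x) shapiro.test(x[!is.na(x)])$p.value

# A small p-value is evidence against normality for that group.
p_values <- c(US = sw_p(dat$USCars), Japanese = sw_p(dat$JapaneseCars))
print(round(p_values, 4))
```

With samples of this size the test has limited power, so it is best read alongside the Q-Q plots rather than in place of them.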

Question #2

Does the variance appear to be constant (use side-by-side boxplots)?

We can create a side-by-side boxplot to look at this:

UScars <- dat$USCars
Japanesecars <- dat$JapaneseCars
boxplot(UScars, Japanesecars,
        names = c("US Cars", "Japanese Cars"),
        col = c("blue", "green"),
        main = "Car Miles Per Gallon (MPG)",
        ylab = "Miles Per Gallon (MPG)"
        )

Answer

The variance does not appear to be constant across the two groups. The Japanese cars cluster tightly around their median, while the US cars show a visibly wider spread, so the two variances are not equal.
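The visual impression from the boxplots can be backed up with the sample variances and, as one option, an F test for equal variances. This is a sketch under the same column-name assumptions as above. Note that the F test is itself sensitive to non-normality, so its result on the raw data should be read with caution.

```r
# Reload the data so this block runs on its own.
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv"
dat <- read.csv(url)

# Drop missing values (the shorter column is assumed to be NA-padded).
us <- dat$USCars[!is.na(dat$USCars)]
jp <- dat$JapaneseCars[!is.na(dat$JapaneseCars)]

# Sample variances of the raw mpg values.
print(c(var_US = var(us), var_Japanese = var(jp)))

# F test of H0: the ratio of the two population variances is 1.
f_raw <- var.test(us, jp)
print(f_raw)
```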

Question #3

Transform the data using a log transform and repeat questions 1 and 2. Comment on the differences between the plots. Use the transformed data for the remaining questions.

Let's transform the data using the log function and look at the Q-Q plots and boxplots again.

dat$LogUSCars <- log(dat$USCars)
dat$LogJapaneseCars <- log(dat$JapaneseCars)

qqnorm(dat$LogUSCars, main="US Cars Mileage Per Gallon (MPG)(Log)")
qqline(dat$LogUSCars, col="blue")

qqnorm(dat$LogJapaneseCars, main="Japanese Cars Mileage Per Gallon (MPG)(Log)")
qqline(dat$LogJapaneseCars, col="green")

boxplot(dat$LogUSCars, dat$LogJapaneseCars,
        names = c("US Cars", "Japanese Cars"),
        col = c("blue", "green"),
        main = "Car Miles Per Gallon (MPG)(Log)",
        ylab = "Log Miles Per Gallon (MPG)"
        )

The log transform brought the US car data closer to a normal distribution. Compared with the original normal probability plot, the tails of the transformed US data now lie closer to the reference line, and the US boxplot also looks closer to what a normal sample would produce. The Japanese car data still appear approximately normal after the log transformation. The spreads of the two groups are also more similar after the transform, so the constant-variance assumption is more reasonable on the log scale.
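One way to put a number on the change in spread is to compare the ratio of sample variances before and after the transform. The sketch below assumes all mpg values are positive (so the log is defined); the helper name `var_ratio` is hypothetical. A ratio closer to 1 means the two groups have more similar variances.

```r
# Reload the data so this block runs on its own.
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv"
dat <- read.csv(url)

# Hypothetical helper: ratio of sample variances, ignoring missing values.
var_ratio <- function(x, y) var(x, na.rm = TRUE) / var(y, na.rm = TRUE)

# US-to-Japanese variance ratio on the raw scale and on the log scale.
ratios <- c(
  raw = var_ratio(dat$USCars, dat$JapaneseCars),
  log = var_ratio(log(dat$USCars), log(dat$JapaneseCars))
)
print(round(ratios, 3))
```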

Question #4

State the null and alternative hypotheses and test using a 0.05 level of significance.

a. What are the sample averages for the log of the mpg of US and Japanese cars?

b. State your conclusions

We can start by running a pooled-variance (`var.equal = TRUE`) two-sample t-test with the null hypothesis that the mean log mpg of US cars and the mean log mpg of Japanese cars are equal. The alternative hypothesis is that the mean log mpg of US cars is less than that of Japanese cars.

Null Hypothesis: \[H_0:\mu_{US} = \mu_{Japanese}\]

Alternative Hypothesis: \[H_1:\mu_{US} < \mu_{Japanese}\]

t.test(dat$LogUSCars,dat$LogJapaneseCars,alternative = "less",var.equal = TRUE,conf.level = 0.95)
## 
##  Two Sample t-test
## 
## data:  dat$LogUSCars and dat$LogJapaneseCars
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.4366143
## sample estimates:
## mean of x mean of y 
##  2.741001  3.270957
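To make clear what `t.test` computes with `var.equal = TRUE`, the pooled statistic can be worked out by hand. This is a sketch under the same column-name and `NA`-padding assumptions as before; the variable names are illustrative. Its results should agree with the `t.test` output above.

```r
# Reload the data so this block runs on its own.
url <- "https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv"
dat <- read.csv(url)

# Log mpg for each group, with missing values dropped.
x <- log(dat$USCars[!is.na(dat$USCars)])
y <- log(dat$JapaneseCars[!is.na(dat$JapaneseCars)])
n1 <- length(x)
n2 <- length(y)

# Pooled variance: the two sample variances weighted by their degrees of freedom.
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# Pooled two-sample t statistic and its degrees of freedom.
t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
dof <- n1 + n2 - 2

# Lower-tail p-value, matching alternative = "less".
p_val <- pt(t_stat, dof)
print(c(t = t_stat, df = dof, p = p_val))
```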

Answer

From the t-test, the sample averages are as follows:

\[ \begin{aligned} AVG_{US\ Cars} &= 2.74 \\ AVG_{Japanese\ Cars} &= 3.27 \end{aligned} \]

Because the p-value from the t-test (6.528e-14) is much less than 0.05, we reject the null hypothesis and conclude that the mean log mpg of US cars is lower than that of Japanese cars, i.e., that US cars have lower fuel efficiency. This agrees with the earlier boxplots, in which the US cars sit visibly lower than the Japanese cars.
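Because the test was run on log mpg, the sample averages are on the log scale. Exponentiating a mean of logs gives the geometric mean on the original mpg scale, and a difference of log means becomes a ratio of geometric means. The sketch below back-transforms the two sample means reported in the t-test output; it is an interpretation aid, not part of the assigned questions.

```r
# Sample means of log mpg, copied from the t-test output above.
log_means <- c(US = 2.741001, Japanese = 3.270957)

# Geometric mean mpg of each group on the original scale.
geo_means <- exp(log_means)
print(round(geo_means, 1))

# Ratio of geometric means (US relative to Japanese).
ratio <- exp(log_means[["US"]] - log_means[["Japanese"]])
print(round(ratio, 2))
```

A ratio below 1 says the typical US car in the sample gets that fraction of the mpg of the typical Japanese car.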