This report captures the work done for the fourth flipped assignment, which compares the mean mpg of cars manufactured in the US and Japan. The R code and its results are provided. The approach follows the steps requested in the assignment for two-sample t-tests:

  1. Test for normality
  2. Compare for equal variances
  3. Transform the data and repeat the previous process
  4. Proceed with a hypothesis test
  5. State the conclusions

# setup Libraries
library(knitr)
library(dplyr)
library(lawstat)
library(printr)
# read in file
d <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv",
             sep=",",
             na.strings = c("","."))
# clean up first col name
names(d)[1] <- "USCars"

1 Normal Probability Plots for the MPG Values of Cars

Does the mpg of both US cars and Japanese cars appear to be Normally distributed (use NPPs)?

The following plots show how well the cars manufactured in the US and Japan fit a normal distribution. Dots closer to the line indicate a value fitting closer to the normal curve.

1.1 MPG for Cars Manufactured in the US

qqnorm(d$USCars,
       main = "Normal Probability Plot for US Cars",
       pch = 21, 
       cex = 2,
       col = "black",
       bg = "Blue",
       lwd = 2)
qqline(d$USCars, lwd = 2)

1.2 MPG for Cars Manufactured in Japan

qqnorm(d$JapaneseCars,
       main = "Normal Probability Plot for Japan Cars",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Red",
       lwd = 2)
qqline(d$JapaneseCars, lwd = 2)

As the plots show, the mpg for US cars does not appear to be normally distributed, while the mpg for Japanese cars does appear to be normally distributed.
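The visual reading of the probability plots can be complemented with a formal check. The following is a minimal sketch, not part of the original assignment, that applies the Shapiro-Wilk test from base R to each sample. The vectors `us` and `japan` are names introduced here for illustration, and their values are copied from the Raw Data section so that the block runs without downloading the file.

```r
# MPG values copied from the Raw Data section (NA rows for Japan dropped).
# `us` and `japan` are illustrative names, not objects from the report.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japan <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32,
           31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Shapiro-Wilk test: the null hypothesis is that the sample is drawn
# from a normal distribution, so a small p-value argues against normality.
sw_us <- shapiro.test(us)
sw_japan <- shapiro.test(japan)

print(sw_us)
print(sw_japan)
```

A p-value below the chosen significance level would support the impression that a sample departs from normality. A large p-value only means that no departure was detected.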

2 Visual Comparison of Variances (US vs Japan Car MPG)

Does the variance appear to be constant (use side-by-side boxplots)?

The following side-by-side boxplot compares the spread of mpg for cars manufactured in the US and Japan. The “taller” the box, the greater the interquartile range, which suggests a larger variance.

2.1 Side-by-side Boxplot

boxplot(d$USCars,
        d$JapaneseCars,
        main = "Box Plot of Cars",
        names = c("US","Japan"),
        col = c("Blue","Red"))

The interquartile range, a measure of spread, is 4 for the US MPG and 7 for the Japanese MPG. Because the Japanese interquartile range is 75% larger than the US range, the variances do not visually appear to be constant between the two countries. A statistical test such as Levene’s test is recommended to draw a more reliable conclusion from the data.
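The interquartile ranges quoted above, and the recommended follow-up test, can be sketched in base R as shown here. This is an illustrative sketch under stated assumptions: the vectors `us` and `japan` are hypothetical names holding the values from the Raw Data section, and the test is the median-centered form of Levene's test (often called the Brown-Forsythe test), computed by hand as a one-way ANOVA on absolute deviations from each group median rather than through a package function.

```r
# MPG values copied from the Raw Data section (NA rows for Japan dropped).
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japan <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32,
           31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Interquartile ranges using R's default quantile definition (type 7).
iqr_us <- IQR(us)        # 4
iqr_japan <- IQR(japan)  # 7

# Median-centered Levene (Brown-Forsythe) test, built from base R:
# 1. stack the two samples and label each value with its country,
# 2. take absolute deviations from each group's median,
# 3. run a one-way ANOVA on those deviations.
mpg <- c(us, japan)
country <- factor(rep(c("US", "Japan"), times = c(length(us), length(japan))))
abs_dev <- abs(mpg - ave(mpg, country, FUN = median))
bf_table <- anova(lm(abs_dev ~ country))

print(c(US = iqr_us, Japan = iqr_japan))
print(bf_table)  # the "Pr(>F)" column holds the p-value for equal spread
```

The median-centered variant is used in this sketch because it is less sensitive to non-normal data than the mean-centered form, which matters here given the shape of the US sample.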

3 Transform the data and repeat the previous process

After transforming the data using a log transformation, what differences are seen in the plots?

3.1 Code for Conducting a Log Transformation

d2 <- log10(d)

Now we can re-run the original normality plots and box plots using the transformed data.

3.2 Log Transformed MPG for Cars Manufactured in the US

qqnorm(d2$USCars,
       main = "Normal Probability Plot for US Cars, log10",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Blue",
       lwd = 2)
qqline(d2$USCars, lwd = 2)

3.3 Log Transformed MPG for Cars Manufactured in Japan

qqnorm(d2$JapaneseCars,
       main = "Normal Probability Plot for Japan Cars, log10",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Red",
       lwd = 2)
qqline(d2$JapaneseCars, lwd = 2)

3.4 Side-by-side Boxplot of Log-Transformed Data

boxplot(d2$USCars,
        d2$JapaneseCars,
        main = "Box Plot of Cars, log10",
        names = c("US","Japan"),
        col = c("Blue","Red"))

After the log transformation, the interquartile range is \(\approx\) 0.109 for the US MPG and \(\approx\) 0.111 for the Japanese MPG. The transformation has brought the two measures of spread much closer together, so the variances can now reasonably be treated as equal.
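The two interquartile ranges quoted above can be reproduced with a short calculation. This is a minimal sketch in which `us` and `japan` are illustrative vectors holding the values from the Raw Data section, so the block does not depend on the downloaded data frame.

```r
# MPG values copied from the Raw Data section (NA rows for Japan dropped).
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japan <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32,
           31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Interquartile ranges on the log10 scale.
iqr_log_us <- IQR(log10(us))
iqr_log_japan <- IQR(log10(japan))

# Rounded to three decimals for comparison with the text: 0.109 and 0.111.
print(round(c(US = iqr_log_us, Japan = iqr_log_japan), 3))
```

On the original scale the Japanese interquartile range was 75% larger than the US one. On the log10 scale the two differ by only about 0.002.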

4 Hypothesis Testing for Log Transformed MPG Data

What are the null and alternative hypotheses? Test using a 0.05 level of significance.
      a. What are the sample averages for the log of the mpg of US and Japanese cars?

      b. What are your conclusions?

The null and alternative hypotheses, tested at the 0.05 significance level, are:
      H0: μ1 = μ2

      Ha: μ1 ≠ μ2

      where μ1 is the mean log10 MPG of US cars and μ2 is the mean log10 MPG of Japanese cars.

4.1 T-test using pooled variances

The code and results are provided below.

LogUS_MPG <- d2$USCars
LogJapan_MPG <- d2$JapaneseCars
t.test(LogUS_MPG, LogJapan_MPG, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  LogUS_MPG and LogJapan_MPG
## t = -9.4828, df = 61, p-value = 1.306e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2786895 -0.1816243
## sample estimates:
## mean of x mean of y 
##  1.190402  1.420559

The sample means of the log10 MPG are 1.190 for US cars and 1.421 for Japanese cars. The p-value of 1.306e-13 is far below the 0.05 threshold, so we reject the null hypothesis and conclude that there is a statistically significant difference in mean log10 MPG between cars manufactured in the US and Japan, with Japanese cars having the higher mean.
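To make the pooled-variance calculation behind this result explicit, the sketch below recomputes the test statistic by hand and compares it with `t.test()`. It is an illustration rather than part of the original analysis: `us`, `japan`, and the intermediate names (`sp2`, `t_manual`, and so on) are introduced here, with the data copied from the Raw Data section.

```r
# MPG values copied from the Raw Data section (NA rows for Japan dropped).
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japan <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32,
           31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

log_us <- log10(us)
log_japan <- log10(japan)
n1 <- length(log_us)
n2 <- length(log_japan)

# Pooled variance: weighted average of the two sample variances.
df_pooled <- n1 + n2 - 2
sp2 <- ((n1 - 1) * var(log_us) + (n2 - 1) * var(log_japan)) / df_pooled

# Test statistic: difference in means divided by its pooled standard error.
t_manual <- (mean(log_us) - mean(log_japan)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Two-sided p-value from the t distribution.
p_manual <- 2 * pt(-abs(t_manual), df_pooled)

# Reference result from the built-in test for comparison.
reference <- t.test(log_us, log_japan, var.equal = TRUE)

print(c(t = t_manual, df = df_pooled, p = p_manual))
print(all.equal(t_manual, unname(reference$statistic)))
```

The manual statistic and degrees of freedom should agree with the `t.test()` output shown above. Pooling is appropriate only under the equal-variance assumption examined in section 3.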

5 Source Code

# setup Libraries
library(knitr)
library(dplyr)
library(lawstat)
library(printr)
# read in file
d <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv",
             sep=",",
             na.strings = c("","."))
# clean up first col name
names(d)[1] <- "USCars"

qqnorm(d$USCars,
       main = "Normal Probability Plot for US Cars",
       pch = 21, 
       cex = 2,
       col = "black",
       bg = "Blue",
       lwd = 2)
qqline(d$USCars, lwd = 2)

qqnorm(d$JapaneseCars,
       main = "Normal Probability Plot for Japan Cars",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Red",
       lwd = 2)
qqline(d$JapaneseCars, lwd = 2)

boxplot(d$USCars,
        d$JapaneseCars,
        main = "Box Plot of Cars",
        names = c("US","Japan"),
        col = c("Blue","Red"))

d2 <- log10(d)

qqnorm(d2$USCars,
       main = "Normal Probability Plot for US Cars, log10",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Blue",
       lwd = 2)
qqline(d2$USCars, lwd = 2)

qqnorm(d2$JapaneseCars,
       main = "Normal Probability Plot for Japan Cars, log10",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Red",
       lwd = 2)
qqline(d2$JapaneseCars, lwd = 2)

boxplot(d2$USCars,
        d2$JapaneseCars,
        main = "Box Plot of Cars, log10",
        names = c("US","Japan"),
        col = c("Blue","Red"))

LogUS_MPG <- d2$USCars
LogJapan_MPG <- d2$JapaneseCars
t.test(LogUS_MPG, LogJapan_MPG, var.equal = TRUE)

6 Raw Data

Raw data for MPG of US and Japanese cars.

https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv

MPG of US and Japanese cars
USCars JapaneseCars
18 24
15 27
18 27
16 25
17 31
15 35
14 24
14 19
14 28
15 23
15 27
14 20
15 22
14 18
22 20
18 31
21 32
21 31
10 32
10 24
11 26
9 29
28 24
25 24
19 33
16 33
17 32
19 28
18 NA
14 NA
14 NA
14 NA
14 NA
12 NA
13 NA