This report documents the work for the fourth flipped assignment, which compares the mean mpg of cars manufactured in the US and in Japan. The R code and its results are provided. The analysis follows the steps requested in the assignment for two-sample t-tests:

Test for normality
Compare the variances
Transform the data and repeat the previous two steps
Proceed with a hypothesis test
State the conclusions

# setup Libraries
library(knitr)
library(dplyr)
library(lawstat)
library(printr)
# read in file
d <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv",
             sep=",",
             na.strings = c("","."))
# clean up first col name
names(d)[1] <- "USCars"

1 Normal Probability Plots for the MPG Values of Cars

Does the mpg of both US cars and Japanese cars appear to be Normally distributed (use NPPs)?

The following plots show how well the mpg of cars manufactured in the US and in Japan fits a normal distribution. Points that fall close to the reference line are consistent with a normal distribution; systematic departures from the line indicate non-normality.

1.1 MPG for Cars Manufactured in the US

qqnorm(d$USCars,
       main = "Normal Probability Plot for US Cars",
       pch = 21, 
       cex = 2,
       col = "black",
       bg = "Blue",
       lwd = 2)
qqline(d$USCars, lwd = 2)

1.2 MPG for Cars Manufactured in Japan

qqnorm(d$JapaneseCars,
       main = "Normal Probability Plot for Japan Cars",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Red",
       lwd = 2)
qqline(d$JapaneseCars, lwd = 2)

As one can see, the mpg for US cars doesn’t appear to be Normally distributed, while the mpg for Japanese cars does appear to be Normally distributed.
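A numeric check can complement the visual reading of the plots. The sketch below is not part of the original analysis: it assumes the MPG values as listed in the Raw Data section (typed in here so the block runs on its own) and applies base R's `shapiro.test`, whose null hypothesis is that the sample comes from a normal distribution. The vector names `us` and `japan` are illustrative.

```r
# MPG values typed in from the Raw Data table of this report
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japan <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32,
           31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Shapiro-Wilk test: H0 is that the sample is drawn from a normal distribution
sw_us <- shapiro.test(us)
sw_japan <- shapiro.test(japan)

print(sw_us)
print(sw_japan)
```

A small p-value would support the visual impression of non-normality; a large one only means the test found no evidence against normality, not that normality is proven.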

2 Visual Comparison of Variances (US vs Japan Car MPG)

Does the variance appear to be constant (use side-by-side boxplots)?

The following side-by-side boxplot compares the spread of mpg for cars manufactured in the US and in Japan. The “taller” the box, the greater the interquartile range and thus the larger the spread of the middle 50% of the data.

2.1 Side-by-side Boxplot

boxplot(d$USCars,
        d$JapaneseCars,
        main = "Box Plot of Cars",
        names = c("US","Japan"),
        col = c("Blue","Red"))

The interquartile range (IQR), a measure of spread, is 4 for the US MPG and 7 for the Japanese MPG. Because the Japanese IQR is 75% larger than the US IQR, the variances do not visually appear to be constant between the two countries. A formal test such as Levene’s test is recommended before drawing a firm conclusion from the data.
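The IQR figures and the suggested Levene-type check can be reproduced with a short sketch. This was not run in the original report. It assumes the values from the Raw Data section and implements the median-centred (Brown-Forsythe) form of Levene's test with base R only, as a one-way ANOVA on absolute deviations from each group's median. The names `abs_dev` and `levene_fit` are illustrative.

```r
# MPG values typed in from the Raw Data table of this report
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japan <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32,
           31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Interquartile ranges (R's default quantile algorithm, type 7)
iqr_us <- IQR(us)
iqr_japan <- IQR(japan)
cat("IQR US:", iqr_us, " IQR Japan:", iqr_japan,
    " ratio:", iqr_japan / iqr_us, "\n")

# Brown-Forsythe variant of Levene's test:
# one-way ANOVA on absolute deviations from each group's median
mpg <- c(us, japan)
country <- factor(rep(c("US", "Japan"), times = c(length(us), length(japan))))
abs_dev <- abs(mpg - ave(mpg, country, FUN = median))
levene_fit <- anova(lm(abs_dev ~ country))
print(levene_fit)
```

A small p-value in the `country` row would indicate unequal spread between the two groups.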

3 Transform the data and repeat the previous process

After transforming the data using a log transformation, what differences are seen in the plots?

3.1 Code for Conducting a Log Transformation

d2 <- log10(d)

Now we can re-run the original normality plots and box plots using the transformed data.
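Applying `log10` to the whole data frame works because every column is numeric, and the `NA` padding in the shorter Japanese column stays `NA`. The sketch below illustrates this on a hypothetical three-row excerpt (`d_demo`) rather than the full file.

```r
# Hypothetical three-row excerpt with the same column names as the report's data
d_demo <- data.frame(USCars = c(18, 15, 14),
                     JapaneseCars = c(24, 27, NA))

# log10 is applied element-wise to every numeric column; NA stays NA
d2_demo <- log10(d_demo)
print(d2_demo)

# Summary functions need na.rm = TRUE for the NA-padded column
mean_log_japan <- mean(d2_demo$JapaneseCars, na.rm = TRUE)
print(mean_log_japan)
```

`qqnorm`, `boxplot`, and `t.test` ignore missing values on their own, so the padded rows do not need to be removed before the later steps.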

3.2 Log Transformed MPG for Cars Manufactured in the US

qqnorm(d2$USCars,
       main = "Normal Probability Plot for US Cars, log10",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Blue",
       lwd = 2)
qqline(d2$USCars, lwd = 2)

3.3 Log Transformed MPG for Cars Manufactured in Japan

qqnorm(d2$JapaneseCars,
       main = "Normal Probability Plot for Japan Cars, log10",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Red",
       lwd = 2)
qqline(d2$JapaneseCars, lwd = 2)

3.4 Side-by-side Boxplot of Log-Transformed Data

boxplot(d2$USCars,
        d2$JapaneseCars,
        main = "Box Plot of Cars, log10",
        names = c("US","Japan"),
        col = c("Blue","Red"))

After the log transformation, the interquartile range is \(\approx\) 0.109 for the US MPG and \(\approx\) 0.111 for the Japanese MPG. The transformation has nearly eliminated the difference in spread, so the variances can reasonably be treated as equal.
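The two interquartile ranges quoted for the transformed data can be checked with the following sketch, which assumes the values from the Raw Data section and uses base R's `IQR` on the log10 scale.

```r
# MPG values typed in from the Raw Data table of this report
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japan <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32,
           31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Interquartile ranges on the log10 scale
iqr_log_us <- IQR(log10(us))
iqr_log_japan <- IQR(log10(japan))

cat("log10 IQR US:   ", round(iqr_log_us, 3), "\n")
cat("log10 IQR Japan:", round(iqr_log_japan, 3), "\n")
cat("ratio Japan/US: ", round(iqr_log_japan / iqr_log_us, 3), "\n")
```

On the raw scale the Japan-to-US IQR ratio is 7/4 = 1.75; on the log10 scale it is close to 1.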

4 Hypothesis Testing for Log Transformed MPG Data

State the null and alternative hypotheses and test them using a 0.05 level of significance.
a. What are the sample averages for the log of the mpg of US and Japanese cars?
b. What are your conclusions?

The null and alternative hypotheses, tested at the 0.05 significance level, are:
      H₀: μ₁ = μ₂
      Hₐ: μ₁ ≠ μ₂
      where μ₁ = mean log10 MPG of US cars and μ₂ = mean log10 MPG of Japanese cars
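For reference, the standard pooled-variance two-sample t statistic used to test these hypotheses can be written as follows. The symbols \(\bar{x}_i\), \(s_i^2\), and \(n_i\) (sample mean, sample variance, and sample size of the log10 MPG for group \(i\)) are notation introduced here, not taken from the assignment.

```latex
\[
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\mathrm{df} = n_1 + n_2 - 2
\]
```

The Raw Data table lists 35 US values and 28 non-missing Japanese values, so the test has 35 + 28 - 2 = 61 degrees of freedom.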

4.1 T-test using pooled variances

The code and results are provided below.

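# log10-transformed MPG for each country (t.test drops the NA rows itself)
LogUS_MPG <- d2$USCars
LogJapan_MPG <- d2$JapaneseCars
# two-sided, two-sample t-test with pooled variance
t.test(LogUS_MPG, LogJapan_MPG, var.equal = TRUE)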

## 
##  Two Sample t-test
## 
## data:  LogUS_MPG and LogJapan_MPG
## t = -9.4828, df = 61, p-value = 1.306e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2786895 -0.1816243
## sample estimates:
## mean of x mean of y 
##  1.190402  1.420559

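The sample means of the log10 MPG are 1.190 for US cars and 1.421 for Japanese cars. The output above shows a p-value of \(\approx\) 1.3 × 10⁻¹³, far below our threshold of 0.05. We therefore reject the null hypothesis and conclude that there is a statistically significant difference in mean log10 MPG between cars manufactured in the US and in Japan, with Japanese cars having the higher mean.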
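As a cross-check on the reported result, the pooled-variance t statistic can be computed by hand and compared with `t.test`. This is a sketch under the assumption that the values in the Raw Data section match the CSV file; the names `sp2`, `t_manual`, and `p_manual` are illustrative.

```r
# MPG values typed in from the Raw Data table of this report
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japan <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32,
           31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

x <- log10(us)
y <- log10(japan)
n1 <- length(x)
n2 <- length(y)

# Pooled variance and the two-sample t statistic
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
t_manual <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_manual <- n1 + n2 - 2

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Same test via t.test for comparison
fit <- t.test(x, y, var.equal = TRUE)

cat("manual: t =", round(t_manual, 4), " df =", df_manual, " p =", p_manual, "\n")
cat("t.test: t =", round(unname(fit$statistic), 4),
    " df =", unname(fit$parameter), " p =", fit$p.value, "\n")
```

The two lines of output should agree, which confirms that `var.equal = TRUE` selects the pooled-variance form of the test.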

5 Source Code

# setup Libraries
library(knitr)
library(dplyr)
library(lawstat)
library(printr)
# read in file
d <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv",
             sep=",",
             na.strings = c("","."))
# clean up first col name
names(d)[1] <- "USCars"

qqnorm(d$USCars,
       main = "Normal Probability Plot for US Cars",
       pch = 21, 
       cex = 2,
       col = "black",
       bg = "Blue",
       lwd = 2)
qqline(d$USCars, lwd = 2)

qqnorm(d$JapaneseCars,
       main = "Normal Probability Plot for Japan Cars",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Red",
       lwd = 2)
qqline(d$JapaneseCars, lwd = 2)

boxplot(d$USCars,
        d$JapaneseCars,
        main = "Box Plot of Cars",
        names = c("US","Japan"),
        col = c("Blue","Red"))

d2 <- log10(d)

qqnorm(d2$USCars,
       main = "Normal Probability Plot for US Cars, log10",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Blue",
       lwd = 2)
qqline(d2$USCars, lwd = 2)

qqnorm(d2$JapaneseCars,
       main = "Normal Probability Plot for Japan Cars, log10",
       pch = 21,
       cex = 2,
       col = "black",
       bg = "Red",
       lwd = 2)
qqline(d2$JapaneseCars, lwd = 2)

boxplot(d2$USCars,
        d2$JapaneseCars,
        main = "Box Plot of Cars, log10",
        names = c("US","Japan"),
        col = c("Blue","Red"))

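# log10-transformed MPG for each country (t.test drops the NA rows itself)
LogUS_MPG <- d2$USCars
LogJapan_MPG <- d2$JapaneseCars
# two-sided, two-sample t-test with pooled variance
t.test(LogUS_MPG, LogJapan_MPG, var.equal = TRUE)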

6 Raw Data

Raw data for MPG of US and Japanese cars.

https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv

MPG of US and Japanese cars
USCars	JapaneseCars
18	24
15	27
18	27
16	25
17	31
15	35
14	24
14	19
14	28
15	23
15	27
14	20
15	22
14	18
22	20
18	31
21	32
21	31
10	32
10	24
11	26
9	29
28	24
25	24
19	33
16	33
17	32
19	28
18	NA
14	NA
14	NA
14	NA
14	NA
12	NA
13	NA

Team 4 - Flipped Assignment 4

IE 5342 - Dr. Timothy I. Matis

Fred Gersdorff, Hunter Swerdloff, Jesus Rosila Mares

12 Sept 2021