This report captures work done for the 4th flipped assignment, comparing the mean mpg of cars manufactured in the US vs. Japan. The R code is provided along with its results. The approach mirrors the one requested in the assignments for two-sample t-tests:
# setup Libraries
library(knitr)
library(dplyr)
library(lawstat)
library(printr)
# read in file
d <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv",
sep=",",
na.strings = c("","."))
# clean up first col name
names(d)[1] <- "USCars"
Does the mpg of both US cars and Japanese cars appear to be Normally distributed (use NPPs)?
The following normal probability plots (NPPs) show how well the mpg of cars manufactured in the US and Japan fits a normal distribution. Points that lie closer to the reference line indicate closer agreement with the normal curve.
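As a numerical sketch of what the plots below draw, the following base-R code reproduces, for the US sample, the coordinates that `qqnorm()` plots and the reference line that `qqline()` adds. It assumes the US values copied from the raw-data table at the end of this report; the variable names (`us`, `theoretical`, `slope`, `intercept`) are illustrative. Each observation is paired with the Normal quantile for its rank, and the reference line passes through the first and third quartiles of the data plotted against those of the standard Normal.

```r
# Sketch: reproduce the qqnorm() coordinates and the qqline() reference line
# for the US sample using base R (stats package).
# Data copied from the raw-data table at the end of this report.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)

n <- length(us)
# Theoretical Normal quantiles at the plotting positions ppoints(n),
# assigned to each observation according to its rank
theoretical <- qnorm(ppoints(n))[order(order(us))]

# qqline() draws the line through the first and third quartiles of the data
# versus the first and third quartiles of the standard Normal
y_q <- quantile(us, c(0.25, 0.75))
x_q <- qnorm(c(0.25, 0.75))
slope <- unname(diff(y_q) / diff(x_q))
intercept <- unname(y_q[1] - slope * x_q[1])

# Compare with what qqnorm() computes internally, without drawing
qq <- qqnorm(us, plot.it = FALSE)
all.equal(theoretical, qq$x)   # TRUE
c(slope = slope, intercept = intercept)
```

For the US data the line's intercept is 16, the midpoint of the quartiles 14 and 18, because the Normal quartiles are symmetric about 0.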
qqnorm(d$USCars,
main = "Normal Probability Plot for US Cars",
pch = 21,
cex = 2,
col = "black",
bg = "Blue",
lwd = 2)
qqline(d$USCars, lwd = 2)
qqnorm(d$JapaneseCars,
main = "Normal Probability Plot for Japan Cars",
pch = 21,
cex = 2,
col = "black",
bg = "Red",
lwd = 2)
qqline(d$JapaneseCars, lwd = 2)
As one can see, the mpg for US cars doesn’t appear to be Normally distributed, while the mpg for Japanese cars does appear to be Normally distributed.
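The NPPs are a visual check. As an optional, more formal complement (not part of the assignment), the Shapiro-Wilk test in base R's `stats` package tests the null hypothesis that a sample comes from a Normal distribution. This sketch assumes the values from the raw-data table and uses illustrative variable names; it also checks the log10 data used later. Note that a large p-value only means the test found no strong evidence against Normality, not that Normality is proven.

```r
# Sketch: Shapiro-Wilk Normality tests (base R, stats package) as a formal
# complement to the NPPs. Data copied from the raw-data table.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18,
        20, 31, 32, 31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Null hypothesis: the sample comes from a Normal distribution.
# A p-value below 0.05 would be evidence against Normality.
p <- c(US          = shapiro.test(us)$p.value,
       Japan       = shapiro.test(jp)$p.value,
       US_log10    = shapiro.test(log10(us))$p.value,
       Japan_log10 = shapiro.test(log10(jp))$p.value)
round(p, 4)
```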
Does the variance appear to be constant (use side-by-side boxplots)?
The following side-by-side boxplot compares the spread of mpg for cars manufactured in the US and Japan. The “taller” the box, the greater the interquartile range (IQR) and, by extension, the greater the spread (variance) of the data.
boxplot(d$USCars,
d$JapaneseCars,
main = "Box Plot of Cars",
names = c("US","Japan"),
col = c("Blue","Red"))
The interquartile range (IQR), a measure of spread, is 4 mpg for US cars (Q1 = 14, Q3 = 18) and 7 mpg for Japanese cars (Q1 = 24, Q3 = 31). Because the Japanese IQR is 75% larger than the US IQR, the variances do not visually appear to be constant between the two countries. A formal test, such as Levene’s test, is recommended before drawing a firmer conclusion.
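The sketch below recomputes the IQRs quoted above with base R's `IQR()` and then runs a median-centred Levene test (the Brown-Forsythe variant) as a one-way ANOVA on absolute deviations from each group's median. It uses only base R so it runs without `lawstat`; we expect `lawstat::levene.test(mpg, country)`, whose default `location` is `"median"`, to give the same result, but that is an assumption worth confirming. Data are copied from the raw-data table, and the variable names are illustrative.

```r
# Sketch: quantify the boxplot spread and run a median-centred Levene
# (Brown-Forsythe) test with base R only. Data copied from the raw-data table.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18,
        20, 31, 32, 31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

iqr_us <- IQR(us)    # 18 - 14 = 4
iqr_jp <- IQR(jp)    # 31 - 24 = 7
iqr_jp / iqr_us      # 1.75, i.e. 75% larger

# Stack both samples with a group label
mpg <- c(us, jp)
country <- factor(rep(c("US", "Japan"), times = c(length(us), length(jp))))

# Absolute deviation of each car from its own group's median
abs_dev <- abs(mpg - ave(mpg, country, FUN = median))

# One-way ANOVA on those deviations; a small p-value suggests unequal spread
bf <- anova(lm(abs_dev ~ country))
p_bf <- bf[["Pr(>F)"]][1]
p_bf
```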
After transforming the data using a log transformation, what differences are seen in the plots?
d2 <- log10(d)
Now we can re-run the original normality plots and box plots using the transformed data.
qqnorm(d2$USCars,
main = "Normal Probability Plot for US Cars, log10",
pch = 21,
cex = 2,
col = "black",
bg = "Blue",
lwd = 2)
qqline(d2$USCars,lwd = 2)
qqnorm(d2$JapaneseCars,
main = "Normal Probability Plot for Japan Cars, log10",
pch = 21,
cex = 2,
col = "black",
bg = "Red",
lwd = 2)
qqline(d2$JapaneseCars, lwd = 2)
boxplot(d2$USCars,
d2$JapaneseCars,
main = "Box Plot of Cars, log10",
names = c("US","Japan"),
col = c("Blue","Red"))
After the log transformation, the interquartile range is \(\approx 0.109\) for US MPG and \(\approx 0.111\) for Japanese MPG. The transformation has substantially reduced the difference in spread, so the variances can now reasonably be treated as consistent with each other.
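These log-scale IQRs can be checked directly. In this data set the quartiles fall on tied observations, so the quartiles of the log10 data are exactly the log10 of the original quartiles, and each log-scale IQR equals \(\log_{10}(Q_3/Q_1)\), a measure of relative rather than absolute spread. That explains why the transformation brings the two groups together: \(18/14 \approx 1.29\) and \(31/24 \approx 1.29\). A minimal sketch, with data from the raw-data table and illustrative variable names:

```r
# Sketch: IQRs on the log10 scale. Data copied from the raw-data table.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18,
        20, 31, 32, 31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

iqr_log_us <- IQR(log10(us))   # log10(18) - log10(14), about 0.109
iqr_log_jp <- IQR(log10(jp))   # log10(31) - log10(24), about 0.111
round(c(US = iqr_log_us, Japan = iqr_log_jp), 3)

# Same values written as log10(Q3 / Q1): relative spread of each group
round(c(US = log10(18 / 14), Japan = log10(31 / 24)), 3)
```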
What are the null and alternative hypotheses? Test them using a 0.05 level of significance.
a. What are the sample averages for the log of the mpg of US and Japanese cars?
b. What are your conclusions?
The null and alternative hypotheses are tested at the 0.05 significance level:
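\(H_0: \mu_1 = \mu_2\)
\(H_a: \mu_1 \neq \mu_2\)
where \(\mu_1\) is the population mean of log10 MPG for US cars and \(\mu_2\) is the population mean of log10 MPG for Japanese cars.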
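Because the test below sets `var.equal=TRUE`, R's `t.test` uses the pooled-variance (Student) two-sample statistic. Here \(\bar{x}_i\), \(s_i^2\), and \(n_i\) denote the sample mean, sample variance, and sample size of log10 MPG for group \(i\) (1 = US, 2 = Japan):

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
\mathrm{df} = n_1 + n_2 - 2 = 35 + 28 - 2 = 61
\]

The degrees of freedom agree with the `df = 61` reported in the output.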
The code and results are provided below.
LogUS_MPG <- d2$USCars
LogJapan_MPG <- d2$JapaneseCars
t.test(LogUS_MPG, LogJapan_MPG, var.equal=TRUE)
##
## Two Sample t-test
##
## data: LogUS_MPG and LogJapan_MPG
## t = -9.4828, df = 61, p-value = 1.306e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2786895 -0.1816243
## sample estimates:
## mean of x mean of y
## 1.190402 1.420559
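Thus, the output above shows that the sample means of log10 MPG are 1.190 for US cars and 1.421 for Japanese cars. The p-value (\(1.306 \times 10^{-13}\), effectively 0) is far below our threshold of 0.05, so we reject the null hypothesis and conclude that there is a statistically significant difference in mean log10 MPG (equivalently, in geometric mean MPG) between cars manufactured in the US and Japan. The 95% confidence interval for the difference (US minus Japan), from -0.279 to -0.182, lies entirely below 0, indicating that US cars have the lower MPG.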
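Because the test is run on log10 MPG, it can help to translate the estimates back to the mpg scale: \(10\) raised to the mean of the log10 values is the geometric mean MPG, and \(10\) raised to the difference in means (and to the confidence limits) gives the ratio of US to Japanese geometric mean MPG. The sketch below reruns the same pooled t-test on the values from the raw-data table, so no network access is needed, and back-transforms the results; the variable names are illustrative.

```r
# Sketch: rerun the pooled two-sample t-test offline and back-transform the
# log10-scale results to the mpg scale. Data copied from the raw-data table.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18,
        20, 31, 32, 31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

res <- t.test(log10(us), log10(jp), var.equal = TRUE)
res$statistic   # about -9.48, matching the output above
res$parameter   # df = 61

# 10^(mean of log10 values) is the geometric mean mpg of each group
geo_means <- 10^res$estimate
# 10^(difference in mean log10 mpg) is the US / Japan ratio of geometric means
ratio <- 10^unname(res$estimate[1] - res$estimate[2])
ratio_ci <- 10^res$conf.int

round(geo_means, 1)   # about 15.5 (US) and 26.3 (Japan)
round(ratio, 2)       # about 0.59
round(ratio_ci, 2)    # about 0.53 to 0.66
```

On these data the US geometric mean is about 15.5 mpg versus about 26.3 mpg for Japan, a ratio of roughly 0.59 (95% CI roughly 0.53 to 0.66).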
Raw data for MPG of US and Japanese cars (35 US cars and 28 Japanese cars; NA marks an empty cell in the shorter column).
https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv
| USCars | JapaneseCars |
|---|---|
| 18 | 24 |
| 15 | 27 |
| 18 | 27 |
| 16 | 25 |
| 17 | 31 |
| 15 | 35 |
| 14 | 24 |
| 14 | 19 |
| 14 | 28 |
| 15 | 23 |
| 15 | 27 |
| 14 | 20 |
| 15 | 22 |
| 14 | 18 |
| 22 | 20 |
| 18 | 31 |
| 21 | 32 |
| 21 | 31 |
| 10 | 32 |
| 10 | 24 |
| 11 | 26 |
| 9 | 29 |
| 28 | 24 |
| 25 | 24 |
| 19 | 33 |
| 16 | 33 |
| 17 | 32 |
| 19 | 28 |
| 18 | NA |
| 14 | NA |
| 14 | NA |
| 14 | NA |
| 14 | NA |
| 12 | NA |
| 13 | NA |