1 Reading Data

dat <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")

dat_us <- dat$USCars
dat_jap <- dat$JapaneseCars

2 Question 1

According to the normal probability plots (NPPs), neither sample looks exactly normally distributed: the points do not closely follow a straight line for either the American or the Japanese cars' mpg. However, the departures are mild enough that the data appear fairly normal, so we can assume normality and perform a t-test. A log transformation might still be useful.

# Normal probability plots for US and Japanese cars
qqnorm(dat_us,main="NPP for US cars")
qqline(dat_us)

qqnorm(dat_jap,main="NPP for Japanese cars")
qqline(dat_jap)
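The visual reading of the plots can be complemented with a formal normality test. The following is a minimal sketch using the Shapiro-Wilk test from base R; it is not part of the original analysis, `normality_check` is a hypothetical helper name, and the illustration runs on simulated data rather than the car data.

```r
# Hypothetical helper (not part of the original analysis):
# run a Shapiro-Wilk normality test on the non-missing values of x.
normality_check <- function(x) {
  x <- x[!is.na(x)]   # drop missing values explicitly
  shapiro.test(x)     # H0: the sample comes from a normal distribution
}

# Illustration on simulated data, not the car data
set.seed(1)
sim <- c(rnorm(30, mean = 20, sd = 3), NA)
res <- normality_check(sim)
res$p.value

# With the vectors defined above, one would call:
# normality_check(dat_us)
# normality_check(dat_jap)
```

A small p-value would suggest a departure from normality, while a large one is consistent with the "fairly normal" reading of the plots.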

3 Question 2

The boxplots show a visibly higher variance for the Japanese cars' mpg than for the American cars. However, the boxplots alone do not give us enough evidence to decide whether the variances are equal.

boxplot(dat_us,dat_jap,col=c("blue","red"),names=c("American Cars","Japanese Cars"),main="Boxplot before log transformation")
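One possible way to go beyond the visual comparison is an F test for the ratio of two variances, available in base R as `var.test`. The sketch below is an assumption about how such a check could look, not the method used in this report; `compare_variances` is a hypothetical helper, and the data are simulated. Note that the F test is sensitive to departures from normality.

```r
# Hypothetical helper (not part of the original analysis):
# F test for equality of two variances, ignoring missing values.
compare_variances <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  var.test(x, y)   # H0: the ratio of the two variances equals 1
}

# Illustration on simulated data, not the car data
set.seed(2)
a <- rnorm(30, mean = 16, sd = 3)
b <- c(rnorm(25, mean = 28, sd = 5), rep(NA, 5))
vt <- compare_variances(a, b)
vt$estimate   # sample ratio var(a) / var(b)
vt$p.value

# With the vectors defined above: compare_variances(dat_us, dat_jap)
```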

4 Question 3

After applying the log transformation, the distribution of the points changed only slightly compared with the untransformed data, and the data still look fairly normally distributed. The spread, by contrast, changed visibly: the variances of the American and Japanese mpg look more similar on the log scale. We may therefore assume equal variances and perform a pooled t-test.

dat_log_us <- log(dat_us)
dat_log_jap <- log(dat_jap)

qqnorm(dat_log_us,main="NPP for US cars (log scale)")
qqline(dat_log_us)

qqnorm(dat_log_jap,main="NPP for Japanese cars (log scale)")
qqline(dat_log_jap)

boxplot(dat_log_us,dat_log_jap,col=c("blue","red"),names=c("American Cars","Japanese Cars"),main="Boxplot after log transformation")
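The effect of the transformation on the spread can also be summarized numerically by comparing the ratio of sample variances before and after taking logs. This is a minimal sketch under the assumption that missing values should simply be ignored; `variance_ratio` is a hypothetical helper and the small vectors are made up for illustration.

```r
# Hypothetical helper (not part of the original analysis):
# ratio of the sample variances of x and y, ignoring missing values.
variance_ratio <- function(x, y) {
  var(x, na.rm = TRUE) / var(y, na.rm = TRUE)
}

# Illustration on made-up numbers, not the car data
x <- c(1, 2, 3)
y <- c(2, 4, 6, NA)
ratio_raw <- variance_ratio(x, y)            # 1 / 4 = 0.25
ratio_log <- variance_ratio(log(x), log(y))  # logs differ by a constant shift, so the ratio is 1

# With the vectors defined above:
# variance_ratio(dat_us, dat_jap)
# variance_ratio(dat_log_us, dat_log_jap)
```

A ratio closer to 1 on the log scale than on the raw scale would agree with the visual impression from the boxplots.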

5 Question 4

We want to test whether the mean mpg of American cars is less than that of Japanese cars. The hypothesis test takes the form:

\[ H_0:\mu_{US}=\mu_{Jap} \\ H_1:\mu_{US} \lt \mu_{Jap}\]
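For reference, the pooled two-sample t statistic that corresponds to a `t.test` call with `var.equal=TRUE` can be written as follows. Here \(\bar{y}\), \(S^2\), and \(n\) denote the sample mean, sample variance, and sample size of each group, computed on the log-transformed data; this notation is an assumption and does not appear in the original analysis.

\[ t_0 = \frac{\bar{y}_{US} - \bar{y}_{Jap}}{S_p\sqrt{\frac{1}{n_{US}} + \frac{1}{n_{Jap}}}}, \qquad S_p^2 = \frac{(n_{US}-1)S_{US}^2 + (n_{Jap}-1)S_{Jap}^2}{n_{US}+n_{Jap}-2} \]

Under \(H_0\), \(t_0\) follows a t distribution with \(n_{US}+n_{Jap}-2\) degrees of freedom, and \(H_0\) is rejected in favor of \(H_1\) when \(t_0\) is sufficiently negative.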

t.test(dat_log_us,dat_log_jap,conf.level=0.95,var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  dat_log_us and dat_log_jap
## t = -9.4828, df = 61, p-value = 1.306e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.6417062 -0.4182053
## sample estimates:
## mean of x mean of y 
##  2.741001  3.270957

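The sample means of the log-transformed mpg are 2.74 for American cars and 3.27 for Japanese cars, which appear fairly different. The t-test supports this. The output above reports a two-sided p-value of 1.306e-13; because the t statistic is negative (t = -9.4828), the one-sided p-value for the alternative that the US mean is less than the Japanese mean is half of that, 6.528e-14. We therefore reject the null hypothesis at the 0.05 significance level (p-value << 0.0001).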
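The hypotheses in this section are one-sided, whereas the `t.test` call as written reports a two-sided alternative. A minimal sketch of requesting the one-sided test directly through the `alternative` argument follows; `one_sided_pooled_t` is a hypothetical wrapper, and the illustration uses simulated data rather than the car data.

```r
# Hypothetical wrapper (not part of the original analysis):
# pooled two-sample t-test of H0: mean(x) = mean(y) against H1: mean(x) < mean(y).
one_sided_pooled_t <- function(x, y) {
  t.test(x, y, alternative = "less", var.equal = TRUE, conf.level = 0.95)
}

# Illustration on simulated data, not the car data
set.seed(3)
x <- rnorm(20, mean = 0)
y <- rnorm(20, mean = 3)
one_sided <- one_sided_pooled_t(x, y)
two_sided <- t.test(x, y, var.equal = TRUE)

# When the t statistic is negative, the one-sided ("less") p-value
# is half of the two-sided p-value.
one_sided$p.value
two_sided$p.value / 2

# With the vectors defined above: one_sided_pooled_t(dat_log_us, dat_log_jap)
```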