Reading data
dat <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
datUS <- dat[1:35,1]
str(datUS)
## int [1:35] 18 15 18 16 17 15 14 14 14 15 ...
datJAP <- dat[1:28,2]
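Q1 Ans: According to the textbook, each sample should contain more than 40 observations for the central limit theorem (CLT) to apply. With 35 US cars and 28 Japanese cars, neither sample strictly meets that threshold. Because both sample sizes are fairly close to 40, however, the CLT may still serve as a reasonable approximation.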
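The comparison against the textbook threshold can be written out as a minimal sketch. The counts 35 and 28 come from the row ranges used when subsetting above; the names `n`, `threshold`, and `large_enough` are illustrative and not part of the original analysis.

```r
# Sample sizes implied by the subsetting above (rows 1:35 and 1:28).
n <- c(US = 35, Japanese = 28)

# Rule-of-thumb threshold quoted from the textbook in the answer.
threshold <- 40

# TRUE would mean the sample is large enough by this rule of thumb.
large_enough <- n > threshold
print(large_enough)
```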
————————————————————————————————
Normal distribution plots and box plots
qqnorm(datUS, main = "Normal Q-Q plot of US cars (mpg)")
qqline(datUS)

qqnorm(datJAP, main = "Normal Q-Q plot of Japanese cars (mpg)")
qqline(datJAP)

boxplot(datUS,datJAP, main= "Box plot of The US/Japanese cars (mpg)", ylab= "mpg",names=c("The US","Japanese"))

Q2 Ans: Overall, both the US and the Japanese data sets roughly follow a normal distribution, although both depart from the reference line at the high theoretical quantiles, which suggests some skewness in the upper tail.
Q4 Ans: In the normal probability plots, the transformed data show a slight departure from normality, but both the original and the transformed data remain reasonably close to normal and pass the fat pencil test.
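The vectors `logdatUS` and `logdatJAP` used in the t-test are not defined anywhere in this write-up. Below is a minimal sketch of how such a transform could look, assuming a natural-log transform (an assumption, although the reported sample means of about 2.74 and 3.27 are consistent with natural logs of mpg values). To stay self-contained, it uses only the first ten US values printed in the `str(datUS)` output; `us_head` and `log_us_head` are illustrative names.

```r
# First ten US mpg values, copied from the str(datUS) output above.
us_head <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15)

# Natural-log transform (assumption: base e, the default for log()).
log_us_head <- log(us_head)

# The transform is monotone, so the ordering of observations is preserved.
stopifnot(identical(order(us_head), order(log_us_head)))

# Mean of the logged values, for comparison with the reported 2.741.
print(mean(log_us_head))
```

On the full data, the corresponding calls would presumably be `logdatUS <- log(datUS)` and `logdatJAP <- log(datJAP)`.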
T-test
t.test(logdatUS,logdatJAP,var.equal=TRUE, alternative = "less")
##
## Two Sample t-test
##
## data: logdatUS and logdatJAP
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.4366143
## sample estimates:
## mean of x mean of y
## 2.741001 3.270957
Q5 Ans: Sample average of the log mpg for US cars: 2.741001.
Sample average of the log mpg for Japanese cars: 3.270957.
The p-value is 6.528e-14, which is less than 0.05, so we reject the null hypothesis that the two means are equal.
We conclude that, on the log scale, the mean mpg of cars manufactured in the US is lower than that of cars manufactured in Japan.
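As a consistency check on the reported output, the following sketch recomputes the degrees of freedom and the one-sided p-value from the printed t statistic, and back-transforms the two log-scale means. The back-transformed values are geometric means and rely on the assumption that the transform was a natural log; all variable names are illustrative.

```r
# Values copied from the t.test output above.
t_stat   <- -9.4828
mean_us  <- 2.741001   # mean of log mpg, US
mean_jap <- 3.270957   # mean of log mpg, Japan
n_us  <- 35
n_jap <- 28

# Pooled (var.equal = TRUE) two-sample t-test: df = n1 + n2 - 2.
df <- n_us + n_jap - 2

# One-sided ("less") p-value is the lower tail of the t distribution.
p_less <- pt(t_stat, df)

# Back-transform to the mpg scale (geometric means, assuming natural logs).
gm_us  <- exp(mean_us)
gm_jap <- exp(mean_jap)

print(c(df = df, p_less = p_less, gm_us = gm_us, gm_jap = gm_jap))
```

Under these assumptions the geometric means come out at roughly 15.5 mpg for US cars and 26.3 mpg for Japanese cars, which puts the size of the log-scale difference back into the original units.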