Reading data

# read.csv() already returns a data frame, so no as.data.frame() wrapper is needed
dat <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")

datUS <- dat[1:35,1]
str(datUS)

##  int [1:35] 18 15 18 16 17 15 14 14 14 15 ...

datJAP <- dat[1:28,2]

Q1 Ans: According to the textbook, both sample sizes should exceed 40 for the CLT to apply. With 35 US cars and 28 Japanese cars, neither sample is strictly large enough to assume that the central limit theorem holds. In practice, since the sizes are not far below 40, the approximation could still be used with some caution.
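The size comparison in Q1 can be written out as a small check. This is only a sketch: `clt_size_check` is a hypothetical helper, not part of the assignment, and the threshold of 40 is the textbook rule of thumb quoted above.

```r
# Hypothetical helper (not from the assignment): flags which samples exceed
# a rule-of-thumb size threshold for relying on the CLT.
clt_size_check <- function(sizes, threshold = 40) {
  data.frame(
    group = names(sizes),
    n = unname(sizes),
    exceeds_threshold = unname(sizes) > threshold
  )
}

# Sample sizes implied by the subsetting above: rows 1:35 and rows 1:28.
sizes <- c(US = 35, Japanese = 28)
size_table <- clt_size_check(sizes)
print(size_table)
```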

————————————————————————————————

Normal distribution plots and box plots

qqnorm(datUS, main = "Normal Q-Q plot of US cars (mpg)")
qqline(datUS)

qqnorm(datJAP, main = "Normal Q-Q plot of Japanese cars (mpg)")
qqline(datJAP)

boxplot(datUS, datJAP, main = "Box plot of US and Japanese cars (mpg)", ylab = "mpg", names = c("US", "Japanese"))

Q2 Ans: Overall, both the US and the Japanese samples roughly follow a normal distribution, although both show some skewness at the high theoretical quantiles.

Q3 Ans: The box plot suggests that the two variances are not the same. The median mpg of US cars is lower than that of Japanese cars.
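The visual impression of unequal spread in Q3 could be backed by numbers. The sketch below uses synthetic stand-in vectors, not the car data, so that it runs on its own; with the real data, `x` and `y` would be replaced by `datUS` and `datJAP`. It uses `var.test()` from base R, which is known to be sensitive to non-normality, so its p-value is best read alongside the plots rather than instead of them.

```r
# Synthetic stand-in samples (NOT the car data): evenly spaced normal quantiles
# with deliberately different spreads, sized like the two groups.
x <- qnorm(ppoints(35), mean = 10, sd = 1)
y <- qnorm(ppoints(28), mean = 20, sd = 3)

# Ratio of sample variances; a ratio far from 1 suggests unequal spread.
var_ratio <- var(x) / var(y)

# F test of equal variances from base R's stats package.
f_test <- var.test(x, y)

print(c(var_x = var(x), var_y = var(y), ratio = var_ratio))
print(f_test$p.value)
```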

————————————————————————————————

Transformed data

logdatUS <- log(datUS)
logdatJAP <- log(datJAP)

qqnorm(logdatUS, main = "Normal Q-Q plot of log-transformed US cars (log mpg)")
qqline(logdatUS)

qqnorm(logdatJAP, main = "Normal Q-Q plot of log-transformed Japanese cars (log mpg)")
qqline(logdatJAP)

# The y axis is on the log scale after the transformation
boxplot(logdatUS, logdatJAP, main = "Box plot of log-transformed US and Japanese cars (log mpg)", ylab = "log(mpg)", names = c("US", "Japanese"))

Q4 Ans: On the normal probability plots, the log-transformed data show a slight departure from normality, but both before and after the transformation the data are reasonably close to normal and pass the fat pencil test.

On the box plot, the variances of the two samples differ less after the log transformation than before it. The transformed data therefore reasonably meet the equal-variance requirement of the two-sample t-test with pooled variance.
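Why a log transformation can bring the variances closer together is easier to see on idealized data. The sketch below uses synthetic log-normal quantiles, not the car data, chosen so that both groups have the same spread on the log scale but different centers. It is an illustration of the mechanism under that assumption, not a claim about the actual samples.

```r
# Idealized synthetic samples (NOT the car data): log-normal quantiles with the
# same spread on the log scale but different centers.
x <- qlnorm(ppoints(35), meanlog = 2.5, sdlog = 0.2)
y <- qlnorm(ppoints(28), meanlog = 3.5, sdlog = 0.2)

# Variance ratio on the raw scale and on the log scale.
ratio_raw <- var(x) / var(y)
ratio_log <- var(log(x)) / var(log(y))

print(c(raw = ratio_raw, log = ratio_log))
```

On data like these, the raw-scale ratio is far from 1 because the spread grows with the mean, while the log-scale ratio is close to 1.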

————————————————————————————————

T-test

t.test(logdatUS,logdatJAP,var.equal=TRUE, alternative = "less")

## 
##  Two Sample t-test
## 
## data:  logdatUS and logdatJAP
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.4366143
## sample estimates:
## mean of x mean of y 
##  2.741001  3.270957
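The df = 61 in the output equals 35 + 28 - 2, which is what a pooled-variance test is expected to report. As a sketch of what `t.test(..., var.equal = TRUE, alternative = "less")` computes, the statistic can be written out by hand. `pooled_t_less` is a hypothetical helper, and the input vectors are synthetic stand-ins, not the car data.

```r
# Hypothetical helper: pooled-variance two-sample t statistic with a
# one-sided ("less") p-value, written out by hand.
pooled_t_less <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  df <- nx + ny - 2
  # Pooled variance: the two sample variances weighted by their degrees of freedom.
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / df
  t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
  # Lower-tail probability, matching alternative = "less".
  list(t = t_stat, df = df, p.value = pt(t_stat, df))
}

# Synthetic stand-in samples (NOT the car data), sized like the two groups.
x <- qnorm(ppoints(35), mean = 3.0, sd = 0.2)
y <- qnorm(ppoints(28), mean = 3.2, sd = 0.2)

by_hand <- pooled_t_less(x, y)
built_in <- t.test(x, y, var.equal = TRUE, alternative = "less")

print(by_hand$df)
print(all.equal(by_hand$t, unname(built_in$statistic)))
```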

Assignment4_Group4

Eddie J. Liu

2021/9/7

T-test

Q5 Ans: Sample average of the log of the mpg of US cars: 2.741001.

Sample average of the log of the mpg of Japanese cars: 3.270957.

The p-value is 6.528e-14, which is less than 0.05, so we reject the null hypothesis that the two means are equal in favor of the one-sided alternative that the US mean is lower.

We conclude that the mean mpg of cars manufactured in the US is less than that of cars manufactured in Japan.
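Because the test was run on log(mpg), the difference in means can be carried back to the original scale as a ratio. The sketch below uses only the numbers printed in the t-test output; the variable names are illustrative. Exponentiating a difference of log means gives a ratio of geometric means, so the result describes typical mpg rather than arithmetic mean mpg.

```r
# Values copied from the t-test output (log scale).
mean_log_us  <- 2.741001
mean_log_jap <- 3.270957
upper_bound  <- -0.4366143  # upper end of the one-sided 95% interval

# exp() of a difference in log means is a ratio of geometric means.
ratio_estimate <- exp(mean_log_us - mean_log_jap)
ratio_upper    <- exp(upper_bound)

print(round(c(estimate = ratio_estimate, upper = ratio_upper), 3))
```

Read this way, the geometric mean mpg of the US cars is estimated at roughly 0.59 times that of the Japanese cars, with a one-sided 95% upper bound of about 0.65.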