data <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
qqnorm(data$USCars,main = "UScars normal plot")
qqline(data$USCars)
qqnorm(data$JapaneseCars, main = "Japancars normal plot")
qqline(data$JapaneseCars)
These plots suggest that both distributions could be approximately normal.
However, a Q-Q plot is only a visual check, so a formal normality test is
needed to confirm this.
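One common choice for such a normality test is the Shapiro-Wilk test, available in base R as `shapiro.test()`. The sketch below is only an illustration: the vector `x` is simulated stand-in data, not the car data, and `check_normality()` is a hypothetical helper name introduced here.

```r
# Hypothetical helper: run a Shapiro-Wilk test and report the key numbers.
# H0: the sample comes from a normal distribution.
check_normality <- function(x, alpha = 0.05) {
  sw <- shapiro.test(x)  # missing values are dropped by shapiro.test itself
  list(W = unname(sw$statistic),
       p.value = sw$p.value,
       reject_normality = sw$p.value < alpha)
}

# Simulated stand-in sample (assumption); in this report the argument
# would be data$USCars or data$JapaneseCars.
set.seed(1)
x <- rnorm(30, mean = 20, sd = 4)

res <- check_normality(x)
res$W        # values close to 1 are consistent with normality
res$p.value  # a small p-value is evidence against normality
```

A p-value above the chosen significance level does not prove normality; it only means the test found no strong evidence against it.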
boxplot(data$USCars, data$JapaneseCars,
names = c("USA","Japan"),main="Boxplot of US and Japanese cars")
According to this boxplot, the variances of the two groups appear to be
unequal: the boxes have noticeably different heights, which means the
interquartile ranges of the two groups differ.
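The visual impression of unequal spread can be backed up with numbers. The following is a minimal sketch, assuming an F test of equal variances (`var.test()`) is an acceptable check; the vectors `us` and `jp` are simulated stand-ins with arbitrary sizes and parameters, not the car data.

```r
# Simulated stand-in samples (assumption); in this report they would be
# data$USCars and data$JapaneseCars.
set.seed(2)
us <- rnorm(35, mean = 16, sd = 3)
jp <- rnorm(28, mean = 27, sd = 5)

# Numerical summaries of spread: sample variance and interquartile range.
# The IQR is the height of each box in the boxplot.
spread <- c(var_us = var(us, na.rm = TRUE), var_jp = var(jp, na.rm = TRUE),
            iqr_us = IQR(us, na.rm = TRUE), iqr_jp = IQR(jp, na.rm = TRUE))
spread

# F test of H0: the two population variances are equal (ratio = 1).
ft <- var.test(us, jp)
ft$estimate  # ratio of the two sample variances
ft$p.value   # a small p-value is evidence that the variances differ
```

The F test is sensitive to departures from normality, so it is best read together with the normal probability plots.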
data$logUSA <- log(data$USCars)
data$logJPN <- log(data$JapaneseCars)
qqnorm(data$logUSA,main = "LogUScars normal plot")
qqline(data$logUSA)
qqnorm(data$logJPN,main = "LogJapancars normal plot")
qqline(data$logJPN)
boxplot(data$logUSA, data$logJPN,
names = c("Log USA","Log Japan"), main="Log boxplot of US and Japanese cars")
After the log transformation, the data still look fairly normal.
However, the boxplots now suggest that the variances of the two
transformed groups might be roughly equal. Nevertheless, statistical tests
are still needed to confirm either of these observations.
# - Question 4
The null and alternative hypotheses are formulated as follows, where \(\mu_1\) and \(\mu_2\) denote the population means of the two log-transformed groups: \[ H_0:\mu_1=\mu_2 \\ H_1:\mu_1\neq\mu_2 \]
t.test(data$logJPN,data$logUSA,var.equal = TRUE)
##
## Two Sample t-test
##
## data: data$logJPN and data$logUSA
## t = 9.4828, df = 61, p-value = 1.306e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.4182053 0.6417062
## sample estimates:
## mean of x mean of y
## 3.270957 2.741001
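As the t-test output shows, the sample means of the log-transformed values are 3.27 for the Japanese cars and 2.74 for the US cars. The p-value of 1.306e-13 is far below the conventional 0.05 significance level, and the 95 percent confidence interval for the difference in means (0.418 to 0.642) does not contain 0. Therefore, the null hypothesis is rejected: the difference between the means of the two log-transformed groups is statistically significant.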
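To make clear what `var.equal = TRUE` computes, the pooled two-sample t statistic can be reproduced by hand. This is a sketch under stated assumptions: `pooled_t()` is a hypothetical helper written for illustration, and `a` and `b` are simulated stand-ins rather than the log-transformed car data.

```r
# Hypothetical helper: pooled-variance two-sample t-test computed by hand.
pooled_t <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  nx <- length(x)
  ny <- length(y)
  df <- nx + ny - 2
  # Pooled variance: weighted average of the two sample variances.
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / df
  t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
  # Two-sided p-value from the t distribution with nx + ny - 2 df.
  list(t = t_stat, df = df, p.value = 2 * pt(-abs(t_stat), df))
}

# Simulated stand-in samples (assumption); in this report they would be
# data$logJPN and data$logUSA.
set.seed(3)
a <- rnorm(28, mean = 3.3, sd = 0.20)
b <- rnorm(35, mean = 2.7, sd = 0.25)

manual  <- pooled_t(a, b)
builtin <- t.test(a, b, var.equal = TRUE)

# The hand computation should agree with t.test().
all.equal(manual$t, unname(builtin$statistic))
all.equal(manual$p.value, builtin$p.value)
```

If the equal-variance assumption is in doubt, calling `t.test()` without `var.equal = TRUE` runs the Welch version, which does not pool the variances.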
data <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
# -------------------- Question 01
qqnorm(data$USCars)
qqline(data$USCars)
qqnorm(data$JapaneseCars)
qqline(data$JapaneseCars)
# -------------------- Question 02
boxplot(data$USCars, data$JapaneseCars,
names = c("USA","Japan"))
# -------------------- Question 03
data$logUSA <- log(data$USCars)
data$logJPN <- log(data$JapaneseCars)
qqnorm(data$logUSA)
qqline(data$logUSA)
qqnorm(data$logJPN)
qqline(data$logJPN)
boxplot(data$logUSA, data$logJPN,
names = c("USA","Japan"))
# -------------------- Question 04
t.test(data$logJPN,data$logUSA,var.equal = TRUE)