1 Reading data

  data <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")

2 Question 1

  qqnorm(data$USCars,main = "UScars normal plot")
  qqline(data$USCars)

  qqnorm(data$JapaneseCars, main = "Japancars normal plot")
  qqline(data$JapaneseCars)

These plots suggest that both distributions could be normal; however, a formal normality test is needed to confirm this.
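One common choice for such a check is the Shapiro-Wilk test, available in base R as shapiro.test(). The following is a minimal sketch, not part of the original analysis: the helper name check_normality is hypothetical, and the demo vector is simulated rather than taken from the car data.

```r
# Hypothetical helper: run a Shapiro-Wilk test after dropping missing values.
check_normality <- function(x) {
  x <- x[!is.na(x)]   # columns of unequal length may be padded with NA
  shapiro.test(x)     # H0: the sample comes from a normal distribution
}

# Simulated stand-in data (NOT the car data), used only to show the call.
set.seed(1)
demo <- c(rnorm(30, mean = 20, sd = 4), NA)

res <- check_normality(demo)
res$statistic   # W statistic
res$p.value     # a small p-value is evidence against normality

# In this report the calls would be:
# check_normality(data$USCars)
# check_normality(data$JapaneseCars)
```

A large p-value does not prove normality; it only means the test found no evidence against it, so the result is best read together with the Q-Q plots.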

3 Question 2

boxplot(data$USCars, data$JapaneseCars,
        names = c("USA","Japan"),main="Boxplot of US and Japanese cars")

According to this plot, the variances of the two groups appear to be unequal: the boxes have noticeably different lengths.
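The visual impression can be backed up with an F test for equality of variances, available in base R as var.test(). This is a sketch under assumptions: the helper name compare_variances is hypothetical, the two demo vectors are simulated rather than taken from the car data, and the F test itself assumes both samples are roughly normal.

```r
# Hypothetical helper: F test comparing the variances of two samples.
compare_variances <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  var.test(x, y)   # H0: ratio of the two variances equals 1
}

# Simulated stand-in data (NOT the car data), with different spreads.
set.seed(2)
a <- rnorm(30, mean = 20, sd = 1)
b <- rnorm(35, mean = 20, sd = 3)

res <- compare_variances(a, b)
res$estimate   # sample variance of a divided by sample variance of b
res$p.value    # a small p-value is evidence that the variances differ

# In this report the call would be:
# compare_variances(data$USCars, data$JapaneseCars)
```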

4 Question 3

data$logUSA <- log(data$USCars)
data$logJPN <- log(data$JapaneseCars)

qqnorm(data$logUSA,main = "LogUScars normal plot")
qqline(data$logUSA)

qqnorm(data$logJPN,main = "LogJapancars normal plot")
qqline(data$logJPN)

boxplot(data$logUSA, data$logJPN,
        names = c("Log USA","Log Japan"), main="Log boxplot of US and Japanese cars")

After the log transformation, the data still look approximately normal, and the box plots now suggest that the two groups may have similar variances. Formal statistical tests are still needed to confirm either observation.

Question 4

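The null and alternative hypotheses are formulated as follows, where \(\mu_1\) and \(\mu_2\) denote the population means of the log-transformed Japanese and US values, respectively: \[ H_0:\mu_1=\mu_2 \\ H_1:\mu_1\neq\mu_2 \]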
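Because the call below sets var.equal = TRUE, R computes the pooled two-sample t statistic. For reference, with sample means \(\bar{x}_1, \bar{x}_2\), sample variances \(s_1^2, s_2^2\), and sample sizes \(n_1, n_2\) (notation introduced here for illustration), the standard form is: \[ t=\frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}, \qquad s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2} \]

Under \(H_0\) this statistic follows a t distribution with \(n_1+n_2-2\) degrees of freedom, which matches the df = 61 reported in the output (so \(n_1+n_2=63\)).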

t.test(data$logJPN,data$logUSA,var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  data$logJPN and data$logUSA
## t = 9.4828, df = 61, p-value = 1.306e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.4182053 0.6417062
## sample estimates:
## mean of x mean of y 
##  3.270957  2.741001

As the t-test output shows, the sample means of the log-transformed groups are 3.27 (Japanese) and 2.74 (US). The p-value of 1.306e-13 is far below any conventional significance level, so the null hypothesis is rejected: the difference between the two group means is statistically significant.
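Since the test was run on log-transformed values, the estimated difference and its confidence interval are on the log scale. One way to read them on the original scale, sketched below, is to exponentiate: the exponential of a difference in log means is a ratio of geometric means. The numbers are copied from the output above; the variable names are illustrative, and the interpretation relies on log() being the natural logarithm, which is R's default.

```r
# Values copied from the t-test output (log scale).
mean_log_jpn <- 3.270957
mean_log_usa <- 2.741001
ci_log <- c(0.4182053, 0.6417062)   # 95% CI for the difference in log means

# Back-transform: ratio of geometric means (Japanese over US).
ratio <- exp(mean_log_jpn - mean_log_usa)
ci_ratio <- exp(ci_log)

round(ratio, 2)
round(ci_ratio, 2)
```

Read this way, the geometric mean of the Japanese group is roughly 1.7 times that of the US group, with a 95% interval of about 1.5 to 1.9.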

5 R code

data <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")

# -------------------- Question 01

qqnorm(data$USCars)
qqline(data$USCars)

qqnorm(data$JapaneseCars)
qqline(data$JapaneseCars)

# -------------------- Question 02

boxplot(data$USCars, data$JapaneseCars,
        names = c("USA","Japan"))

# -------------------- Question 03

data$logUSA <- log(data$USCars)
data$logJPN <- log(data$JapaneseCars)

qqnorm(data$logUSA)
qqline(data$logUSA)

qqnorm(data$logJPN)
qqline(data$logJPN)

boxplot(data$logUSA, data$logJPN,
        names = c("USA","Japan"))

# -------------------- Question 04
t.test(data$logJPN,data$logUSA,var.equal = TRUE)