Answer to Question no.1

Reading Table from the URL:

cars <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
colnames(cars) <- c("USCars", "JapaneseCars")
cars
##    USCars JapaneseCars
## 1      18           24
## 2      15           27
## 3      18           27
## 4      16           25
## 5      17           31
## 6      15           35
## 7      14           24
## 8      14           19
## 9      14           28
## 10     15           23
## 11     15           27
## 12     14           20
## 13     15           22
## 14     14           18
## 15     22           20
## 16     18           31
## 17     21           32
## 18     21           31
## 19     10           32
## 20     10           24
## 21     11           26
## 22      9           29
## 23     28           24
## 24     25           24
## 25     19           33
## 26     16           33
## 27     17           32
## 28     19           28
## 29     18           NA
## 30     14           NA
## 31     14           NA
## 32     14           NA
## 33     14           NA
## 34     12           NA
## 35     13           NA
length(cars$JapaneseCars)
## [1] 35
JP <- cars[c(1:28), 2]
length(JP)
## [1] 28
qqnorm(cars$USCars, col = "red", main =  "UScars NPP", ylab = "mpg")
qqline(cars$USCars)

qqnorm(JP, col = "blue", main =  "Japanese cars  NPP", ylab = "mpg")
qqline(JP)

Comment on the US car normal probability plot: Most of the data points fall along a straight line; hence, the MPG of the US cars appears to be normally distributed.

Comment on the Japanese car normal probability plot: As with the US cars, the MPG data points of the Japanese cars fall approximately along a straight line and thus appear to be normally distributed.
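As a rough numerical companion to the two plots, the sketch below extracts the points that `qqnorm` would draw and computes the correlation between the theoretical and sample quantiles; values close to 1 correspond to points lying near a straight line. The vectors `us` and `jp` are transcribed from the table printed above, and the names `qq_us`, `qq_jp`, `r_us` and `r_jp` are illustrative only, not part of the original analysis.

```r
# MPG values transcribed from the printed table (NA rows dropped for Japanese cars)
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# plot.it = FALSE returns the plotted coordinates instead of drawing them:
# x = theoretical normal quantiles, y = sample values
qq_us <- qqnorm(us, plot.it = FALSE)
qq_jp <- qqnorm(jp, plot.it = FALSE)

# Correlation of the Q-Q points: a simple summary of how straight the plot is
r_us <- cor(qq_us$x, qq_us$y)
r_jp <- cor(qq_jp$x, qq_jp$y)
print(c(US = r_us, Japanese = r_jp))
```

This is only a summary of the visual check, not a formal normality test.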

Answer to Question no.2

boxplot(cars$USCars, JP, main = "Variance Equality Check", names = c("UScars", "Japanese Cars"), ylab = "Mpg")

From the box plot, we observe that the interquartile ranges (IQR), i.e. the heights of the boxes, are not the same. This suggests that the variances of the two samples differ, so the variance does not appear to be constant. We can also see two outliers in the US car sample.
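The quantities read off the box plot can also be computed directly. The sketch below, using vectors `us` and `jp` transcribed from the printed table (the variable names are illustrative), reports the IQR of each sample and the points that `boxplot` flags as outliers under its default 1.5 x IQR rule.

```r
# MPG values transcribed from the printed table (NA rows dropped for Japanese cars)
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Interquartile range of each sample (roughly the height of each box)
iqr_us <- IQR(us)
iqr_jp <- IQR(jp)
print(c(US = iqr_us, Japanese = iqr_jp))

# Points beyond the whiskers, using the same rule that boxplot() applies
out_us <- boxplot.stats(us)$out
out_jp <- boxplot.stats(jp)$out
print(out_us)
print(out_jp)
```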

Answer to Question no.3

logcars <- log(cars)
qqnorm(logcars$USCars, col = "red", main = "US Cars NPP after LT", ylab = "Log MPG")
qqline(logcars$USCars)

qqnorm(logcars$JapaneseCars, col = "blue", main = "JP Cars NPP after LT", ylab = "Log MPG")
qqline(logcars$JapaneseCars)

boxplot(logcars$USCars, logcars$JapaneseCars,  main = "Variance Equality Check", names = c("UScars", "Japanese Cars"), ylab = "Log of Mpg")

After the log transformation, the interquartile ranges (IQR), i.e. the heights of the boxes, appear to be the same. Hence there is visually no difference in the variance of the two samples, and the variance appears to be constant.

Comment on the before/after log-transform box plots:

Comparing the two box plots (cars and log of cars), we observe that after the log transformation the spread of the two groups appears approximately equal. Furthermore, the original box plot shows two outliers on the same (upper) side for the US cars. After the log transformation, the two outliers fall on opposite sides of the plot (one high and one low).

Also, the normal probability plots for both the original and the log-transformed data indicate approximate normality. For the t-test, normality is a weak assumption, so slight deviations from normality still give fairly accurate results. Equality of variance, however, is a strong assumption, and judging by the box plots it appears to hold after the log transformation. Hence, we use the log-transformed data for the t-test, where the strong assumption of equal variance appears to be satisfied.
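The before/after comparison can be sketched numerically as follows. The vectors `us` and `jp` are transcribed from the printed table, and `spread` and `out_log_us` are illustrative names. The box height reflects the IQR, so the IQR ratio is the quantity that corresponds to the visual check; the variance ratio is printed alongside for reference and, being a different measure of spread, need not tell exactly the same story.

```r
# MPG values transcribed from the printed table (NA rows dropped for Japanese cars)
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

lus <- log(us)
ljp <- log(jp)

# Ratio of spreads (US / Japanese) on the raw and log scales
spread <- data.frame(
  scale     = c("raw", "log"),
  iqr_ratio = c(IQR(us) / IQR(jp), IQR(lus) / IQR(ljp)),
  var_ratio = c(var(us) / var(jp), var(lus) / var(ljp))
)
print(spread)

# Outliers flagged for the US cars on the log scale, shown back in MPG units
out_log_us <- boxplot.stats(lus)$out
print(exp(out_log_us))
```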

Answer to Question no.4

u1 = mean log MPG of US cars, u2 = mean log MPG of Japanese cars (the test is run on the log-transformed data)

Null Hypothesis: H0 : u1 = u2

Alternative Hypothesis: Ha: u1 < u2

t.test(logcars$USCars, logcars$JapaneseCars, alternative = "less",  var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  logcars$USCars and logcars$JapaneseCars
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.4366143
## sample estimates:
## mean of x mean of y 
##  2.741001  3.270957
summary(logcars)
##      USCars       JapaneseCars  
##  Min.   :2.197   Min.   :2.890  
##  1st Qu.:2.639   1st Qu.:3.178  
##  Median :2.708   Median :3.296  
##  Mean   :2.741   Mean   :3.271  
##  3rd Qu.:2.890   3rd Qu.:3.434  
##  Max.   :3.332   Max.   :3.555  
##                  NA's   :7

Answer 4a

The sample means of the log MPG for the Japanese and US cars are 3.271 and 2.741, respectively.

Answer 4b

Since the p-value (6.528e-14) is less than 0.05, we reject the null hypothesis (H0). We conclude that the mean log MPG of the US cars is significantly less than that of the Japanese cars at the 0.05 level of significance. The data therefore support the hypothesis formulated by the environmental group (US car MPG < Japanese car MPG).
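To show where the reported statistic comes from, the sketch below recomputes the pooled two-sample t statistic by hand on the log scale. The vectors `us` and `jp` are transcribed from the printed table; `sp2`, `t_stat`, `df` and `p_value` are illustrative names. With these data it should agree with the `t.test` output above (t = -9.4828, df = 61).

```r
# MPG values transcribed from the printed table (NA rows dropped for Japanese cars)
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

lus <- log(us)
ljp <- log(jp)
n1 <- length(lus)
n2 <- length(ljp)

# Pooled variance under the equal-variance assumption
sp2 <- ((n1 - 1) * var(lus) + (n2 - 1) * var(ljp)) / (n1 + n2 - 2)

# t statistic and degrees of freedom
t_stat <- (mean(lus) - mean(ljp)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df <- n1 + n2 - 2

# One-sided (lower-tail) p-value for Ha: u1 < u2
p_value <- pt(t_stat, df)
print(c(t = t_stat, df = df, p = p_value))

# Back-transformed means (geometric means, in MPG units)
print(exp(c(US = mean(lus), Japanese = mean(ljp))))
```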

Complete Code

cars <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
colnames(cars) <- c("USCars", "JapaneseCars")
cars
length(cars$JapaneseCars)
JP <- cars[c(1:28), 2]
length(JP)
#answer to question no.1
qqnorm(cars$USCars, col = "red", main =  "UScars NPP", ylab = "mpg")
qqline(cars$USCars)
qqnorm(JP, col = "blue", main =  "Japanese cars  NPP", ylab = "mpg")
qqline(JP)

#answer to question no.2
boxplot(cars$USCars, JP, main = "Variance Equality Check", names = c("UScars", "Japanese Cars"), ylab = "Mpg")

#answer to question no.3
logcars <- log(cars)
qqnorm(logcars$USCars, col = "red", main = "US Cars NPP after LT", ylab = "Log MPG")
qqline(logcars$USCars)
qqnorm(logcars$JapaneseCars, col = "blue", main = "JP Cars NPP after LT", ylab = "Log MPG")
qqline(logcars$JapaneseCars)
boxplot(logcars$USCars, logcars$JapaneseCars, main = "Variance Equality Check", names = c("UScars", "Japanese Cars"), ylab = "Log of Mpg")

#Answer to question no.4
?t.test
t.test(logcars$USCars, logcars$JapaneseCars, alternative = "less",  var.equal = TRUE)