Getting Data

data <-  read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
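The odd column name `ï..USCars` in the output of this report is what a UTF-8 byte-order mark (BOM) at the start of a CSV file typically turns into when R reads it without being told about the BOM. The sketch below is not part of the original analysis and uses a small temporary file with hypothetical contents instead of the real URL. It shows one way to strip the BOM at read time with `fileEncoding = "UTF-8-BOM"`. The rest of this report keeps the original column name.

```r
# Build a tiny CSV that starts with a UTF-8 byte-order mark (bytes EF BB BF).
# The file contents are illustrative only, not the real data set.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("USCars,JapaneseCars\n18,24\n15,27\n"), con)
close(con)

# Declaring the encoding as "UTF-8-BOM" makes R drop the BOM, so the first
# column keeps its intended name.
cars_bom <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(cars_bom)
```

The same argument could be passed to the `read.csv()` call on the URL above, in which case the first column would be referred to as `data$USCars` throughout.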

Question 1)

data
##    ï..USCars JapaneseCars
## 1         18           24
## 2         15           27
## 3         18           27
## 4         16           25
## 5         17           31
## 6         15           35
## 7         14           24
## 8         14           19
## 9         14           28
## 10        15           23
## 11        15           27
## 12        14           20
## 13        15           22
## 14        14           18
## 15        22           20
## 16        18           31
## 17        21           32
## 18        21           31
## 19        10           32
## 20        10           24
## 21        11           26
## 22         9           29
## 23        28           24
## 24        25           24
## 25        19           33
## 26        16           33
## 27        17           32
## 28        19           28
## 29        18           NA
## 30        14           NA
## 31        14           NA
## 32        14           NA
## 33        14           NA
## 34        12           NA
## 35        13           NA

Answer: The US sample contains 35 cars and the Japanese sample contains 28 cars. Since neither sample size is greater than 40, the Central Limit Theorem cannot help us here.
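The sample sizes can also be read off programmatically instead of by counting rows. The following is a minimal sketch that re-enters the values from the printed data frame so that it runs on its own. The names `us`, `jp`, and `cars` are introduced here for illustration only.

```r
# Values transcribed from the printed data frame above.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Pad the shorter column with NA, as in the original file.
cars <- data.frame(USCars = us,
                   JapaneseCars = c(jp, rep(NA, length(us) - length(jp))))

# Number of non-missing observations per column.
n_obs <- colSums(!is.na(cars))
n_obs
```

Counting with `!is.na()` matters here because `nrow()` would report 35 for both columns, including the padded `NA` rows.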

Question 2) Checking the normal probability plots

qqnorm(data$JapaneseCars, col="Red", main="Normal Probability plot for mpg of Japanese cars ")
qqline(data$JapaneseCars)

qqnorm(data$ï..USCars, col="blue",main="Normal Probability plot for mpg of US cars ")
qqline(data$ï..USCars)

Answer - The data points on both plots fall almost on a straight line, so both distributions appear to be approximately normal.

Question 3

boxplot(data$JapaneseCars, data$ï..USCars, names=c("Japanese Cars","US Cars"), main="mpg comparison of Japanese and US Cars")

Answer: No, the variances of the two samples do not appear to be equal.

Question 4

data2 <- log(data)
# Now checking the normal probability plots of the log-transformed data
qqnorm(data2$JapaneseCars, col="Red", main="Normal Probability plot for log-transformed mpg of Japanese cars")
qqline(data2$JapaneseCars)

qqnorm(data2$ï..USCars, col="blue", main="Normal Probability plot for log-transformed mpg of US cars")
qqline(data2$ï..USCars)

#Now using side by side box plot
boxplot(data2$JapaneseCars, data2$ï..USCars, names=c("Japanese Cars","US Cars"), main="log(mpg) comparison of Japanese and US Cars")

Answer: After log-transforming both samples, the variances of the two samples look reasonably close. Both normal probability plots also look approximately normal, since the data points fall almost on a straight line. In both plots the points also lie closer to the line than they did before the transformation.
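The boxplots give a visual impression of spread. A numerical check could accompany them before relying on `var.equal=TRUE` in the t-test. The sketch below is an optional addition, not part of the original answer. It computes the sample variances on both scales and runs an F test on the log scale. The vectors `us` and `jp` are re-entered from the printed data, and all variable names are illustrative.

```r
# Values transcribed from the printed data frame above.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Sample variances on the original scale and on the log scale.
var_raw <- c(US = var(us), Japanese = var(jp))
var_log <- c(US = var(log(us)), Japanese = var(log(jp)))
var_raw
var_log

# Ratio of the larger to the smaller variance on each scale;
# values near 1 indicate similar spread.
max(var_raw) / min(var_raw)
max(var_log) / min(var_log)

# F test for equality of two variances on the log scale.
# Note that this test is itself sensitive to departures from normality.
f_test <- var.test(log(us), log(jp))
f_test
```

If the ratio on the log scale is far from 1, or the F test rejects equality, Welch's test (`var.equal=FALSE`, the default of `t.test()`) would be a safer choice than the pooled test.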

Question 5

Including library

library(dplyr)

Performing Two Sample T Test

t.test(data2$ï..USCars,data2$JapaneseCars,var.equal=TRUE,alternative="less")
## 
##  Two Sample t-test
## 
## data:  data2$ï..USCars and data2$JapaneseCars
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.4366143
## sample estimates:
## mean of x mean of y 
##  2.741001  3.270957

Answer -

Let subscript 1 denote US cars and subscript 2 denote Japanese cars.

  • Stating hypotheses: Null hypothesis H0: μ1 = μ2, i.e. μ1 - μ2 = 0. Alternative hypothesis Ha: μ1 < μ2, i.e. μ1 - μ2 < 0. Here μ1 and μ2 are the population means of log(mpg).
  • Answer a) From the test results, the sample mean of the log-transformed mpg is 2.74 for US cars and 3.27 for Japanese cars. These values are on the log scale, not in mpg; back-transformed with exp(), they correspond to geometric means of roughly 15.5 mpg and 26.3 mpg.
  • Answer b) The p-value (6.528e-14) is far smaller than 0.05, so we reject the null hypothesis. That is, the mean log-transformed mpg of cars manufactured in the US is significantly lower than that of cars manufactured in Japan.
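To make the reported test statistic less of a black box, the pooled two-sample t statistic can be recomputed by hand from the log-transformed samples. This is a sketch for illustration. The vectors are re-entered from the printed data, and the names `x`, `y`, `sp2`, `t_stat`, `df`, and `p_value` are introduced here.

```r
# Values transcribed from the printed data frame above.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

x <- log(us)   # sample 1: US cars
y <- log(jp)   # sample 2: Japanese cars
n1 <- length(x)
n2 <- length(y)

# Pooled variance: weighted average of the two sample variances.
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# Pooled t statistic and its degrees of freedom.
t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df <- n1 + n2 - 2

# Lower-tail p-value, matching alternative = "less".
p_value <- pt(t_stat, df)
c(t = t_stat, df = df, p = p_value)

# Back-transforming the log-scale means gives geometric means in mpg.
exp(c(US = mean(x), Japanese = mean(y)))
```

The values of `t_stat` and `df` should agree with the `t.test()` output above (t = -9.4828, df = 61).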