The CSV file was read into R from the GitHub link.
data<- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
head(data)
## USCars JapaneseCars
## 1 18 24
## 2 15 27
## 3 18 27
## 4 16 25
## 5 17 31
## 6 15 35
qqnorm(data$USCars, main = "Normal Probability Plot for US Cars")
qqline(data$USCars)
qqnorm(data$JapaneseCars, main = "Normal Probability Plot for Japanese Cars")
qqline(data$JapaneseCars)
The normal probability plots show that a few observations in each sample fall far from the reference line. Hence, we cannot assume that the data are normally distributed.
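A formal test can complement the visual reading of the plots. The sketch below applies the Shapiro-Wilk test from base R; it is not part of the original analysis. To stay self-contained it uses only the first six rows printed by head(data), so its p-values illustrate the call and say nothing about the full data set. The names us_mpg and jp_mpg are placeholders introduced here.

```r
# Illustrative data only: the first six rows shown by head(data),
# not the full data set.
us_mpg <- c(18, 15, 18, 16, 17, 15)
jp_mpg <- c(24, 27, 27, 25, 31, 35)

# Shapiro-Wilk test: the null hypothesis is that the sample is normal.
sw_us <- shapiro.test(us_mpg)
sw_jp <- shapiro.test(jp_mpg)

# A small p-value is evidence against normality.
print(c(US = sw_us$p.value, Japanese = sw_jp$p.value))
```

On the full data, the same calls would be shapiro.test(data$USCars) and shapiro.test(data$JapaneseCars).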
boxplot(data$USCars,data$JapaneseCars, names=c("US Cars","Japanese Cars"))
In the boxplot of US cars and Japanese cars, the box sizes differ noticeably, which suggests that the variances are not equal. Therefore, the data are log-transformed.
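Box height reflects the interquartile range rather than the variance, so a numeric check can back up the visual comparison. The sketch below compares the sample variances before and after a log transformation. The helper var_ratio is hypothetical, and the data are only the first six rows printed by head(data), used purely for illustration.

```r
# Illustrative data only: the first six rows shown by head(data).
us_mpg <- c(18, 15, 18, 16, 17, 15)
jp_mpg <- c(24, 27, 27, 25, 31, 35)

# Ratio of the larger sample variance to the smaller one.
# A value near 1 indicates similar spread in the two groups.
var_ratio <- function(x, y) {
  v <- c(var(x, na.rm = TRUE), var(y, na.rm = TRUE))
  max(v) / min(v)
}

ratio_raw <- var_ratio(us_mpg, jp_mpg)            # original mpg scale
ratio_log <- var_ratio(log(us_mpg), log(jp_mpg))  # log scale

print(c(raw = ratio_raw, log = ratio_log))
```

A ratio that moves closer to 1 on the log scale is the numeric counterpart of the boxes becoming more similar in size.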
data_T<- log(data)
head(data_T)
## USCars JapaneseCars
## 1 2.890372 3.178054
## 2 2.708050 3.295837
## 3 2.890372 3.295837
## 4 2.772589 3.218876
## 5 2.833213 3.433987
## 6 2.708050 3.555348
qqnorm(data_T$USCars, col = "blue", main = "Normal Probability Plot for US Cars (Transformed Data)")
qqline(data_T$USCars, col = "blue")
qqnorm(data_T$JapaneseCars, col = "blue", main = "Normal Probability Plot for Japanese Cars (Transformed Data)")
qqline(data_T$JapaneseCars, col = "blue")
Even after the log transformation, some observations still deviate from the normal probability line. Since normality is the weaker assumption of the t-test, we proceed and next check whether the variances are equal.
boxplot(data_T$USCars,data_T$JapaneseCars, main = "Transformed data",
names = c("US Cars","Japanese Cars"), col =c("BLUE","GREEN") )
The boxplot shows that the variances of the log-transformed data for US cars and Japanese cars are approximately equal, as both boxes are similar in size. Thus, we proceed to the t-test with the log-transformed data.
Let the mean log-transformed mpg of US cars and of Japanese cars be \(\mu_{1}\) and \(\mu_{2}\), respectively. The null and alternative hypotheses for the t-test are \(H_{0}: \mu_{1} = \mu_{2}\) and \(H_{a}: \mu_{1} < \mu_{2}\).
t.test(data_T$USCars,data_T$JapaneseCars,alternative = "less", var.equal=TRUE)
##
## Two Sample t-test
##
## data: data_T$USCars and data_T$JapaneseCars
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.4366143
## sample estimates:
## mean of x mean of y
## 2.741001 3.270957
Sample mean of log(mpg) for US cars = 2.74
Sample mean of log(mpg) for Japanese cars = 3.27
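Because the test was run on log-transformed values, these sample means are on the log scale. A minimal sketch of the back-transformation follows: exponentiating a mean of logs gives the geometric mean in mpg, and exponentiating the difference of log means gives the ratio of geometric means. The two input values are copied from the t.test output; the variable names are introduced here for illustration.

```r
# Log-scale sample means copied from the t.test output.
mean_log_us <- 2.741001
mean_log_jp <- 3.270957

# exp() of a mean of logs is the geometric mean on the original mpg scale.
gm_us <- exp(mean_log_us)
gm_jp <- exp(mean_log_jp)

# A difference of log means back-transforms to a ratio of geometric means.
gm_ratio <- exp(mean_log_us - mean_log_jp)

print(round(c(US = gm_us, Japanese = gm_jp, ratio = gm_ratio), 2))
```

This gives geometric means of roughly 15.5 mpg for US cars and 26.3 mpg for Japanese cars, a ratio of about 0.59.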
The p-value is 6.528e-14, which is far below the significance level of 0.05, so we reject the null hypothesis. Therefore, on the log scale, the mean mpg of US cars is less than that of Japanese cars.
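To make the test statistic less of a black box, the sketch below recomputes the pooled-variance (equal-variance) two-sample t statistic by hand and compares it with t.test. The helper pooled_t is hypothetical, and the inputs are only the logs of the first six rows printed by head(data), so the numbers will not match the full-data result of t = -9.4828 with df = 61.

```r
# Illustrative data only: logs of the first six rows shown by head(data).
us_log <- log(c(18, 15, 18, 16, 17, 15))
jp_log <- log(c(24, 27, 27, 25, 31, 35))

# Pooled two-sample t statistic for H0: mu1 = mu2 against Ha: mu1 < mu2.
pooled_t <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n1 <- length(x)
  n2 <- length(y)
  df <- n1 + n2 - 2
  # Pooled variance: weighted average of the two sample variances.
  sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df
  t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
  # Lower-tail p-value, matching alternative = "less".
  c(t = t_stat, df = df, p = pt(t_stat, df))
}

by_hand  <- pooled_t(us_log, jp_log)
built_in <- t.test(us_log, jp_log, alternative = "less", var.equal = TRUE)

print(by_hand)
print(unname(built_in$statistic))
```

The hand-computed statistic and p-value should agree with the values reported by t.test for the same inputs.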
data<- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
head(data)
# Normal probability plots
qqnorm(data$USCars, main = "Normal Probability Plot for US Cars")
qqline(data$USCars)
qqnorm(data$JapaneseCars, main = "Normal Probability Plot for Japanese Cars")
qqline(data$JapaneseCars)
# Testing for variance equality using box plots
boxplot(data$USCars,data$JapaneseCars, names=c("US Cars","Japanese Cars"))
#transforming data to log(data)
data_T<- log(data)
head(data_T)
# Normal probability plots for the transformed data
qqnorm(data_T$USCars, col = "blue", main = "Normal Probability Plot for US Cars (Transformed Data)")
qqline(data_T$USCars, col = "blue")
qqnorm(data_T$JapaneseCars, col = "blue", main = "Normal Probability Plot for Japanese Cars (Transformed Data)")
qqline(data_T$JapaneseCars, col = "blue")
# Testing for variance equality using box plots
boxplot(data_T$USCars,data_T$JapaneseCars, main = "Transformed data",
names = c("US Cars","Japanese Cars"), col =c("BLUE","GREEN") )
?t.test
t.test(data_T$USCars,data_T$JapaneseCars, alternative ="less", var.equal=TRUE)