1 Assignment on Two-Sample t-Test

The CSV file was read into R from the GitHub link.

data<- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv") 
head(data)
##   USCars JapaneseCars
## 1     18           24
## 2     15           27
## 3     18           27
## 4     16           25
## 5     17           31
## 6     15           35

1.1 Check for assumptions prior to t-test

  • Check for normality of the given data (normal probability plots)
# qqline() only draws the reference line; the title is set by qqnorm()
qqnorm(data$USCars, main = "Normal Probability Plot for US Cars")
qqline(data$USCars)

qqnorm(data$JapaneseCars, main = "Normal Probability Plot for Japanese Cars")
qqline(data$JapaneseCars)

The normal probability plots show that a few observations in each sample lie far from the reference line. Hence, we cannot assume that the data are normally distributed.
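As a numeric complement to reading the plots by eye, a Shapiro-Wilk test can be run on each column. The sketch below is an illustration only and was not part of the original analysis. It assumes a stand-in data frame, `cars_head` (a hypothetical name), holding just the six rows printed by head(data) above, so its p-values say nothing about the full data set. In practice one would pass data$USCars and data$JapaneseCars instead.

```r
# Stand-in data: only the six rows shown by head(data), NOT the full CSV.
cars_head <- data.frame(
  USCars       = c(18, 15, 18, 16, 17, 15),
  JapaneseCars = c(24, 27, 27, 25, 31, 35)
)

# shapiro.test() is in base R's stats package. Its null hypothesis is that
# the sample comes from a normal distribution, so a small p-value is
# evidence against normality.
normality_p <- sapply(cars_head, function(x) shapiro.test(x)$p.value)
print(normality_p)
```

With samples this small the test has little power, so it is best read together with the plots rather than instead of them.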

  • Test for Equal Variance (Using Box-Plots)
boxplot(data$USCars,data$JapaneseCars, names=c("US Cars","Japanese Cars"))

In the box plot of US and Japanese cars, the boxes differ noticeably in size, which suggests that the two variances are not equal. The data are therefore log-transformed.
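To put numbers on the box-plot impression, the sample variances can be compared before and after taking logs. This is a minimal sketch under a stated assumption: `cars_head` is a hypothetical stand-in containing only the six rows printed by head(data), so the ratios below illustrate the mechanics and are not results for the full data set.

```r
# Stand-in data: only the six rows shown by head(data), NOT the full CSV.
cars_head <- data.frame(
  USCars       = c(18, 15, 18, 16, 17, 15),
  JapaneseCars = c(24, 27, 27, 25, 31, 35)
)

# Sample variance of each column on the raw and on the log scale.
# na.rm = TRUE matters for the real file if the columns have unequal lengths.
raw_var <- sapply(cars_head, var, na.rm = TRUE)
log_var <- sapply(log(cars_head), var, na.rm = TRUE)

# Ratio of Japanese to US variance; values near 1 suggest similar spread.
ratio_raw <- raw_var[["JapaneseCars"]] / raw_var[["USCars"]]
ratio_log <- log_var[["JapaneseCars"]] / log_var[["USCars"]]

print(c(raw = ratio_raw, log = ratio_log))
```

On these six rows the log transform brings the ratio closer to 1, which is the effect the transformation is meant to achieve.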

1.2 Conversion of given data to Log format

data_T<- log(data)
head(data_T)
##     USCars JapaneseCars
## 1 2.890372     3.178054
## 2 2.708050     3.295837
## 3 2.890372     3.295837
## 4 2.772589     3.218876
## 5 2.833213     3.433987
## 6 2.708050     3.555348
  • Check for normality of the log-transformed data
# qqline() only draws the reference line; the title is set by qqnorm()
qqnorm(data_T$USCars, col = "BLUE", main = "Normal Probability Plot for US Cars (Transformed Data)")
qqline(data_T$USCars, col = "BLUE")

qqnorm(data_T$JapaneseCars, col = "BLUE", main = "Normal Probability Plot for Japanese Cars (Transformed Data)")
qqline(data_T$JapaneseCars, col = "BLUE")

Even after the log transformation, some observations still deviate from the normal probability line. Because the t-test is fairly robust to moderate departures from normality, we treat normality as the weaker assumption and next check the equality of variances.

  • Test for Variance Equality for Log format data
boxplot(data_T$USCars,data_T$JapaneseCars, main = "Transformed data", 
        names = c("US Cars","Japanese Cars"), col =c("BLUE","GREEN") )

The box plot of the log-transformed data shows boxes of similar size for US and Japanese cars, so we treat the two variances as approximately equal. We therefore proceed to the t-test on the log-transformed data.
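The visual check can be backed by an F test for the ratio of two variances, available in base R as var.test(). The sketch below was not part of the original analysis and assumes a hypothetical stand-in, `cars_head`, built from the six rows printed by head(data). On the real data one would call var.test(data_T$USCars, data_T$JapaneseCars).

```r
# Stand-in data: only the six rows shown by head(data), NOT the full CSV.
cars_head <- data.frame(
  USCars       = c(18, 15, 18, 16, 17, 15),
  JapaneseCars = c(24, 27, 27, 25, 31, 35)
)
log_head <- log(cars_head)

# H0: the two population variances are equal (ratio = 1).
# The statistic is simply var(x) / var(y).
f_res <- var.test(log_head$USCars, log_head$JapaneseCars)
print(f_res$statistic)  # F ratio
print(f_res$p.value)    # two-sided p-value
```

The F test is itself sensitive to non-normality, so a large p-value here supports, but does not prove, the choice of var.equal=TRUE.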

1.3 Performing the t-test with the following hypotheses

Let \(\mu_{1}\) and \(\mu_{2}\) denote the mean log-transformed mpg of US cars and of Japanese cars, respectively. The null and alternative hypotheses for the t-test are:

  • \(H_{0}: \mu_{1} =\mu_{2}\)
  • \(H_{a}: \mu_{1} < \mu_{2}\)
t.test(data_T$USCars,data_T$JapaneseCars,alternative = "less", var.equal=TRUE) 
## 
##  Two Sample t-test
## 
## data:  data_T$USCars and data_T$JapaneseCars
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.4366143
## sample estimates:
## mean of x mean of y 
##  2.741001  3.270957
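With var.equal=TRUE, t.test() uses the pooled-variance statistic \(t = (\bar{x}_{1} - \bar{x}_{2}) / \sqrt{s_{p}^{2}(1/n_{1} + 1/n_{2})}\) with \(n_{1} + n_{2} - 2\) degrees of freedom, so df = 61 above corresponds to 63 non-missing observations in total. The sketch below recomputes that statistic by hand. The helper name `pooled_t` and the stand-in `cars_head` (the six rows printed by head(data)) are assumptions for illustration, so the numbers it prints are not those of the full analysis.

```r
# Hand computation of the pooled two-sample t statistic (lower-tailed).
pooled_t <- function(x, y) {
  x <- x[!is.na(x)]  # drop missing values, as t.test() does
  y <- y[!is.na(y)]
  n1 <- length(x)
  n2 <- length(y)
  # Pooled variance: weighted average of the two sample variances.
  sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
  t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
  df <- n1 + n2 - 2
  # Lower-tail probability matches alternative = "less".
  list(t = t_stat, df = df, p_value = pt(t_stat, df))
}

# Stand-in data: only the six rows shown by head(data), NOT the full CSV.
cars_head <- data.frame(
  USCars       = c(18, 15, 18, 16, 17, 15),
  JapaneseCars = c(24, 27, 27, 25, 31, 35)
)
log_head <- log(cars_head)

manual  <- pooled_t(log_head$USCars, log_head$JapaneseCars)
builtin <- t.test(log_head$USCars, log_head$JapaneseCars,
                  alternative = "less", var.equal = TRUE)

# The two results should agree up to floating-point error.
print(c(manual = manual$t, builtin = unname(builtin$statistic)))
```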

1.4 Conclusions derived from the performed t-test

  • Sample mean of log(mpg) for US cars = 2.74

  • Sample mean of log(mpg) for Japanese cars = 3.27

The p-value is 6.528e-14, which is far below the 0.05 level of significance. We therefore reject the null hypothesis and conclude that, on the log scale, the mean mpg of US cars is less than that of Japanese cars.
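Because the test was run on log(mpg), the reported means are on the log scale. Exponentiating a mean of logs gives a geometric mean, and exponentiating the difference of log means gives the ratio of geometric means. The sketch below applies this to the values printed in the t-test output above; the variable names are chosen here for illustration only.

```r
# Values copied from the t.test() output above (log scale).
mean_log_us <- 2.741001
mean_log_jp <- 3.270957
upper_diff  <- -0.4366143  # upper limit of the one-sided 95% interval

# Back-transform to the original mpg scale.
geo_mean_us <- exp(mean_log_us)                # geometric mean mpg, US cars
geo_mean_jp <- exp(mean_log_jp)                # geometric mean mpg, Japanese cars
ratio       <- exp(mean_log_us - mean_log_jp)  # US / Japanese
ratio_upper <- exp(upper_diff)                 # upper bound on that ratio

print(round(c(us = geo_mean_us, japan = geo_mean_jp,
              ratio = ratio, ratio_upper = ratio_upper), 3))
```

Read this way, the geometric mean mpg of US cars is roughly 0.59 times that of Japanese cars, and the one-sided 95% interval puts the ratio below about 0.65.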

2 Complete R Code

data<- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv") 
head(data)

# Normal probability plots (raw data)
qqnorm(data$USCars, main = "Normal Probability Plot for US Cars")
qqline(data$USCars)
qqnorm(data$JapaneseCars, main = "Normal Probability Plot for Japanese Cars")
qqline(data$JapaneseCars)

# Testing for variance equality using box plots
boxplot(data$USCars,data$JapaneseCars, names=c("US Cars","Japanese Cars"))

#transforming data to log(data)
data_T<- log(data)
head(data_T)
# Normal probability plots (log-transformed data)
qqnorm(data_T$USCars, col = "BLUE", main = "Normal Probability Plot for US Cars (Transformed Data)")
qqline(data_T$USCars, col = "BLUE")
qqnorm(data_T$JapaneseCars, col = "BLUE", main = "Normal Probability Plot for Japanese Cars (Transformed Data)")
qqline(data_T$JapaneseCars, col = "BLUE")

# Testing for variance equality using box plots
boxplot(data_T$USCars,data_T$JapaneseCars, main = "Transformed data", 
        names = c("US Cars","Japanese Cars"), col =c("BLUE","GREEN") )

?t.test
t.test(data_T$USCars,data_T$JapaneseCars, alternative ="less", var.equal=TRUE)