Reading the data
df<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
ucars<-df$USCars
jcars<-df$JapaneseCars
Check for normality
qqnorm(ucars,main = "Normal Q-Q plot for US cars")
qqline(ucars)
qqnorm(jcars,main = "Normal Q-Q plot for Japanese cars")
qqline(jcars)
Since the points for both US cars and Japanese cars lie approximately on a straight line in their normal Q-Q plots, we can assume that both samples are approximately normally distributed.
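Reading a Q-Q plot is subjective, so a formal test can serve as a cross-check. The sketch below is not part of the original analysis: it wraps base R's `shapiro.test()` in a hypothetical helper `check_normality()` and demonstrates it on simulated data rather than the car data. Note that `shapiro.test()` needs between 3 and 5000 non-missing values.

```r
# Hypothetical helper (not from the original analysis): Shapiro-Wilk test
# as a numeric complement to the normal Q-Q plots.
check_normality <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]                 # drop NA padding from the CSV, if any
  res <- shapiro.test(x)            # H0: the sample comes from a normal distribution
  list(W = unname(res$statistic),
       p_value = res$p.value,
       looks_normal = res$p.value > alpha)  # TRUE means "fail to reject H0"
}

# Demo on simulated data (illustration only, not the car data)
set.seed(1)
demo <- check_normality(rnorm(40, mean = 20, sd = 4))
demo
```

On the real data this would be called as `check_normality(ucars)` and `check_normality(jcars)`. A large p-value does not prove normality; it only means the test found no strong evidence against it.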
Check for constant variance
boxplot(ucars,jcars,names = c("US Cars","Japanese Cars"), main="boxplot of US Cars and Japanese Cars")
Log transformation
log_ucars <- log(ucars)
log_jcars <- log(jcars)
boxplot(log_ucars,log_jcars,names = c("log US Cars","log Japanese Cars"), main="boxplot of logs US Cars and Japanese Cars data points")
After the log transformation, the interquartile ranges of log(US Cars) and log(Japanese Cars) look similar (IQR of log Japanese Cars ≈ IQR of log US Cars), so we can assume that the variances of the two transformed samples are approximately equal.
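The boxplot comparison can be backed up with numbers. This is a sketch, not the original method: the hypothetical helper `compare_spread()` reports both IQRs plus the variance ratio and p-value from base R's `var.test()` (an F test of equal variances), shown on simulated log-normal samples rather than the car data.

```r
# Hypothetical helper (not from the original analysis): numeric summary of
# spread for two samples, to back up the visual boxplot comparison.
compare_spread <- function(x, y) {
  x <- x[!is.na(x)]    # drop NA padding, if any
  y <- y[!is.na(y)]
  f <- var.test(x, y)  # F test of H0: var(x) / var(y) = 1 (assumes normality)
  c(iqr_x = IQR(x),
    iqr_y = IQR(y),
    var_ratio = unname(f$estimate),
    f_p_value = f$p.value)
}

# Simulated log-normal samples (illustration only, not the car data)
set.seed(2)
sim_us <- rlnorm(35, meanlog = 2.7, sdlog = 0.2)
sim_jp <- rlnorm(28, meanlog = 3.3, sdlog = 0.2)
raw_spread <- compare_spread(sim_us, sim_jp)            # original scale
log_spread <- compare_spread(log(sim_us), log(sim_jp))  # log scale
raw_spread
log_spread
```

On the real data the calls would be `compare_spread(ucars, jcars)` and `compare_spread(log_ucars, log_jcars)`. The F test is itself sensitive to non-normality, so it is best read alongside the boxplots rather than instead of them.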
Performing a two-sample t-test
a <-0.05
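# Pooled degrees of freedom n1 + n2 - 2, counting only non-missing values
dof <- sum(!is.na(ucars)) + sum(!is.na(jcars)) - 2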
t_critical <-qt(1-a/2,dof)
t_critical
## [1] 1.999624
Comment: The two-sided critical values are ±1.9996, i.e. approximately ±2.
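The `t.test()` call below uses `alternative = "less"`, which is a one-sided test, so its matching critical value is `qt(alpha, df)` rather than `±qt(1 - alpha/2, df)`. The sketch below computes both for df = 61 (the value `t.test()` reports); the variable names are illustrative only.

```r
alpha <- 0.05
df_pooled <- 61                              # n1 + n2 - 2, as reported by t.test()
two_sided <- qt(1 - alpha / 2, df_pooled)    # two-sided: reject H0 if |t| > two_sided
one_sided <- qt(alpha, df_pooled)            # alternative = "less": reject H0 if t < one_sided
c(two_sided = two_sided, one_sided_less = one_sided)
```

Either rule leads to the same decision here, because the observed t = -9.48 lies far below both cut-offs.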
?t.test
t.test(log_ucars,log_jcars,var.equal = TRUE,alternative="less")
##
## Two Sample t-test
##
## data: log_ucars and log_jcars
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.4366143
## sample estimates:
## mean of x mean of y
## 2.741001 3.270957
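To show what `t.test(..., var.equal = TRUE, alternative = "less")` computes, here is a sketch of the pooled-variance t statistic written out by hand. The helper `pooled_t_less()` is hypothetical and is checked against `t.test()` on simulated data, not the car data.

```r
# Hypothetical helper: pooled (equal-variance) two-sample t-test,
# one-sided with H1: mean(x) < mean(y).
pooled_t_less <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n1 <- length(x); n2 <- length(y)
  df <- n1 + n2 - 2
  sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / df      # pooled variance
  t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
  list(t = t_stat, df = df, p_value = pt(t_stat, df))      # lower-tail p-value
}

# Simulated data (illustration only)
set.seed(3)
x <- rnorm(35, mean = 2.7, sd = 0.2)
y <- rnorm(28, mean = 2.8, sd = 0.2)
manual  <- pooled_t_less(x, y)
builtin <- t.test(x, y, var.equal = TRUE, alternative = "less")
```

The pooled variance weights each sample variance by its degrees of freedom, which is only appropriate when the two population variances are assumed equal; that is why the constant-variance check comes first.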
a. Sample mean of log(mpg) for cars manufactured in the US = 2.74; sample mean of log(mpg) for cars manufactured in Japan = 3.27.
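Because the test was run on log(mpg), exponentiating the reported means gives geometric means on the original mpg scale, and exponentiating the confidence bound gives a bound on the ratio of geometric means. A sketch using only the numbers printed by `t.test()` above:

```r
# Sample means of log(mpg), copied from the t.test() output above
log_means <- c(US = 2.741001, Japan = 3.270957)
geo_means <- exp(log_means)                              # geometric means in mpg
ratio <- exp(log_means[["Japan"]] - log_means[["US"]])   # Japan / US ratio of geometric means

# Upper bound of the one-sided 95% CI for mean(log US) - mean(log Japan)
ci_upper_diff <- -0.4366143
ratio_lower <- exp(-ci_upper_diff)   # lower 95% bound on the Japan / US ratio

round(c(geo_means, ratio = ratio, ratio_lower = ratio_lower), 2)
```

This gives geometric means of about 15.5 mpg (US) and 26.3 mpg (Japan), a ratio of about 1.70, and a one-sided 95% lower bound on that ratio of about 1.55.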
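b. The p-value (6.528e-14) is far below α = 0.05, and the test statistic t = -9.48 lies well beyond the critical value: it is below -2, and also below the one-sided critical value qt(0.05, 61) ≈ -1.67 that matches alternative = "less". We therefore reject the null hypothesis.
In conclusion, the mean log(mpg) of cars manufactured in the US is significantly lower than that of cars manufactured in Japan, so the two means are significantly different.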
#Reading the data
df<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
ucars<-df$USCars
jcars<-df$JapaneseCars
#Check for normality
qqnorm(ucars,main = "Normal Q-Q plot for US cars")
qqline(ucars)
qqnorm(jcars,main = "Normal Q-Q plot for Japanese cars")
qqline(jcars)
#Check for constant variance
boxplot(ucars,jcars,names = c("US Cars","Japanese Cars"), main="boxplot of US Cars and Japanese Cars")
#Log Transformation
log_ucars <- log(ucars)
log_jcars <- log(jcars)
boxplot(log_ucars,log_jcars,names = c("log US Cars","log Japanese Cars"), main="boxplot of logs US Cars and Japanese Cars data points")
#T-test
a <-0.05
dof <- sum(!is.na(ucars)) + sum(!is.na(jcars)) - 2
t_critical <-qt(1-a/2,dof)
t_critical
?t.test
t.test(log_ucars,log_jcars,var.equal = TRUE,alternative="less")
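Comment 2:
On the untransformed data, the interquartile ranges of US Cars and Japanese Cars differ (IQR of Japanese Cars > IQR of US Cars), so the constant-variance assumption does not hold on the original scale. This is why the log transformation is applied before the t-test.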
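The analysis handles unequal variances with a log transformation. An alternative that is not used in the original is Welch's t-test, which is what `t.test()` runs when `var.equal = FALSE` (its default); it drops the equal-variance assumption and uses the Welch-Satterthwaite degrees of freedom. The hypothetical helper below runs both versions side by side on simulated data with unequal spreads.

```r
# Hypothetical helper: compare Welch and pooled two-sample t-tests.
# t.test() drops non-finite values (including NA) on its own.
welch_vs_pooled <- function(x, y, alternative = "less") {
  welch  <- t.test(x, y, var.equal = FALSE, alternative = alternative)
  pooled <- t.test(x, y, var.equal = TRUE,  alternative = alternative)
  data.frame(test = c("Welch", "pooled"),
             t = c(unname(welch$statistic), unname(pooled$statistic)),
             df = c(unname(welch$parameter), unname(pooled$parameter)),
             p_value = c(welch$p.value, pooled$p.value))
}

# Simulated samples with unequal spread (illustration only, not the car data)
set.seed(4)
cmp <- welch_vs_pooled(rnorm(35, mean = 20, sd = 3), rnorm(28, mean = 30, sd = 6))
cmp
```

On the raw data this would be `welch_vs_pooled(ucars, jcars)`. The Welch degrees of freedom always fall between min(n1, n2) - 1 and n1 + n2 - 2, so the test is somewhat more conservative when the variances really do differ.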