df<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
Extracting the MPG vectors for USCars and Japanese Cars:
USCars<-df[,1]
JapaneseCars<-df[1:28,2]
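The hard-coded row range 1:28 presumably works because the shorter Japanese column is padded with missing values in the CSV; that is an assumption about the file, not something shown here. A minimal sketch of the same extraction that drops missing values instead of relying on a fixed row count, using a small made-up data frame (`cars`, `us` and `japan` are hypothetical names, not the real columns):

```r
# Hypothetical stand-in for the CSV: two columns of unequal length,
# with the shorter one padded by NA (an assumption about the real file).
cars <- data.frame(
  us    = c(18, 15, 16, 17, 14),
  japan = c(24, 27, 31, NA, NA)
)

# Keep only the observed values in each column.
us_mpg    <- cars$us[!is.na(cars$us)]
japan_mpg <- cars$japan[!is.na(cars$japan)]

length(us_mpg)
length(japan_mpg)
```

If the real file is padded the same way, this gives the same vectors as the indexed version without needing to know the sample sizes in advance.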
Checking the Sample Size in each population:
length(USCars)
## [1] 35
length(JapaneseCars)
## [1] 28
Comment: If the sample sizes are large enough, we can use the central limit theorem to justify approximate normality of the sample means.
Here the first sample (US cars) has 35 observations and the second sample (Japanese cars) has only 28 observations.
qqnorm(USCars, main = "Normal Q-Q Plot of US Cars")
qqline(USCars)
qqnorm(JapaneseCars,main = "Normal Q-Q Plot of Japanese Cars")
qqline(JapaneseCars)
Answer # 1
The normal plot of USCars shows deviation from normality, i.e. the data points do not follow a straight line, whereas the normal plot of Japanese cars is consistent with normality, since its points follow the straight line.
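A Q-Q plot is a visual judgment, so a numeric check can be run alongside it. The sketch below applies the Shapiro-Wilk test from base R to simulated, right-skewed data; the vector `x` is made up for illustration and is not the car data, and the test is an optional complement rather than part of the assignment.

```r
# Simulated right-skewed sample of the same size as the US group
# (hypothetical data, not the values from the CSV).
set.seed(1)
x <- rlnorm(35, meanlog = 2.7, sdlog = 0.3)

# Shapiro-Wilk test: the null hypothesis is that the data are normal,
# so a small p-value indicates departure from normality.
sw <- shapiro.test(x)
sw$statistic
sw$p.value

# The same check after a log transformation.
sw_log <- shapiro.test(log(x))
sw_log$p.value
```

With samples of about 30 the test has limited power, so it is best read together with the plot rather than in place of it.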
Using Side by Side Box-Plots:
boxplot(USCars, JapaneseCars, main ="US Cars vs. Japanese Cars Box Plots" ,names=c("US Cars","Japanese Cars"))
Answer # 2
There is a large difference in the magnitude of the variance between USCars and Japanese cars, as the boxes differ markedly in length.
Therefore we apply a log transformation to bring the two variances closer to equality before using the two-sample t-test.
USCars <- log(USCars)
JapaneseCars <- log(JapaneseCars)
For USCars:
qqnorm(USCars, main = "Normal Q-Q Plot of US Cars")
qqline(USCars)
For Japanese Cars:
qqnorm(JapaneseCars,main = "Normal Q-Q Plot of Japanese Cars")
qqline(JapaneseCars)
Answer # 3
Comment on log-transformed normal probability plots (NPPs):
The data follow a straight line more closely than they did before the transformation.
boxplot(USCars, JapaneseCars, main ="US Cars vs. Japanese Cars Box Plots" ,names=c("US Cars","Japanese Cars"))
Answer # 3
Comment on Log-Transformed Side by Side Box Plots:
The interquartile range for both log-transformed USCars and Japanese Cars is approximately 0.2, i.e. the log transformation has substantially reduced the difference in variance between the two groups, as we can see from the similar sizes of the US and Japanese box plots.
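The box-plot comparison can also be put in numbers by computing the interquartile range and standard deviation of each group before and after the transformation. A minimal sketch on simulated data (`us_raw`, `jp_raw` and the helper `spread` are hypothetical and stand in for the real vectors):

```r
# Simulated stand-ins for the two samples (not the real car data).
set.seed(42)
us_raw <- rlnorm(35, meanlog = 2.7, sdlog = 0.3)
jp_raw <- rlnorm(28, meanlog = 3.3, sdlog = 0.3)

# Helper returning two measures of spread for one vector.
spread <- function(x) c(IQR = IQR(x), SD = sd(x))

# Spread on the original scale and on the log scale, one row per group.
raw_spread <- rbind(US = spread(us_raw), Japan = spread(jp_raw))
log_spread <- rbind(US = spread(log(us_raw)), Japan = spread(log(jp_raw)))

raw_spread
log_spread
```

Comparing the two rows within each table shows whether the groups have similar spread on that scale, which is what the equal-variance t-test assumes.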
t.test(USCars,JapaneseCars,var.equal=TRUE,alternative="less")
##
## Two Sample t-test
##
## data: USCars and JapaneseCars
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.4366143
## sample estimates:
## mean of x mean of y
## 2.741001 3.270957
Hypotheses:
The null and alternative hypotheses are tested at the 0.05 significance level:
H0: μ1 = μ2
Ha: μ1 < μ2
where μ1 = mean of the natural log of US MPG and μ2 = mean of the natural log of Japan MPG (R's log() is the natural logarithm, not log10)
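To make the test of these hypotheses concrete, the pooled two-sample t statistic can be computed by hand and compared with what t.test() reports when var.equal=TRUE. The sketch below uses simulated log-scale data with the same sample sizes as the report (35 and 28); the vectors `x` and `y` are hypothetical, so the numbers will not match the output above.

```r
# Simulated log-scale samples with the report's sample sizes (hypothetical data).
set.seed(7)
x <- rnorm(35, mean = 2.7, sd = 0.3)
y <- rnorm(28, mean = 3.3, sd = 0.3)

n1 <- length(x)
n2 <- length(y)

# Pooled variance: weighted average of the two sample variances.
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# t statistic for H0: mu1 = mu2, with n1 + n2 - 2 degrees of freedom.
t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# One-sided p-value for Ha: mu1 < mu2 (lower tail).
p_less <- pt(t_stat, df = n1 + n2 - 2)

# Reference result from the built-in function.
ref <- t.test(x, y, var.equal = TRUE, alternative = "less")
all.equal(t_stat, unname(ref$statistic))
all.equal(p_less, ref$p.value)
```

The degrees of freedom, 35 + 28 - 2 = 61, agree with the df shown in the t-test output.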
Answer # 4 a:
Sample Avg for log mpg of USCars= 2.74
Sample Avg for log mpg of Japanese Cars= 3.27
Answer # 4 b:
The t-test gave a p-value of 6.528e-14, which is far less than the level of significance (0.05).
Therefore, we reject H0 and conclude in favor of the alternative hypothesis that the mean log mpg of cars manufactured in the US is less than that of cars manufactured in Japan.
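Because the test was run on natural logs, the estimates can be moved back to the mpg scale by exponentiating. The exponential of a mean of logs is a geometric mean, and the exponential of a difference in log means is a ratio of geometric means. A minimal sketch using only the numbers printed in the t-test output above (the variable names are made up for illustration):

```r
# Values copied from the t.test output above (log scale).
mean_log_us    <- 2.741001
mean_log_jp    <- 3.270957
upper_log_diff <- -0.4366143   # upper bound of the one-sided 95% interval

# Geometric means on the original mpg scale.
gm_us <- exp(mean_log_us)
gm_jp <- exp(mean_log_jp)

# Ratio of geometric means (US / Japan) and its one-sided upper bound.
ratio       <- exp(mean_log_us - mean_log_jp)
ratio_upper <- exp(upper_log_diff)

round(c(gm_us = gm_us, gm_jp = gm_jp, ratio = ratio, ratio_upper = ratio_upper), 3)
```

On this reading, the typical US mpg is estimated at roughly 0.59 times the typical Japanese mpg, with the one-sided 95% bound placing the ratio below about 0.65. These statements concern geometric means, not arithmetic means of mpg.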
getwd()
#Reading the data
df<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
#Extracting the MPG vectors for USCars and Japanese Cars
USCars<-df[,1]
JapaneseCars<-df[1:28,2]
#Checking Sample Size in each population
length(USCars)
length(JapaneseCars)
#Comment: If the sample sizes are large enough, we can use the central limit theorem to justify approximate normality of the sample means.
#Here the first sample (US cars) has 35 observations and the second sample (Japanese cars) has only 28 observations.
#Check for normality
qqnorm(USCars, main = "Normal Q-Q Plot of US Cars")
qqline(USCars)
qqnorm(JapaneseCars,main = "Normal Q-Q Plot of Japanese Cars")
qqline(JapaneseCars)
#Answer 1: The normal plot of USCars shows deviation from normality, i.e. the data points do not follow a straight line,
#whereas the normal plot of Japanese cars is consistent with normality, since its points follow the straight line.
#Comparing Variance:
boxplot(USCars, JapaneseCars, main ="US Cars vs. Japanese Cars Box Plots" ,names=c("US Cars","Japanese Cars"))
#Answer 2: There is a large difference in the magnitude of the variance between USCars and Japanese cars, as the boxes differ markedly in length.
#Therefore we apply a log transformation to bring the two variances closer to equality before using the two-sample t-test
#Performing Log Transformation:
USCars <- log(USCars)
JapaneseCars <- log(JapaneseCars)
#Normal plots for log transformed data
qqnorm(USCars, main = "Normal Q-Q Plot of US Cars")
qqline(USCars)
qqnorm(JapaneseCars,main = "Normal Q-Q Plot of Japanese Cars")
qqline(JapaneseCars)
#Answer 3: The data follow a straight line more closely than they did before the transformation.
#Box plot for log transformed data
boxplot(USCars, JapaneseCars, main ="US Cars vs. Japanese Cars Box Plots" ,names=c("US Cars","Japanese Cars"))
#Answer 3: The interquartile range for both log-transformed USCars and Japanese Cars is approximately 0.2,
# i.e. the log transformation has substantially reduced the difference in variance between the two groups.
#Now Performing T-Test
?t.test
t.test(USCars,JapaneseCars,var.equal=TRUE,alternative="less")