df<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
df$USCars
## [1] 18 15 18 16 17 15 14 14 14 15 15 14 15 14 22 18 21 21 10 10 11 9 28 25 19
## [26] 16 17 19 18 14 14 14 14 12 13
df$JapaneseCars
## [1] 24 27 27 25 31 35 24 19 28 23 27 20 22 18 20 31 32 31 32 24 26 29 24 24 33
## [26] 33 32 28 NA NA NA NA NA NA NA
qqnorm(df$USCars,main="Normal Q-Q Plot US Cars")
qqnorm(df$JapaneseCars,main="Normal Q-Q Plot Japanese Cars")
Comparing the two Q–Q plots, both samples roughly follow a straight line through the middle, which means a normal model isn’t a terrible first pass for either group. The differences show up in the tails and in how tightly the points hug the line. The US cars bend away from the line more: there’s a flat cluster around 14–16 mpg and a stronger dip in the lower tail, then a few high-mpg points that rise above the line. That pattern suggests more deviation from normality and a bit more skew/heavy tails. The Japanese cars stay closer to the line across most quantiles, with only a mild upward bend in the upper tail, so they look closer to normal overall. Also, the Japanese sample spans higher mpg values (roughly high teens to mid-30s) than the US sample (roughly single digits to high-20s), hinting at a higher center and wider top tail for the Japanese cars.
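As a rough companion to the visual comparison, the sketch below redraws both Q–Q plots with a reference line from qqline() and runs a Shapiro–Wilk test on each sample. This is an added check under stated assumptions, not part of the original analysis: the vector names us and jp are illustrative, and they simply hold the values printed above with the NA padding dropped.

```r
# Values copied from the printed output above; `us` and `jp` are illustrative names.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Side-by-side Q-Q plots, each with a reference line through the quartiles
op <- par(mfrow = c(1, 2))
qqnorm(us, main = "Normal Q-Q Plot US Cars"); qqline(us)
qqnorm(jp, main = "Normal Q-Q Plot Japanese Cars"); qqline(jp)
par(op)

# Shapiro-Wilk tests: a small p-value is evidence against normality
sw_us <- shapiro.test(us)
sw_jp <- shapiro.test(jp)
c(US = sw_us$p.value, Japanese = sw_jp$p.value)
```

With samples this small, the test has limited power, so its p-values are best read alongside the plots rather than in place of them.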
boxplot(df$USCars,df$JapaneseCars,names=c("US Cars", "Japanese Cars"),main="Box Plots US vs. Japanese Cars")
In this box plot, the height of each box shows the interquartile range (IQR). The Japanese cars have a taller box than the US cars (roughly 24 to 31 mpg versus 14 to 18 mpg), which means the middle half of their mpg values is more spread out. The thick line inside each box is the median: it sits around 27 mpg for Japanese cars and around 15 mpg for US cars, so the typical Japanese car gets much better mileage. A standard box plot doesn’t draw the mean. The US group looks right-skewed (two high points, 25 and 28 mpg, fall beyond the upper whisker), so its mean sits a little above its median, at about 16 mpg. The Japanese group is closer to symmetric, and its mean (about 26.75 mpg) is essentially equal to its median. Either way, the Japanese mean is much higher than the US mean, consistent with the clear separation between the boxes.
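The quantities read off the box plot can also be computed directly. The following is a minimal sketch, assuming the values printed earlier are typed in by hand; the names us, jp and summary_tab are illustrative and do not appear in the original code.

```r
# Values copied from the printed output above; NA padding dropped for Japanese cars.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Median, mean and IQR for each group, one row per group
summary_tab <- rbind(
  US       = c(median = median(us), mean = mean(us), IQR = IQR(us)),
  Japanese = c(median = median(jp), mean = mean(jp), IQR = IQR(jp))
)
summary_tab

# Points that boxplot() would draw beyond the whiskers (1.5 * IQR rule)
out_us <- boxplot.stats(us)$out
out_jp <- boxplot.stats(jp)$out
out_us
out_jp
```

Comparing each group's mean with its median gives a quick numerical read on skew that the box plot alone does not show.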
logUS<-log(df$USCars)
logJap<-log(df[1:28,2])
boxplot(logUS,logJap,names=c("US Cars", "Japanese Cars"),main="Box Plots Log US vs. Japanese Cars")
On the log scale the two box plots look much more alike in shape. The boxes (IQRs) are about the same height, and the whiskers are of similar length, so the overall spread for US and Japanese cars looks comparable. Both boxes sit fairly symmetrically with only a small number of points outside the whiskers. The main visual difference is where the boxes sit on the vertical axis: the Japanese box is shifted higher than the US box, so its median line is clearly above the US median. In short, variability looks similar, but the center for Japanese cars is noticeably higher.
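To put numbers on the spread comparison, the sketch below computes the IQR and standard deviation of each group on the log scale. It is an added illustration, not part of the original analysis; us, jp and spread are illustrative names, and the data are the values printed earlier.

```r
# Values copied from the printed output above; NA padding dropped for Japanese cars.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

logUS  <- log(us)
logJap <- log(jp)

# Box height (IQR) and standard deviation of log(mpg) for each group
spread <- rbind(
  US       = c(IQR = IQR(logUS),  sd = sd(logUS)),
  Japanese = c(IQR = IQR(logJap), sd = sd(logJap))
)
round(spread, 3)
```

The IQR reflects only the middle half of each sample, while the standard deviation also responds to the tails, so the two measures need not agree on how similar the groups are.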
mean(logUS)
## [1] 2.741001
mean(logJap)
## [1] 3.270957
The Q–Q plots for the raw mpg of US and Japanese cars each fall roughly along a straight line, with only mild curvature at the tails, so treating the data as approximately normal is reasonable. After the log transform the distributions also look more symmetric. The side-by-side box plots of the log(mpg) show boxes and whiskers of similar height for the two groups, so the equal-variance assumption is also reasonable.
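If the equal-variance assumption is in doubt, one way to hedge is to rerun the comparison with Welch's t-test, which does not pool the two variances. The sketch below is an added sensitivity check rather than part of the original analysis; us, jp and welch are illustrative names, and the data are the values printed earlier.

```r
# Values copied from the printed output above; NA padding dropped for Japanese cars.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

logUS  <- log(us)
logJap <- log(jp)

# Welch two-sample t-test: same one-sided alternative, variances not pooled
welch <- t.test(logUS, logJap, alternative = "less", var.equal = FALSE)
welch$statistic   # t statistic
welch$parameter   # Welch-Satterthwaite degrees of freedom
welch$p.value
```

If the Welch and pooled tests lead to the same decision, the conclusion does not hinge on the equal-variance assumption.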
t.test(logUS,logJap,alternative = c("less"),mu = 0, paired = FALSE, var.equal = TRUE,conf.level = 0.95)
##
## Two Sample t-test
##
## data: logUS and logJap
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.4366143
## sample estimates:
## mean of x mean of y
## 2.741001 3.270957
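The numbers in this output can be reproduced by hand from the pooled-variance formulas, which makes it clearer where t, df and the confidence bound come from. This is a minimal sketch, assuming the values printed earlier are typed in directly; all variable names are illustrative.

```r
# Values copied from the printed output above; NA padding dropped for Japanese cars.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)
logUS  <- log(us)
logJap <- log(jp)

n1  <- length(logUS)
n2  <- length(logJap)
dof <- n1 + n2 - 2                       # pooled degrees of freedom

# Pooled variance: weighted average of the two sample variances
sp2 <- ((n1 - 1) * var(logUS) + (n2 - 1) * var(logJap)) / dof
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))     # standard error of the mean difference

diff_means <- mean(logUS) - mean(logJap)
t_stat <- diff_means / se
p_val  <- pt(t_stat, df = dof)           # lower tail, matching alternative = "less"
upper  <- diff_means + qt(0.95, dof) * se  # one-sided 95% upper confidence bound

c(t = t_stat, df = dof, p = p_val, upper = upper)
```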
Let μ1 be the population mean log(mpg) for US cars and μ2 for Japanese cars. We test
\(H_{0}:\ \mu_{1}-\mu_{2}=0\)
versus
\(H_{a}:\ \mu_{1}-\mu_{2}<0\)
at α = 0.05, using a two-sample t-test with pooled variance on the
log(mpg) values (equal variances justified above). The sample means of
log(mpg) are about 2.74 for US cars and 3.27 for Japanese cars, and the
test gives t = -9.4828 on 61 degrees of freedom with p-value =
6.528e-14, far below the 0.05 level of significance. The one-sided 95%
confidence interval for μ1 − μ2 is (-Inf, -0.4366), which lies entirely
below zero. We therefore reject \(H_0\) in favor of the alternative: on
the log scale, the mean mpg of cars manufactured in the US is less than
that of cars manufactured in Japan, which supports the study claim that
US cars have lower fuel economy on average.
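Because the test was run on log(mpg), the estimated difference can be back-transformed to a ratio on the original mpg scale. The sketch below is an added interpretation aid, not part of the original analysis: it uses only the numbers reported in the t-test output, the variable names are illustrative, and exponentiating a mean of logs gives a geometric mean rather than the ordinary mean.

```r
# Numbers taken from the t-test output above
mean_logUS  <- 2.741001
mean_logJap <- 3.270957
upper_log   <- -0.4366143   # one-sided 95% upper bound for mu1 - mu2

# Geometric mean mpg in each group
gm_us <- exp(mean_logUS)    # about 15.5 mpg
gm_jp <- exp(mean_logJap)   # about 26.3 mpg

# Ratio of geometric means (US / Japanese) and its upper confidence bound
ratio_gm    <- exp(mean_logUS - mean_logJap)   # about 0.59
upper_ratio <- exp(upper_log)                  # about 0.65

round(c(gm_us = gm_us, gm_jp = gm_jp, ratio = ratio_gm, upper = upper_ratio), 3)
```

Read this way, the typical (geometric mean) mpg of US cars is estimated at roughly 59% of the Japanese figure, with a one-sided 95% upper bound of roughly 65%.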
df<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
df$USCars
df$JapaneseCars
qqnorm(df$USCars,main="Normal Q-Q Plot US Cars")
qqnorm(df$JapaneseCars,main="Normal Q-Q Plot Japanese Cars")
boxplot(df$USCars,df$JapaneseCars,names=c("US Cars", "Japanese Cars"),main="Box Plots US vs. Japanese Cars")
logUS<-log(df$USCars)
logJap<-log(df[1:28,2])
boxplot(logUS,logJap,names=c("US Cars", "Japanese Cars"),main="Box Plots Log US vs. Japanese Cars")
mean(logUS)
mean(logJap)
t.test(logUS,logJap,alternative = c("less"),mu = 0, paired = FALSE, var.equal = TRUE,conf.level = 0.95)