df<- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")

Question 1

As a rule of thumb, the central limit theorem gives a good approximation once a sample contains roughly 30 to 40 observations. The US cars sample has 35 observations, so it meets this guideline. The Japanese cars sample has 28 observations, which is slightly below the recommended number. The shortfall is small, however, so the sample mean should still be approximately normal, provided the data are not strongly skewed.
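
The rule of thumb above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential population, the seed, and the names `sim_means` and `skewness` are not part of the assignment and were chosen purely for illustration.

```r
# Illustrative simulation (not the assignment data): draw repeated samples of
# size 28 and 35 from a right-skewed population and look at the sample means.
set.seed(1)  # arbitrary seed, for reproducibility only

sim_means <- function(n, reps = 5000) {
  # rexp() is a stand-in skewed population; the real mpg data may differ
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_28 <- sim_means(28)
means_35 <- sim_means(35)

# Sample skewness: values near 0 indicate a roughly symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Skewness of the simulated sample means for each sample size
c(n28 = skewness(means_28), n35 = skewness(means_35))
```

If the two skewness values are close to each other and much smaller than the skewness of the population itself (2 for an exponential distribution), that supports treating 28 and 35 observations similarly.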

Question 2

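# Column 1 holds the US cars values
USCars <- df[, 1]
# Column 2 holds the Japanese cars values; only the first 28 rows are used
JapaneseCars <- df[1:28, 2]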

qqnorm(USCars, main = "Normal Q-Q Plot of US Cars")

qqnorm(JapaneseCars,main = "Normal Q-Q Plot of Japanese Cars")

The US cars data deviate somewhat from normality, showing a slight left skew, whereas the Japanese cars data deviate noticeably less.
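
A visual reading of a Q-Q plot can be backed up with a reference line and a formal test. The sketch below is one possible way to do that. The helper name `check_normality` and the simulated vector `demo` are hypothetical stand-ins, not the assignment data.

```r
# Hypothetical helper: Q-Q plot with a reference line plus a Shapiro-Wilk test
check_normality <- function(x, label) {
  qqnorm(x, main = paste("Normal Q-Q Plot of", label))
  qqline(x, col = "red")  # line through the first and third quartiles
  shapiro.test(x)         # H0: the sample comes from a normal distribution
}

set.seed(2)                           # arbitrary seed
demo <- rnorm(35, mean = 16, sd = 3)  # simulated stand-in, not the real mpg data
result <- check_normality(demo, "Demo Data")
result$p.value                        # a small p-value is evidence against normality
```

With the real data, the same helper could be called as `check_normality(USCars, "US Cars")`. Points that stay close to the red line, together with a large p-value, would be consistent with approximate normality.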

Question 3

boxplot(USCars, JapaneseCars, main ="US Cars vs. Japanese Cars Box Plots" ,names=c("US Cars","Japanese Cars"))

The variance does not appear to be equal across the two populations: the US cars show a greater spread than the Japanese cars.
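
The impression from the boxplots can be checked numerically by comparing the sample variances. The following is a minimal sketch that assumes simulated stand-in vectors (`us_demo`, `jp_demo`) rather than the real data. Note that the F test used here is sensitive to departures from normality, so it should be read alongside the Q-Q plots.

```r
set.seed(3)                                # arbitrary seed
us_demo <- rnorm(35, mean = 16, sd = 4)    # hypothetical stand-in for US cars
jp_demo <- rnorm(28, mean = 28, sd = 2)    # hypothetical stand-in for Japanese cars

# Sample variances of each group
c(var_us = var(us_demo), var_jp = var(jp_demo))

# F test of H0: the two population variances are equal
f_result <- var.test(us_demo, jp_demo)
f_result$estimate   # ratio of the sample variances (first / second)
f_result$p.value
```

A variance ratio far from 1 with a small p-value would support the visual conclusion that the spreads differ.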

Question 4

USCars <- log(USCars)
JapaneseCars <- log(JapaneseCars)

qqnorm(USCars, main = "Normal Q-Q Plot of US Cars")

qqnorm(JapaneseCars,main = "Normal Q-Q Plot of Japanese Cars")

boxplot(USCars, JapaneseCars, main ="US Cars vs. Japanese Cars Box Plots" ,names=c("US Cars","Japanese Cars"))

The normal probability plots of the log-transformed data closely resemble those of the untransformed data: the overall pattern is the same, and only the scale of the sample quantiles changes. The boxplots, however, differ noticeably after the transformation. The US cars boxplot shows a much larger spread than before, and that spread is also larger than the spread of the Japanese cars boxplot.

Question 5

t.test(USCars,JapaneseCars,var.equal=TRUE,alternative="less")
## 
##  Two Sample t-test
## 
## data:  USCars and JapaneseCars
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.4366143
## sample estimates:
## mean of x mean of y 
##  2.741001  3.270957
The null hypothesis is that the mean mpg of US cars is equal to the mean mpg of Japanese cars. The alternative hypothesis is that the mean mpg of US cars is less than the mean mpg of Japanese cars. Because the test is run on the log-transformed data, the sample averages reported are those of the log mpg: 2.741001 for US cars and 3.270957 for Japanese cars. The p-value of 6.528e-14 is far below any conventional significance level, so the null hypothesis is rejected in favor of the alternative: the data indicate that the mean (log) mpg of US cars is less than that of Japanese cars.
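
To make the reported statistic easier to interpret, the pooled two-sample t statistic can be computed by hand and compared with `t.test()`. This is a sketch only: `pooled_t`, `x_demo`, and `y_demo` are hypothetical names, and the vectors are simulated rather than taken from the assignment data. The sample sizes 35 and 28 are used so that the degrees of freedom match the 61 shown in the output.

```r
# Hypothetical helper: pooled-variance t test with a lower-tail alternative
pooled_t <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  # Pooled variance: weighted average of the two sample variances
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
  t_stat <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
  df <- nx + ny - 2
  p_less <- pt(t_stat, df)  # lower-tail p-value for H1: mean(x) < mean(y)
  c(t = t_stat, df = df, p.value = p_less)
}

set.seed(4)                                   # arbitrary seed
x_demo <- rnorm(35, mean = 2.7, sd = 0.25)    # simulated stand-in, log scale
y_demo <- rnorm(28, mean = 3.3, sd = 0.20)    # simulated stand-in, log scale

res <- pooled_t(x_demo, y_demo)
ref <- t.test(x_demo, y_demo, var.equal = TRUE, alternative = "less")

res
# The hand computation should agree with t.test()
all.equal(unname(res["t"]), unname(ref$statistic))
```

A large negative t statistic places the observed difference far in the lower tail of the t distribution, which is why the p-value is so small and why the null hypothesis of equal means is rejected.
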
# All R code used in document
df<- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")

USCars<- df[,1]
JapaneseCars<- df[1:28,2]

qqnorm(USCars, main = "Normal Q-Q Plot of US Cars")
qqnorm(JapaneseCars,main = "Normal Q-Q Plot of Japanese Cars")

boxplot(USCars, JapaneseCars, main ="US Cars vs. Japanese Cars Box Plots" ,names=c("US Cars","Japanese Cars"))

USCars <- log(USCars)
JapaneseCars <- log(JapaneseCars)

qqnorm(USCars, main = "Normal Q-Q Plot of US Cars")
qqnorm(JapaneseCars,main = "Normal Q-Q Plot of Japanese Cars")

boxplot(USCars, JapaneseCars, main ="US Cars vs. Japanese Cars Box Plots" ,names=c("US Cars","Japanese Cars"))

t.test(USCars,JapaneseCars,var.equal=TRUE,alternative="less")
head(df)