An environmental group would like to test the hypothesis that the mean mpg of cars manufactured in the US is less than that of cars manufactured in Japan. To this end, they sampled n1=35 US and n2=28 Japanese cars, which were tested for mpg fuel efficiency. (As a caveat, assume that this is a random sample from a large population of US and Japanese cars, not a complete census.) The data are reported in the following CSV file:
library(ggplot2)
library(summarytools)
library(dplyr)
library(tidyverse)
library(tidyr)
library(plot.matrix)
dat<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
Does the mpg of both US cars and Japanese cars appear to be Normally distributed (use NPPs)?
# Normal probability plot for US cars mpg, with a reference line through the quartiles
qqnorm(dat$USCars, main="US Cars MPG Normal Probability Plot", xlab="Quantiles", ylab="MPG")
qqline(dat$USCars, col = "red")
# Normal probability plot for Japanese cars mpg, with a reference line through the quartiles
qqnorm(dat$JapaneseCars, main="Japanese Cars MPG Normal Probability Plot", xlab="Quantiles", ylab="MPG")
qqline(dat$JapaneseCars, col = "blue")
Yes. For both US and Japanese cars the points fall close to the reference line, so each sample can reasonably be treated as Normally distributed. The distribution for US cars does, however, appear to be slightly skewed to the left.
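The reference line drawn by qqline() is not a least-squares fit; by default it joins the first and third quartiles of the sample to the matching quartiles of the standard Normal distribution. The following is a minimal sketch of that construction. It uses simulated values (the vector y, its mean and its standard deviation are assumptions for illustration, not the car data) so that it runs without the CSV file.

```r
# Simulated, hypothetical data: NOT the car measurements from the CSV file.
set.seed(1)
y <- rnorm(35, mean = 16, sd = 4)

# First and third quartiles of the sample and of the standard Normal.
y_q <- quantile(y, probs = c(0.25, 0.75), names = FALSE)
z_q <- qnorm(c(0.25, 0.75))

# Slope and intercept of the line joining the two quartile pairs.
slope <- diff(y_q) / diff(z_q)
intercept <- y_q[1] - slope * z_q[1]

qqnorm(y, main = "Simulated Normal Probability Plot")
qqline(y, col = "red")             # built-in reference line
abline(intercept, slope, lty = 2)  # manual line; it should overlap the red one
```

Points that bend away from this line at one end only are what a skewed sample looks like on the plot.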
Does the variance appear to be constant (use side-by-side boxplots)?
var(dat$USCars)
## [1] 16.44034
var(dat$JapaneseCars, na.rm = TRUE)
## [1] 22.12037
label=c("US","Japanese")
boxplot(dat$USCars,dat$JapaneseCars, main="Boxplots of US and Japanese Cars MPG", ylab="MPG",col= rainbow (2), border="black", names=label)
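The sample variances are of the same order of magnitude (16.44 for US cars versus 22.12 for Japanese cars, a ratio of about 1.35), but they are not equal, and the boxplots show a different spread for the two groups, so the variance does not appear to be constant. US cars show less variability in mpg than Japanese cars. The boxplots also place the Japanese cars at higher mpg values, which suggests that Japanese cars are more fuel efficient, although that is a statement about location rather than variance and is tested formally below.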
Transform the data using a log transform and repeat questions 1 and 2. Comment on the differences between the plots. Use the transformed data for the remaining questions.
dat1 <- log(dat)
qqnorm(dat1$USCars, main="US Cars MPG Log-Normal Probability Plot", xlab="Quantiles", ylab="log(MPG)")
qqline(dat1$USCars, col = "green4")
The distribution is now more symmetrical when compared to the non-log transformed data.
qqnorm(dat1$JapaneseCars, main="Japanese Cars MPG Log-Normal Probability Plot", xlab="Quantiles", ylab="log(MPG)")
qqline(dat1$JapaneseCars, col = "purple")
The distribution is now slightly right-skewed when compared to the non-log transformed data.
var(dat1$USCars)
## [1] 0.06085468
var(dat1$JapaneseCars, na.rm = TRUE)
## [1] 0.03313062
boxplot(dat1$USCars,dat1$JapaneseCars, main="Boxplots of Log Transform of US and Japanese Cars MPG", ylab="log(MPG)",col= rainbow (2), border="black", names=label)
After the log transform the sample variances are 0.0609 for US cars and 0.0331 for Japanese cars. They are of the same order of magnitude and the boxplots show a comparable spread, so the variance is treated as approximately constant. Note that the ordering has reversed: on the log scale the US cars have the larger variance. Similar variances say nothing about which group is more fuel efficient; that is a question about the means, which is addressed by the test below.
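To put numbers on the visual comparison, the variance ratios can be computed directly from the values printed above. The sketch below also computes a two-sided F-test p-value from those summary statistics. The F test is a supplementary check that the original analysis does not use, the helper name f_p_value is hypothetical, and the test is known to be sensitive to departures from Normality, so it should be read as a rough guide only.

```r
# Sample variances and sizes as reported in the output above.
var_us_raw <- 16.44034
var_jp_raw <- 22.12037
var_us_log <- 0.06085468
var_jp_log <- 0.03313062
n_us <- 35
n_jp <- 28

# Ratio of the larger to the smaller variance on each scale.
ratio_raw <- var_jp_raw / var_us_raw  # Japanese variance is larger on the raw scale
ratio_log <- var_us_log / var_jp_log  # US variance is larger on the log scale

# Hypothetical helper: two-sided F-test p-value from a variance ratio.
f_p_value <- function(ratio, df_num, df_den) {
  upper <- pf(ratio, df_num, df_den, lower.tail = FALSE)
  lower <- pf(ratio, df_num, df_den)
  2 * min(lower, upper)
}

p_raw <- f_p_value(ratio_raw, n_jp - 1, n_us - 1)
p_log <- f_p_value(ratio_log, n_us - 1, n_jp - 1)

round(c(ratio_raw = ratio_raw, ratio_log = ratio_log), 3)
c(p_raw = p_raw, p_log = p_log)
```

With the data frame loaded, var.test(dat1$USCars, dat1$JapaneseCars) should give the same log-scale comparison in a single call.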
State the null and alternative hypothesis and test using a 0.05 level of significance.
H_0: mean log(mpg) of US cars = mean log(mpg) of Japanese cars
H_a: mean log(mpg) of US cars < mean log(mpg) of Japanese cars
t.test(dat1$USCars,dat1$JapaneseCars,var.equal = TRUE)
##
## Two Sample t-test
##
## data: dat1$USCars and dat1$JapaneseCars
## t = -9.4828, df = 61, p-value = 1.306e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.6417062 -0.4182053
## sample estimates:
## mean of x mean of y
## 2.741001 3.270957
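As a check on the output above, the pooled two-sample t statistic and the 95 percent confidence interval can be recomputed from the reported summary statistics alone. This is a sketch that assumes the printed means and variances (rounded as shown) and the sample sizes n1=35 and n2=28; the variable names are illustrative.

```r
# Summary statistics of log(mpg) as printed in the output above.
mean_us <- 2.741001
mean_jp <- 3.270957
var_us <- 0.06085468
var_jp <- 0.03313062
n_us <- 35
n_jp <- 28

# Pooled variance and degrees of freedom for the equal-variance t-test.
df <- n_us + n_jp - 2
sp2 <- ((n_us - 1) * var_us + (n_jp - 1) * var_jp) / df

# Standard error of the difference in means, then the t statistic.
se <- sqrt(sp2 * (1 / n_us + 1 / n_jp))
t_stat <- (mean_us - mean_jp) / se

# Two-sided 95 percent confidence interval for the difference in means.
ci <- (mean_us - mean_jp) + c(-1, 1) * qt(0.975, df) * se

round(c(t = t_stat, df = df, lower = ci[1], upper = ci[2]), 4)
```

The results should agree with the t.test() output up to rounding of the inputs.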
What are the sample averages for the log of the mpg of US and Japanese cars?
mean(dat1$USCars)
## [1] 2.741001
mean(dat1$JapaneseCars, na.rm=TRUE)
## [1] 3.270957
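Because log() in R is the natural logarithm, exponentiating a mean of log(mpg) gives the geometric mean on the original mpg scale, and exponentiating the difference in means gives a ratio of geometric means. The sketch below back-transforms the two averages reported above; it is an interpretive aid, not part of the original analysis.

```r
# Sample averages of log(mpg) as reported above.
mean_log_us <- 2.741001
mean_log_jp <- 3.270957

# Back-transform to geometric means in mpg.
gm_us <- exp(mean_log_us)
gm_jp <- exp(mean_log_jp)

# Ratio of geometric means (US relative to Japanese).
gm_ratio <- exp(mean_log_us - mean_log_jp)

round(c(gm_us = gm_us, gm_jp = gm_jp, gm_ratio = gm_ratio), 3)
```

On this scale the US geometric mean is roughly 15.5 mpg against roughly 26.3 mpg for the Japanese cars.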
State your conclusions.
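The reported p-value of 1.306e-13 is for the two-sided alternative. Because the t statistic is negative (t = -9.4828), the p-value for the one-sided alternative H_a: US < Japan is half of that, about 6.5e-14. Either way it is far below alpha = 0.05, so we reject the null hypothesis and conclude that the mean log mpg of US cars is less than that of Japanese cars.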
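The one-sided test can also be requested directly through the alternative argument of t.test(). The sketch below illustrates this on simulated values so that it runs without the CSV file. The vectors us_log and jp_log, along with their means and standard deviations, are assumptions chosen only to resemble the log-scale data.

```r
# Simulated, hypothetical log(mpg) values: NOT the data from the CSV file.
set.seed(42)
us_log <- rnorm(35, mean = 2.74, sd = 0.25)
jp_log <- rnorm(28, mean = 3.27, sd = 0.18)

# Default two-sided pooled t-test, as in the analysis above.
two_sided <- t.test(us_log, jp_log, var.equal = TRUE)

# One-sided version matching H_a: mean of x is less than mean of y.
one_sided <- t.test(us_log, jp_log, var.equal = TRUE, alternative = "less")

# With a negative t statistic the one-sided p-value is half the two-sided one.
two_sided$p.value / 2
one_sided$p.value
```

Applied to the real data, the equivalent call would be t.test(dat1$USCars, dat1$JapaneseCars, var.equal = TRUE, alternative = "less").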
library(ggplot2)
library(summarytools)
library(dplyr)
library(tidyverse)
library(tidyr)
library(plot.matrix)
dat<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
# Question 1
# US Cars MPG
qqnorm(dat$USCars, main="US Cars MPG Normal Probability Plot", xlab="Quantiles", ylab="MPG")
qqline(dat$USCars, col = "red")
# Japanese Cars MPG
qqnorm(dat$JapaneseCars, main="Japanese Cars MPG Normal Probability Plot", xlab="Quantiles", ylab="MPG")
qqline(dat$JapaneseCars, col = "blue")
# Question 2
var(dat$USCars)
var(dat$JapaneseCars, na.rm = TRUE)
label=c("US","Japanese")
boxplot(dat$USCars,dat$JapaneseCars, main="Boxplots of US and Japanese Cars MPG", ylab="MPG",col= rainbow (2), border="black", names=label)
# Question 3
dat1 <- log(dat)
# US Cars MPG
qqnorm(dat1$USCars, main="US Cars MPG Log-Normal Probability Plot", xlab="Quantiles", ylab="log(MPG)")
qqline(dat1$USCars, col = "green4")
# Japanese Cars MPG
qqnorm(dat1$JapaneseCars, main="Japanese Cars MPG Log-Normal Probability Plot", xlab="Quantiles", ylab="log(MPG)")
qqline(dat1$JapaneseCars, col = "purple")
#Boxplots
var(dat1$USCars)
var(dat1$JapaneseCars, na.rm = TRUE)
boxplot(dat1$USCars,dat1$JapaneseCars, main="Boxplots of Log Transform of US and Japanese Cars MPG", ylab="log(MPG)",col= rainbow (2), border="black", names=label)
# Question 4
t.test(dat1$USCars,dat1$JapaneseCars,var.equal = TRUE)
# 4a.
mean(dat1$USCars)
mean(dat1$JapaneseCars, na.rm=TRUE)