Assignment 4

An environmental group would like to test the hypothesis that the mean mpg of cars manufactured in the US is less than that of cars manufactured in Japan. To this end, they sampled n1=35 US and n2=28 Japanese cars and tested each for mpg fuel efficiency. (As a caveat, assume that this is a random sample from a large population of US and Japanese cars, not a complete census.) The data are reported in the following csv file:

library(ggplot2)
library(summarytools)
library(dplyr)
library(tidyverse)
library(tidyr)
library(plot.matrix)
dat<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")

Question 1

Does the mpg of both US cars and Japanese cars appear to be Normally distributed (use NPPs)?

US Cars MPG

# Normal probability plot of US car mpg; qqline() adds a reference line through the first and third quartiles
qqnorm(dat$USCars, main="US Cars MPG Normal Probability Plot", xlab="Quantiles", ylab="MPG")
qqline(dat$USCars, col = "red")

Japanese Cars MPG

# Normal probability plot of Japanese car mpg; qqline() adds a reference line through the first and third quartiles
qqnorm(dat$JapaneseCars, main="Japanese Cars MPG Normal Probability Plot", xlab="Quantiles", ylab="MPG")
qqline(dat$JapaneseCars, col = "blue")

MPG Normal Distribution Analysis

Yes. For both US and Japanese cars the points fall close to the reference line, so each sample can reasonably be treated as approximately normally distributed. The US points depart from the line slightly, which suggests a mild left skew.


Question 2

Does the variance appear to be constant (use side-by-side boxplots)?

var(dat$USCars)
## [1] 16.44034
var(dat$JapaneseCars, na.rm = TRUE)
## [1] 22.12037
label=c("US","Japanese")
boxplot(dat$USCars,dat$JapaneseCars, main="Boxplots of US and Japanese Cars MPG", ylab="MPG",col= rainbow (2), border="black", names=label)

No. The sample variances are 16.44 for US cars and 22.12 for Japanese cars, a ratio of about 1.35 (the same order of magnitude, but not equal), and the boxplot shows a narrower spread for US cars than for Japanese cars, so the variance does not appear to be constant. The boxplot also shows higher mpg for Japanese cars, which suggests they are more fuel efficient, but that is a difference in location rather than in variance.
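To put numbers on that comparison, the short sketch below takes the two sample variances printed above and computes their ratio and the corresponding standard deviations. The variable names are chosen for illustration only.

```r
# Sample variances as printed by var() above
var_us <- 16.44034
var_jp <- 22.12037

# Ratio of the larger variance to the smaller one
var_ratio <- var_jp / var_us

# Standard deviations, in mpg
sd_us <- sqrt(var_us)
sd_jp <- sqrt(var_jp)

round(c(ratio = var_ratio, sd_us = sd_us, sd_jp = sd_jp), 2)
```

The ratio comes out at roughly 1.35, so the two spreads differ but are of the same order of magnitude.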


Question 3

Transform the data using a log transform and repeat questions 1 and 2. Comment on the differences between the plots. Use the transformed data for the remaining questions.

dat1 <- log(dat)
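Applying log() to a data frame works column by column when every column is numeric, and missing values stay missing. That matters here because the Japanese column appears to hold fewer observations than the US column (28 versus 35), which is presumably why the later calls pass na.rm = TRUE. A small illustration on a made-up data frame (the values are hypothetical, not taken from the csv file):

```r
# Hypothetical toy data: the second column is padded with NA,
# as a shorter column in a csv file would be
toy <- data.frame(USCars = c(16, 18, 20),
                  JapaneseCars = c(24, 30, NA))

# log() is applied to every numeric column; NA stays NA
toy_log <- log(toy)
toy_log

# Without na.rm = TRUE the mean of a column containing NA is NA
mean(toy_log$JapaneseCars)

# With na.rm = TRUE the missing entry is dropped first
mean(toy_log$JapaneseCars, na.rm = TRUE)
```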

US Cars MPG

qqnorm(dat1$USCars, main="US Cars MPG Log-Normal Probability Plot", xlab="Quantiles", ylab="log(MPG)")
qqline(dat1$USCars, col = "green4")

The distribution is now more symmetrical when compared to the non-log transformed data.

Japanese Cars MPG

qqnorm(dat1$JapaneseCars, main="Japanese Cars MPG Log-Normal Probability Plot", xlab="Quantiles", ylab="log(MPG)")
qqline(dat1$JapaneseCars, col = "purple")

The distribution is now slightly right-skewed when compared to the non-log transformed data.

Boxplots of Log Transforms

var(dat1$USCars)
## [1] 0.06085468
var(dat1$JapaneseCars, na.rm = TRUE)
## [1] 0.03313062
boxplot(dat1$USCars,dat1$JapaneseCars, main="Boxplots of Log Transform of US and Japanese Cars MPG", ylab="log(MPG)",col= rainbow (2), border="black", names=label)

On the log scale the sample variances are 0.0609 for US cars and 0.0331 for Japanese cars, a ratio of about 1.8 and the same order of magnitude. The boxplot shows a similar spread for the two groups, so the variance is treated as approximately constant for the remaining questions. Similar spread says nothing about which group is more fuel efficient; that is a question about the means, which is tested in Question 4.
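One way to gauge how far apart the two log-scale variances are is an F ratio. The sketch below is built only from the variances printed above and the sample sizes given in the introduction (n1=35, n2=28); it assumes both samples are roughly normal on the log scale, and the variable names are illustrative. It supplements the boxplot comparison and is not part of the assigned method.

```r
# Log-scale sample variances printed above, with the stated sample sizes
v_us <- 0.06085468; n_us <- 35
v_jp <- 0.03313062; n_jp <- 28

# F ratio of the two sample variances
f_stat <- v_us / v_jp

# Upper-tail probability under H0: equal variances
p_upper <- pf(f_stat, df1 = n_us - 1, df2 = n_jp - 1, lower.tail = FALSE)

# Two-sided p-value: double the smaller tail
p_two_sided <- 2 * min(p_upper, 1 - p_upper)

c(F = f_stat, p_value = p_two_sided)
```

With the data frame loaded, var.test(dat1$USCars, dat1$JapaneseCars) should carry out the same comparison directly.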


Question 4

State the null and alternative hypothesis and test using a 0.05 level of significance.

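H_0: mu_US = mu_Japan (the mean log(mpg) is the same for US and Japanese cars)
H_a: mu_US < mu_Japan (the mean log(mpg) of US cars is less than that of Japanese cars)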

t.test(dat1$USCars,dat1$JapaneseCars,var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  dat1$USCars and dat1$JapaneseCars
## t = -9.4828, df = 61, p-value = 1.306e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.6417062 -0.4182053
## sample estimates:
## mean of x mean of y 
##  2.741001  3.270957

4a

What are the sample averages for the log of the mpg of US and Japanese cars?

mean(dat1$USCars)
## [1] 2.741001
mean(dat1$JapaneseCars, na.rm=TRUE)
## [1] 3.270957
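Because these averages are on the log scale, exponentiating them gives geometric means in the original mpg units, which can be easier to interpret. A minimal sketch using the two values printed above (variable names are illustrative):

```r
# Sample averages of log(mpg), as printed above
mean_log_us <- 2.741001
mean_log_jp <- 3.270957

# Back-transform to geometric means in mpg
geo_mean_us <- exp(mean_log_us)
geo_mean_jp <- exp(mean_log_jp)

# A difference of log means corresponds to a ratio of geometric means
ratio_us_to_jp <- exp(mean_log_us - mean_log_jp)

round(c(us = geo_mean_us, japan = geo_mean_jp, ratio = ratio_us_to_jp), 2)
```

This gives geometric means of roughly 15.5 mpg for US cars and 26.3 mpg for Japanese cars, a ratio of about 0.59.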

4b

State your conclusions.

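The two-sided p-value of 1.306e-13 reported above is far below alpha=0.05. Because t = -9.4828 is negative, the one-sided p-value for H_a (US less than Japan) is half of that, about 6.5e-14. We therefore reject the null hypothesis that US and Japanese fuel efficiency is the same and conclude that the mean log(mpg) of US cars is less than that of Japanese cars.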


Complete R Code

library(ggplot2)
library(summarytools)
library(dplyr)
library(tidyverse)
library(tidyr)
library(plot.matrix)
dat<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")

# Question 1
# US Cars MPG
qqnorm(dat$USCars, main="US Cars MPG Normal Probability Plot", xlab="Quantiles", ylab="MPG")
qqline(dat$USCars, col = "red")

# Japanese Cars MPG
qqnorm(dat$JapaneseCars, main="Japanese Cars MPG Normal Probability Plot", xlab="Quantiles", ylab="MPG")
qqline(dat$JapaneseCars, col = "blue")

# Question 2
var(dat$USCars)
var(dat$JapaneseCars, na.rm = TRUE)

label=c("US","Japanese")
boxplot(dat$USCars,dat$JapaneseCars, main="Boxplots of US and Japanese Cars MPG", ylab="MPG",col= rainbow (2), border="black", names=label)

# Question 3
dat1 <- log(dat)

# US Cars MPG
qqnorm(dat1$USCars, main="US Cars MPG Log-Normal Probability Plot", xlab="Quantiles", ylab="log(MPG)")
qqline(dat1$USCars, col = "green4")

# Japanese Cars MPG
qqnorm(dat1$JapaneseCars, main="Japanese Cars MPG Log-Normal Probability Plot", xlab="Quantiles", ylab="log(MPG)")
qqline(dat1$JapaneseCars, col = "purple")

# Boxplots
var(dat1$USCars)
var(dat1$JapaneseCars, na.rm = TRUE)

boxplot(dat1$USCars,dat1$JapaneseCars, main="Boxplots of Log Transform of US and Japanese Cars MPG", ylab="log(MPG)",col= rainbow (2), border="black", names=label)

# Question 4
t.test(dat1$USCars,dat1$JapaneseCars,var.equal = TRUE)

# 4a. 
mean(dat1$USCars)
mean(dat1$JapaneseCars, na.rm=TRUE)