Summary of Data

We first read in the data from a GitHub repository and display it.

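# Read the two-column MPG data set (USCars, JapaneseCars) directly from GitHub
dat <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
# Render the data frame as a formatted table
knitr::kable(dat)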

USCars	JapaneseCars
18	24
15	27
18	27
16	25
17	31
15	35
14	24
14	19
14	28
15	23
15	27
14	20
15	22
14	18
22	20
18	31
21	32
21	31
10	32
10	24
11	26
9	29
28	24
25	24
19	33
16	33
17	32
19	28
18	NA
14	NA
14	NA
14	NA
14	NA
12	NA
13	NA

We would like to test the hypothesis that \(\mu_1=\mu_2\) against the alternative that \(\mu_1\neq\mu_2\) at the \(\alpha=0.05\) level of significance, where \(\mu_1\) and \(\mu_2\) denote the mean mpg of US and Japanese cars, respectively. This is very important because

Fuel efficiency affects carbon emissions
Fuel efficiency affects fuel costs for society

Further, we are using this analysis to determine whether the EPA should impose penalties on carmakers.

Boxplot

A boxplot of the data is as follows:

boxplot(dat$USCars,dat$JapaneseCars,
        main="Boxplot of MPG for US and Japanese Cars",
        col=c("red","green"),
        names=c("USCars","JapaneseCars"))
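
The quantities that the boxplot draws can also be inspected numerically. The following is a minimal sketch, assuming the values are re-entered by hand from the table above so that it runs without a network connection; the names us, jp, and summ are introduced here for illustration only.

```r
# MPG values copied from the table above; `us` and `jp` are illustrative names.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Five-number summary (min, lower hinge, median, upper hinge, max) per group.
summ <- rbind(USCars = fivenum(us), JapaneseCars = fivenum(jp))
colnames(summ) <- c("min", "lower", "median", "upper", "max")
print(summ)
```

When working from dat instead, the NA padding in dat$JapaneseCars is dropped by fivenum() by default, so the same call applies to the columns directly.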

Transformation

It appears that the means do differ, yet the spreads of the two groups also look different, which may violate the constant-variance assumption of a two-sample t-test. Let's transform the data by taking the natural log:

# log() is the natural logarithm; NA entries in JapaneseCars remain NA
dat$USCars <- log(dat$USCars)
dat$JapaneseCars <- log(dat$JapaneseCars)

The sample standard deviation, \(s\), of US cars after the transformation is 0.2466874, and for Japanese cars it is 0.1820182 (note: use inline R code to calculate the standard deviations, don't just type them in). From these values it is deemed that \(\sigma_1\approx\sigma_2\), and hence a two-sample t-test is performed.
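
One way to compute these standard deviations rather than typing them in is sketched below. It assumes the data are re-entered by hand from the table in the summary (the vectors us and jp are illustrative names) and rebuilt into a data frame shaped like dat, with NA padding in the shorter column.

```r
# MPG values copied from the data table; `us` and `jp` are illustrative names.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Mirror the layout of `dat`: the shorter column is padded with NA.
dat <- data.frame(USCars = us,
                  JapaneseCars = c(jp, rep(NA, length(us) - length(jp))))
dat$USCars <- log(dat$USCars)
dat$JapaneseCars <- log(dat$JapaneseCars)

# na.rm = TRUE is needed for JapaneseCars, otherwise sd() returns NA.
s1 <- sd(dat$USCars)
s2 <- sd(dat$JapaneseCars, na.rm = TRUE)
print(c(US = s1, Japanese = s2))

# In the R Markdown text itself, the same expression can be placed inline,
# e.g. `r sd(dat$JapaneseCars, na.rm = TRUE)`, so the number is never hand-typed.
```

The na.rm = TRUE argument matters here because the Japanese column has fewer observations than the US column.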

Two-Sample t-test

The t-statistic for this test may be computed as

\[ t=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{(s_1^2/n_1)+(s_2^2/n_2)}} \]
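
The formula can be evaluated step by step as a check on the built-in test. The sketch below is an illustration under stated assumptions: the values are re-entered by hand from the data table, the names us, jp, t_stat, and df_welch are hypothetical, and the degrees of freedom use the Welch-Satterthwaite approximation, which is what t.test() applies when equal variances are not assumed.

```r
# MPG values copied from the data table; all names here are illustrative.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)
x1 <- log(us)
x2 <- log(jp)

n1 <- length(x1)
n2 <- length(x2)
v1 <- var(x1) / n1   # s_1^2 / n_1
v2 <- var(x2) / n2   # s_2^2 / n_2

# t statistic from the formula above.
t_stat <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom.
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

These values can be compared with the t, df, and p-value lines of the t.test() output in this section.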

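The test performed in R is as follows. Note that t.test uses var.equal = FALSE by default, so the output below is the Welch test, which matches the formula above: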

t.test(dat$USCars, dat$JapaneseCars,
       alternative = "two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  dat$USCars and dat$JapaneseCars
## t = -9.804, df = 60.651, p-value = 4.015e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.6380580 -0.4218536
## sample estimates:
## mean of x mean of y 
##  2.741001  3.270957

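Note that the p-value is very small (4.015e-14, far below \(\alpha=0.05\)), and hence the equality of the means is rejected. Because the sample mean of log mpg is lower for US cars (2.741) than for Japanese cars (3.271), there is statistical evidence that the mean mpg of Japanese cars is greater than that of US cars.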
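
Because the test was run on log-transformed mpg, the estimates and confidence interval are on the log scale. Exponentiating them gives geometric means and a ratio of geometric means (US over Japanese), which may be easier to interpret. This is a sketch only, assuming the values are re-entered by hand from the data table; the names us, jp, res, and ratio_ci are illustrative.

```r
# MPG values copied from the data table; all names here are illustrative.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
jp <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31, 32, 31,
        32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

res <- t.test(log(us), log(jp), alternative = "two.sided")

# exp(mean of logs) is the geometric mean of each group, in mpg.
geo_means <- exp(unname(res$estimate))
names(geo_means) <- c("USCars", "JapaneseCars")

# exp(CI for the difference of log means) is a CI for the ratio
# of geometric means, US relative to Japanese.
ratio_ci <- exp(as.numeric(res$conf.int))

print(geo_means)
print(ratio_ci)
```

An interval for the ratio that lies entirely below 1 corresponds to lower typical mpg for US cars than for Japanese cars.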

MPG of US and Japanese Cars

Insert Your Name

Insert the Date

