Assignment 4

#setup
#install.packages("dplyr")
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

mpg <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv", col.names=c("USCars","JapaneseCars"))
str(mpg)

## 'data.frame':    35 obs. of  2 variables:
##  $ USCars      : int  18 15 18 16 17 15 14 14 14 15 ...
##  $ JapaneseCars: int  24 27 27 25 31 35 24 19 28 23 ...

Question 1:

False. The central limit theorem does not hold for 35 observations (small sample size).

Question 2:

False. The normal Q-Q plot for the US cars does not follow a straight line, so that sample does not appear to be normally distributed. The Japanese cars, on the other hand, appear to be normally distributed.

mpgUS <- mpg$USCars
mpgJapanese <- mpg$JapaneseCars
qqnorm(mpgUS, main="US Cars")

#qqline(mpgUS)
qqnorm(mpgJapanese, main="Japanese Cars")

#qqline(mpgJapanese)
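
As a numeric complement to the Q-Q plots, a Shapiro-Wilk test could be run on each sample. The sketch below is self-contained and uses simulated stand-in vectors (`sim_us` and `sim_japanese` are hypothetical, not the assignment data); with the real data, `mpgUS` and `mpgJapanese` would be passed instead.

```r
# Simulated stand-in data (hypothetical; replace with mpgUS / mpgJapanese)
set.seed(1)
sim_us <- rexp(35, rate = 1 / 16)             # right-skewed sample
sim_japanese <- rnorm(35, mean = 27, sd = 4)  # roughly normal sample

# Shapiro-Wilk test: the null hypothesis is that the sample is normal
sw_us <- shapiro.test(sim_us)
sw_japanese <- shapiro.test(sim_japanese)

# Small p-values suggest a departure from normality
print(c(US = sw_us$p.value, Japanese = sw_japanese$p.value))
```

The test and the plot answer the same question in different ways, so it is reasonable to report both rather than rely on either alone.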

Question 3:

The box plot shows that the spread differs noticeably between the two populations, so the variance is not constant.

#box plot with labels
boxplot(mpgUS,mpgJapanese, names = c('US cars' , "Japanese Cars"),ylab = "mpg", main = "US Cars/ Japanese")
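
The visual impression from the box plot can be backed up with the sample variances and an F test for equal variances. This is a minimal sketch on simulated stand-in data (`sim_us` and `sim_japanese` are hypothetical); the same calls would apply to `mpgUS` and `mpgJapanese`.

```r
# Simulated stand-in data (hypothetical; replace with mpgUS / mpgJapanese)
set.seed(2)
sim_us <- rnorm(35, mean = 16, sd = 3)
sim_japanese <- rnorm(35, mean = 27, sd = 6)

# Sample variances; na.rm = TRUE guards against missing values
vars <- c(US = var(sim_us, na.rm = TRUE),
          Japanese = var(sim_japanese, na.rm = TRUE))
print(vars)

# F test for equality of two variances (sensitive to non-normality)
f_res <- var.test(sim_us, sim_japanese)
print(f_res$p.value)
```

Because the F test assumes normal data, its result should be read with caution when the Q-Q plots suggest non-normality.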

Question 4:

After transforming the data with a log function, the box plot shows that the variance has decreased. We can apply Levene's test, as we have a strong assumption that the variances are equal, and from the normal Q-Q plots we can assume that the data are normally distributed after the transformation.

#log transformation
logUS <- log(mpgUS)
logJapanese <- log(mpgJapanese)
qqnorm(logUS, main="Log transform - US Cars")
qqline(logUS)

qqnorm(logJapanese, main="Log transform - Japanese Cars")
qqline(logJapanese)

#box plot with labels
boxplot(logUS,logJapanese, names = c('US cars' , "Japanese Cars"),ylab = "log(mpg)", main = "US Cars/ Japanese - Transformed")
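
The equal-variance assumption can be checked formally with Levene's test. The sketch below implements the median-centred variant (Brown-Forsythe) in base R as a one-way ANOVA on absolute deviations from the group medians. The vectors `log_us` and `log_japanese` are simulated on the log scale and are hypothetical stand-ins for `logUS` and `logJapanese`.

```r
# Simulated log-scale data (hypothetical; replace with logUS / logJapanese)
set.seed(3)
log_us <- rnorm(35, mean = 2.74, sd = 0.20)
log_japanese <- rnorm(35, mean = 3.27, sd = 0.22)

# Stack the values and record which group each one belongs to
values <- c(log_us, log_japanese)
group <- factor(rep(c("US", "Japanese"),
                    times = c(length(log_us), length(log_japanese))))

# Absolute deviation of each value from its own group median
abs_dev <- abs(values - ave(values, group, FUN = median))

# One-way ANOVA on the deviations: the null hypothesis is equal spread
levene_fit <- anova(lm(abs_dev ~ group))
levene_p <- levene_fit[["Pr(>F)"]][1]
print(levene_p)
```

A large p-value here would be consistent with equal variances; with real data containing missing values, those would need to be removed before stacking.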

Question 5:

a)

Mean of log(mpg): US cars = 2.741001, Japanese cars = 3.270957. As the p-value is less than 0.05, we can reject the null hypothesis.

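#two-sample t-test on the log-transformed data (this is not Levene's test)
#note: `variance` is not a t.test argument and is ignored, so the Welch test is run; the pooled test needs var.equal = TRUE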
t.test(logUS, logJapanese, variance= TRUE, alternative ="less")

## 
##  Welch Two Sample t-test
## 
## data:  logUS and logJapanese
## t = -9.804, df = 60.651, p-value = 2.008e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.4396641
## sample estimates:
## mean of x mean of y 
##  2.741001  3.270957
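
The output header reads "Welch Two Sample t-test", which is what `t.test` runs when equal variances are not requested. A minimal sketch of the difference between the Welch and pooled versions follows, using simulated log-scale data (`log_us` and `log_japanese` are hypothetical stand-ins for `logUS` and `logJapanese`).

```r
# Simulated log-scale data (hypothetical; replace with logUS / logJapanese)
set.seed(4)
log_us <- rnorm(35, mean = 2.74, sd = 0.20)
log_japanese <- rnorm(35, mean = 3.27, sd = 0.22)

# Welch test: the default, does not assume equal variances
welch <- t.test(log_us, log_japanese, alternative = "less")

# Pooled test: var.equal = TRUE assumes a common variance
pooled <- t.test(log_us, log_japanese, var.equal = TRUE, alternative = "less")

# The method string and degrees of freedom show which test was run
print(c(welch = welch$method, pooled = pooled$method))
print(c(welch = unname(welch$parameter), pooled = unname(pooled$parameter)))
```

The pooled test uses n1 + n2 - 2 degrees of freedom, while the Welch test uses a non-integer approximation, which is a quick way to tell the two apart in printed output.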

b) Conclusion:

We reject the null hypothesis based on the p-value: the means of the two groups differ significantly. We therefore conclude in favor of the alternative hypothesis, which means that the mean US mpg is smaller than the mean Japanese mpg.
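
Since the test was run on log-transformed values, the reported means can be back-transformed for interpretation on the original mpg scale. The sketch below uses the two sample estimates printed in the t-test output; exponentiating a mean of logs gives a geometric mean, and the variable names are illustrative.

```r
# Sample estimates copied from the t-test output (means of log(mpg))
mean_log_us <- 2.741001
mean_log_japanese <- 3.270957

# exp() of a mean of logs is the geometric mean on the mpg scale
geo_mean_us <- exp(mean_log_us)
geo_mean_japanese <- exp(mean_log_japanese)

# A difference of log means back-transforms to a ratio of geometric means
ratio <- exp(mean_log_us - mean_log_japanese)

print(c(US = geo_mean_us, Japanese = geo_mean_japanese, ratio = ratio))
```

The ratio comes out at roughly 0.59, so on this reading the typical US mpg is a little under 60% of the typical Japanese mpg.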