knitr::opts_chunk$set(echo = TRUE)
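# The odd column name (ï..USCars) is the usual symptom of a byte-order mark at the start of the CSV;
# reading the file as UTF-8-BOM keeps the first header clean
dat<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv", fileEncoding = "UTF-8-BOM")
dat1<-dat$USCars        # mpg of US cars
dat2<-dat$JapaneseCars  # mpg of Japanese cars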
library(dplyr)
Answer to Question 1

To be considered large enough, a sample should contain more than 40 observations. Both the US and the Japanese car samples have fewer than 40 observations, so neither is large enough to rely on the central limit theorem.
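The sample sizes can be checked directly. The following is a minimal sketch: `us_mpg` and `jp_mpg` are hypothetical stand-ins for `dat1` and `dat2`, and `n_obs` is an illustrative helper that is not part of the original analysis. Missing values are excluded in case the shorter column is padded with `NA` when the CSV is read.

```r
# Hypothetical stand-ins for dat1 and dat2 (not the real data)
us_mpg <- c(18, 15, 18, 16, 17, 15)
jp_mpg <- c(24, 27, 27, 25, NA, NA)

# Illustrative helper: number of non-missing observations
n_obs <- function(x) sum(!is.na(x))

n_us <- n_obs(us_mpg)
n_jp <- n_obs(jp_mpg)

# Rule of thumb used in this report: more than 40 observations counts as "large"
threshold <- 40
large_enough <- c(us = n_us > threshold, japanese = n_jp > threshold)

print(c(us = n_us, japanese = n_jp))
print(large_enough)
```

With the real data, replacing the two stand-in vectors by `dat1` and `dat2` should give the counts the answer relies on.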

Question 2

Checking for normal distribution

qqnorm(dat1, main="mpg of uscar")
qqline(dat1)

qqnorm(dat2, main="mpg of japanesecar")
qqline(dat2)

Both the US and the Japanese car samples appear approximately normally distributed, because the data points fall close to the reference line in each normal Q-Q plot.

Question 3

Comparing the variances of US cars and Japanese cars

boxplot(dat1,dat2, names=c("uscars","japanesecars"),main="BoxPlot of MPg")

> The boxplot suggests that the variances of US cars and Japanese cars differ too much to be treated as equal.
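A numeric check can back up the visual impression from the boxplot. This is a sketch under assumptions: `us_mpg` and `jp_mpg` are hypothetical stand-ins for `dat1` and `dat2`, and the F test from `var.test` is a supplementary technique that the original analysis does not use. It also assumes roughly normal data.

```r
# Hypothetical stand-ins for dat1 and dat2 (not the real data)
us_mpg <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15)
jp_mpg <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23)

# Sample variances and their ratio; a ratio far from 1 suggests unequal spread
v_us <- var(us_mpg, na.rm = TRUE)
v_jp <- var(jp_mpg, na.rm = TRUE)
var_ratio <- v_jp / v_us

# F test for equality of two variances (H0: the variance ratio is 1)
f_res <- var.test(jp_mpg, us_mpg)

print(var_ratio)
print(f_res$p.value)
```

A small p-value here would agree with the boxplot reading that the two variances are not equal on the original scale.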

Question 4
loguscar<-log(dat1)
logjapancar<-log(dat2)

Checking normality of transformed data

qqnorm(loguscar, main="mpg uscar transformed data")
qqline(loguscar)

qqnorm(logjapancar,main="mpg japanesecar transformed data")
qqline(logjapancar)

Both plots follow the reference line closely, so the log-transformed samples can be treated as approximately normally distributed.
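A formal test can complement the Q-Q plots. The sketch below applies the Shapiro-Wilk test to log-transformed values. `us_mpg` is a hypothetical stand-in for `dat1`, and the test itself is an addition that the original analysis does not use.

```r
# Hypothetical stand-in for dat1 (not the real data)
us_mpg <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 22, 21)
log_us <- log(us_mpg)

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution
sw <- shapiro.test(log_us)

print(sw$statistic)  # W, close to 1 when the data look normal
print(sw$p.value)    # a small p-value is evidence against normality
```

With samples this small the test has limited power, so it is best read alongside the Q-Q plot rather than as a replacement for it.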

Comparing the variances of the transformed data

boxplot(loguscar,logjapancar, names=c("uscars","japanesecars"),main="Boxplot of transformed MPg data")

> Yes, the variances are approximately equal after the log transformation.

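Setting the hypotheses

H0: Mean1 - Mean2 = 0
Ha: Mean1 - Mean2 < 0 (one-sided, to match alternative = "less" in the t.test call), where Mean1 and Mean2 are the mean log mpg of US cars and Japanese cars

t-test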

t.test(loguscar,logjapancar,var.equal = TRUE, alternative = "less")
## 
##  Two Sample t-test
## 
## data:  loguscar and logjapancar
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.4366143
## sample estimates:
## mean of x mean of y 
##  2.741001  3.270957

Conclusion

The p-value = 6.528e-14 is far below the significance level (0.05, matching the 95 percent confidence interval in the output). Therefore we reject the null hypothesis: the mean log mpg of US cars is significantly lower than that of Japanese cars.
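Because the test was run on log-transformed values, the estimates can be back-transformed for interpretation. The sketch below uses only the numbers printed in the t-test output. The variable names are illustrative, and reading the result as a ratio of geometric means is an added interpretation, not part of the original analysis.

```r
# Values copied from the t-test output (log scale)
mean_log_us <- 2.741001
mean_log_jp <- 3.270957
upper_log   <- -0.4366143  # upper bound of the one-sided 95% interval

# exp() of a difference in log means is a ratio of geometric means
ratio_est   <- exp(mean_log_us - mean_log_jp)
ratio_upper <- exp(upper_log)

print(round(ratio_est, 3))
print(round(ratio_upper, 3))
```

The estimated ratio is roughly 0.59 with an upper bound of roughly 0.65, which suggests that the typical US car mpg in this sample is about 59% of the typical Japanese car mpg, and at most about 65% at the 95 percent confidence level.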