knitr::opts_chunk$set(echo = TRUE)
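# fileEncoding = "UTF-8-BOM" drops the byte-order mark that otherwise garbles the first column name (ï..USCars)
dat <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv", fileEncoding = "UTF-8-BOM")
dat1 <- dat$USCars
dat2 <- dat$JapaneseCars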
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
To be considered large enough for the central limit theorem to apply, a sample needs more than 40 observations. Both the US and Japanese car samples have fewer than 40 observations, so neither is large enough to rely on the central limit theorem.
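The sample-size check can be done directly in R. This is a minimal sketch, assuming the shorter column is padded with `NA` in the CSV (the raw file is not shown here). The vectors `us_example` and `jp_example` are made-up stand-ins for `dat1` and `dat2`, not the real data.

```r
# Hypothetical stand-ins for dat1 and dat2 (not the real mpg values).
us_example <- c(18, 15, 18, 16, 17, 15, 14)
jp_example <- c(24, 27, 27, 25, 31, NA, NA)  # NA padding is an assumption

# Count the observed (non-missing) values in each sample.
n_us <- sum(!is.na(us_example))
n_jp <- sum(!is.na(jp_example))

# Rule of thumb used in the text: more than 40 observations counts as large.
threshold <- 40
c(us = n_us, japanese = n_jp)
c(us = n_us > threshold, japanese = n_jp > threshold)
```

With the real data, the same two `sum(!is.na(...))` calls applied to `dat1` and `dat2` would give the counts to compare against 40.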
Checking for normal distribution
qqnorm(dat1, main="mpg of uscar")
qqline(dat1)
qqnorm(dat2, main="mpg of japanesecar")
qqline(dat2)
Both the US and Japanese car samples appear approximately normally distributed, because the points fall close to the reference line in each normal Q-Q plot.
##### Question 3
Comparing variance of US cars and Japanese cars
boxplot(dat1, dat2, names=c("uscars","japanesecars"), main="Boxplot of MPG")
> The boxplot suggests that the variances of US cars and Japanese cars differ too much to treat them as equal, so the data are log-transformed.
loguscar<-log(dat1)
logjapancar<-log(dat2)
Checking normality of transformed data
qqnorm(loguscar, main="mpg uscar transformed data")
qqline(loguscar)
qqnorm(logjapancar,main="mpg japanesecar transformed data")
qqline(logjapancar)
In both plots the points follow the Q-Q reference line closely, so the transformed samples can be treated as approximately normally distributed.
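A Q-Q plot is a visual check, and a numerical test can complement it. The sketch below uses the Shapiro-Wilk test, which is not part of the original analysis, on simulated data standing in for `loguscar`. The sample size, mean and standard deviation are arbitrary choices for illustration.

```r
set.seed(1)

# Simulated log-scale sample standing in for loguscar (illustration only).
log_example <- rnorm(30, mean = 2.7, sd = 0.2)

# Shapiro-Wilk test: the null hypothesis is that the sample is normal.
sw <- shapiro.test(log_example)

# W statistic (close to 1 for normal-looking data) and its p-value.
sw$statistic
sw$p.value
```

A large p-value means the test finds no evidence against normality. It does not prove normality, so it is best read alongside the Q-Q plot rather than instead of it.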
Comparing variance of transformed data
boxplot(loguscar, logjapancar, names=c("uscars","japanesecars"), main="Boxplot of transformed MPG data")
> Yes, after the log transformation the variances of the two groups are approximately equal.
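The equal-variance judgement from the boxplot can also be checked numerically. This is a minimal sketch on simulated data standing in for `loguscar` and `logjapancar`, with arbitrary sample sizes and parameters. It compares the sample variances directly and runs an F test with `var.test()`, which is an addition to the original analysis.

```r
set.seed(2)

# Simulated log-scale samples (illustration only, not the real data).
log_a <- rnorm(30, mean = 2.7, sd = 0.2)
log_b <- rnorm(25, mean = 3.3, sd = 0.2)

# Sample variances and their ratio; a ratio near 1 supports var.equal = TRUE.
v_a <- var(log_a)
v_b <- var(log_b)
v_a / v_b

# F test of the null hypothesis that the two population variances are equal.
ft <- var.test(log_a, log_b)
ft$statistic
ft$p.value
```

The F test is sensitive to departures from normality, so it is reasonable to run it only after the normality check on the transformed data.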
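Setting the hypotheses
# Ho: Mean1 - Mean2 = 0
# Ha: Mean1 - Mean2 < 0 (one-sided, matching alternative = "less" in the call below)
Here Mean1 and Mean2 are the mean log mpg of US cars and Japanese cars.
T-test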
t.test(loguscar,logjapancar,var.equal = TRUE, alternative = "less")
##
## Two Sample t-test
##
## data: loguscar and logjapancar
## t = -9.4828, df = 61, p-value = 6.528e-14
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -0.4366143
## sample estimates:
## mean of x mean of y
## 2.741001 3.270957
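To show where the `t`, `df` and `p-value` in this output come from, the pooled two-sample t statistic can be computed by hand and compared with `t.test()`. This is a sketch on simulated data; `x` and `y` are hypothetical stand-ins for `loguscar` and `logjapancar`.

```r
set.seed(3)

# Simulated log-scale samples (illustration only, not the real data).
x <- rnorm(30, mean = 2.7, sd = 0.2)
y <- rnorm(25, mean = 3.3, sd = 0.2)
n_x <- length(x)
n_y <- length(y)

# Pooled variance: weighted average of the two sample variances.
sp2 <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)

# t statistic and degrees of freedom for the equal-variance test.
t_manual <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n_x + 1 / n_y))
df_manual <- n_x + n_y - 2

# Lower-tail p-value, corresponding to alternative = "less".
p_manual <- pt(t_manual, df = df_manual)

# The built-in test should agree with the manual calculation.
tt <- t.test(x, y, var.equal = TRUE, alternative = "less")
all.equal(t_manual, unname(tt$statistic))
all.equal(p_manual, tt$p.value)
```

The degrees of freedom equal the combined sample size minus 2, which is why the output above reports `df = 61`.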
Conclusion
### The p-value = 6.528e-14 is far below the 0.05 significance level,
### therefore we reject the null hypothesis: the mean log mpg of US cars is significantly lower than that of Japanese cars.
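The decision rule can be written out explicitly. The sketch below uses the p-value and the sample means from the t-test output. The significance level of 0.05 is an assumption taken from the 95 percent confidence interval in that output.

```r
# Values copied from the t-test output.
p_value <- 6.528e-14
mean_log_us <- 2.741001
mean_log_jp <- 3.270957

# Assumed significance level, matching the 95 percent confidence interval.
alpha <- 0.05

# Reject H0 when the p-value is below alpha.
reject_h0 <- p_value < alpha
reject_h0

# A difference in log means back-transforms to a ratio of geometric means:
# values below 1 mean the typical US mpg is lower than the Japanese mpg.
ratio_us_to_jp <- exp(mean_log_us - mean_log_jp)
ratio_us_to_jp
```

Because the test was run on log-transformed data, the ratio is an easier way to report the size of the effect on the original mpg scale than the difference in log means.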