Getting Data
data <- read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/US_Japanese_Cars.csv")
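When this file is read with the default settings, the first column name comes back as `ï..USCars`, which is the usual sign of a UTF-8 byte-order mark at the start of the file. The sketch below is not part of the original analysis. It builds a small temporary file with the same kind of mark (the file and its two rows are made up for illustration) and shows that `fileEncoding = "UTF-8-BOM"` yields clean names.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (EF BB BF).
# The temporary file and its two rows are hypothetical stand-ins.
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("USCars,JapaneseCars\n18,24\n15,27\n")),
         tmp)

# "UTF-8-BOM" asks R to drop the mark while reading the file.
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(clean)
```

The same argument could presumably be passed to the `read.csv` call above; the rest of this report keeps the column name exactly as it was read.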
Question 1)
data
## ï..USCars JapaneseCars
## 1 18 24
## 2 15 27
## 3 18 27
## 4 16 25
## 5 17 31
## 6 15 35
## 7 14 24
## 8 14 19
## 9 14 28
## 10 15 23
## 11 15 27
## 12 14 20
## 13 15 22
## 14 14 18
## 15 22 20
## 16 18 31
## 17 21 32
## 18 21 31
## 19 10 32
## 20 10 24
## 21 11 26
## 22 9 29
## 23 28 24
## 24 25 24
## 25 19 33
## 26 16 33
## 27 17 32
## 28 19 28
## 29 18 NA
## 30 14 NA
## 31 14 NA
## 32 14 NA
## 33 14 NA
## 34 12 NA
## 35 13 NA
Answer: The US sample has 35 observations and the Japanese sample has 28 (the remaining Japanese entries are NA). Because neither sample size is greater than 40, we cannot rely on the Central Limit Theorem here.
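The sample sizes quoted above can be checked by counting the non-missing entries in each column. This is a minimal sketch: the vectors `us` and `japanese` are hypothetical names, with values copied from the printed data so the block runs without a download.

```r
# Values copied from the printed data frame above.
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japanese <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31,
              32, 31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Pad the shorter column with NA so both fit in one data frame.
length(japanese) <- length(us)
cars <- data.frame(USCars = us, JapaneseCars = japanese)

# Count the non-missing observations per column.
n <- colSums(!is.na(cars))
n
```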
Question 2) Checking the normal probability plots
qqnorm(data$JapaneseCars, col="Red", main="Normal Probability plot for mpg of Japanese cars ")
qqline(data$JapaneseCars)

qqnorm(data$ï..USCars, col="blue",main="Normal Probability plot for mpg of US cars ")
qqline(data$ï..USCars)

Answer: The data points on both plots fall almost on a straight line, so both samples appear to be approximately normally distributed.
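A formal test can complement the visual check. The sketch below runs the Shapiro-Wilk test from base R on each sample. It is an addition, not part of the original answer, and the vector names `us` and `japanese` are hypothetical, with values copied from the printed data.

```r
# Values copied from the printed data frame above (NA rows omitted).
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japanese <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31,
              32, 31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Shapiro-Wilk test: the null hypothesis is that the sample is normal.
sw_us <- shapiro.test(us)
sw_japanese <- shapiro.test(japanese)

# A small p-value is evidence against normality.
c(US = sw_us$p.value, Japanese = sw_japanese$p.value)
```

With samples this small the test has limited power, so it is best read alongside the Q-Q plots rather than instead of them.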
Question 3
boxplot(data$JapaneseCars,data$ï..USCars, names=c("Japanese Cars","US Cars"), main="mpg comparison of Japanese and US Cars")

Answer: No, the box plots differ in spread, so the variances of the two samples do not appear to be equal.
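The spread seen in the box plots can also be compared numerically. The sketch below computes the two sample variances and, as one possible follow-up, runs the F test for equal variances (`var.test`), which itself assumes normal data. Neither step is part of the original answer, and `us` and `japanese` are hypothetical names for the values copied from the printed data.

```r
# Values copied from the printed data frame above (NA rows omitted).
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japanese <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31,
              32, 31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Sample variances of the raw mpg values.
var_us <- var(us)
var_japanese <- var(japanese)
c(US = var_us, Japanese = var_japanese)

# F test of the null hypothesis that the two variances are equal.
f_test <- var.test(us, japanese)
f_test$p.value
```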
Question 4
data2 <- log(data)
#Now checking the normal probability plots
qqnorm(data2$JapaneseCars,col="Red", main="Normal Probability plot for mpg of log transformed data of Japanese cars ")
qqline(data2$JapaneseCars)

qqnorm(data2$ï..USCars,col="Red", main="Normal Probability plot for mpg of log transformed data of US cars ")
qqline(data2$ï..USCars)

#Now using side by side box plot
boxplot(data2$JapaneseCars,data2$ï..USCars, names=c("Japanese Cars","US Cars"), main="log(mpg) comparison of Japanese and US Cars")

Answer: After log-transforming both samples, the box plots suggest that the variances of the two samples are now reasonably close. The normal probability plots also look fairly normal for both samples, since the data points fall almost on a straight line, and they lie closer to the line than they did before the transformation.
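The box plot gives only a visual impression, so it may be worth checking the variances before and after the transformation. The sketch below computes the ratio of the two sample variances on both scales. It is an addition to the original answer, and `us` and `japanese` are hypothetical names for the values copied from the printed data.

```r
# Values copied from the printed data frame above (NA rows omitted).
us <- c(18, 15, 18, 16, 17, 15, 14, 14, 14, 15, 15, 14, 15, 14, 22, 18, 21, 21,
        10, 10, 11, 9, 28, 25, 19, 16, 17, 19, 18, 14, 14, 14, 14, 12, 13)
japanese <- c(24, 27, 27, 25, 31, 35, 24, 19, 28, 23, 27, 20, 22, 18, 20, 31,
              32, 31, 32, 24, 26, 29, 24, 24, 33, 33, 32, 28)

# Ratio of sample variances (Japanese over US) on the raw scale.
ratio_raw <- var(japanese) / var(us)

# The same ratio after the natural-log transformation used in Question 4.
ratio_log <- var(log(japanese)) / var(log(us))

# A ratio near 1 means the two spreads are similar on that scale.
c(raw = ratio_raw, log = ratio_log)
```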
Question 5
Including library
library(dplyr)