Create a combined MPG variable called MPG_Combo that weights MPG_City at 55% and MPG_Highway at 45%. Obtain a box plot for MPG_Combo and comment on what the plot tells us about fuel efficiency.
# Weighted average: 55% city, 45% highway
MPG_Combo <- 0.55 * cars$MPG_City + 0.45 * cars$MPG_Highway
cars <- data.frame(cars, MPG_Combo)
boxplot(MPG_Combo,
main = "Distribution of Fuel Efficiency",
ylab = "MPG_Combo",
col = "Red",
border = "Blue",
horizontal = FALSE
)
# Overlay the sample mean on the box plot
points(mean(MPG_Combo, na.rm = TRUE), col = "White")
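To support the visual comment with numbers, the quantities a box plot is drawn from can be printed directly. The following is a minimal sketch on a small made-up vector; `mpg_combo_demo` and its values are hypothetical and are not taken from the cars data.

```r
# Hypothetical combined-MPG values for illustration only (not the cars data)
mpg_combo_demo <- c(18.2, 20.1, 21.5, 22.0, 23.4, 24.8, 26.3, 27.9, 45.0, NA)

# boxplot.stats() returns the values a box plot is drawn from:
# $stats = lower whisker, lower hinge, median, upper hinge, upper whisker
# $out   = points more than 1.5 * IQR beyond the hinges (drawn as outliers)
bp <- boxplot.stats(mpg_combo_demo)
bp$stats
bp$out

# A mean above the median suggests right skew (a few very efficient vehicles)
mean_above_median <- mean(mpg_combo_demo, na.rm = TRUE) >
  median(mpg_combo_demo, na.rm = TRUE)
mean_above_median
```

Running the same calls on `cars$MPG_Combo` would give the median, hinges, and flagged outliers to quote when commenting on the plot.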
Obtain box plots for MPG_Combo by Type and comment on any differences you notice in combined fuel efficiency between the vehicle types.
boxplot(MPG_Combo ~ Type, data = cars,
main = "Distribution of Fuel Efficiency by Type of Vehicle",
ylab = "MPG_Combo",
xlab = "Type",
col = "red",
border = "blue",
horizontal = FALSE
)
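Differences between groups are easier to describe when the per-group medians and spreads are tabulated alongside the plot. The sketch below assumes a small hypothetical data frame `demo`; its Type labels and MPG values are illustrative only and are not the cars data.

```r
# Hypothetical data for illustration only (not the cars data)
demo <- data.frame(
  Type      = c("SUV", "SUV", "SUV", "Sedan", "Sedan", "Sedan", "Hybrid", "Hybrid"),
  MPG_Combo = c(17.0, 18.5, 19.0, 23.0, 25.5, 27.0, 48.0, 55.0)
)

# Median combined MPG per vehicle type, sorted from least to most efficient
type_medians <- sort(tapply(demo$MPG_Combo, demo$Type, median))
type_medians

# Interquartile range per type, as a measure of within-type spread
type_iqr <- tapply(demo$MPG_Combo, demo$Type, IQR)
type_iqr
```

Replacing `demo` with `cars` would give the figures behind each box in the grouped plot.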
Obtain basic descriptive statistics for Invoice for all vehicles. Comment on any general features and statistics of the data. Use visual and quantitative methods to comment on whether an assumption of normality would be reasonable for the Invoice variable.
summary(cars$Invoice)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9875 18973 25672 30096 35777 173560
qqnorm(cars$Invoice)
qqline(cars$Invoice , col = "red")
shapiro.test(cars$Invoice)
##
## Shapiro-Wilk normality test
##
## data: cars$Invoice
## W = 0.77353, p-value < 2.2e-16
Use visual and quantitative methods to comment on whether an assumption of normality would be reasonable for the Invoice variable by Origin (i.e., check normality of Invoice for i) European, ii) Asian, and iii) USA cars).
# Vertical box plots: Origin is on the x-axis, Invoice on the y-axis
boxplot(Invoice ~ Origin, data = cars,
main = "Invoice vs. Origin",
ylab = "Invoice",
xlab = "Origin",
col = "red",
border = "blue",
horizontal = FALSE
)
library(ggplot2)
# One histogram panel per Origin
histogram_plot <- ggplot(data = cars, mapping = aes(x = Invoice)) +
geom_histogram(aes(fill = Origin, color = Origin), alpha = 0.25, bins = 40) +
facet_wrap(~ Origin)
histogram_plot
shapiro.test(cars[cars$Origin=='Europe', "Invoice"])
##
## Shapiro-Wilk normality test
##
## data: cars[cars$Origin == "Europe", "Invoice"]
## W = 0.79809, p-value = 1.024e-11
shapiro.test(cars[cars$Origin=='Asia', "Invoice"])
##
## Shapiro-Wilk normality test
##
## data: cars[cars$Origin == "Asia", "Invoice"]
## W = 0.84696, p-value = 2.012e-11
shapiro.test(cars[cars$Origin=='USA', "Invoice"])
##
## Shapiro-Wilk normality test
##
## data: cars[cars$Origin == "USA", "Invoice"]
## W = 0.89222, p-value = 6.42e-09
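The three separate calls above can also be written as one grouped call, which keeps the p-values side by side for comparison. This is a minimal sketch on simulated data; `demo`, its group sizes, and the lognormal parameters are assumptions for illustration and are not the cars data.

```r
set.seed(42)  # reproducible simulated data

# Hypothetical right-skewed prices for three origins (not the cars data)
demo <- data.frame(
  Origin  = rep(c("Asia", "Europe", "USA"), each = 50),
  Invoice = rlnorm(150, meanlog = 10, sdlog = 0.6)
)

# Shapiro-Wilk p-value within each Origin group
p_by_origin <- tapply(demo$Invoice, demo$Origin,
                      function(x) shapiro.test(x)$p.value)
p_by_origin

alpha <- 0.05  # assumed significance level
# TRUE means normality is rejected for that group at level alpha
p_by_origin < alpha
```

With `cars` in place of `demo`, the same call should return the three p-values reported above.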
Perform a hypothesis test of whether cars originated in Europe have different invoice price than Asian cars, and state your conclusions.
Which test should we perform, and why? Justify your answer based on the findings in Exercise 1 (d).
Specify null and alternative hypotheses.
State the conclusion based on the test result.
library(dplyr)  # provides filter() for data frames
asia_europe <- filter(cars, Origin == 'Asia' | Origin == 'Europe')
wilcox.test(Invoice ~ Origin, data = asia_europe, exact = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Invoice by Origin
## W = 2344, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
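The alternative printed in the output ("true location shift is not equal to 0") corresponds to a two-sided test, so the null hypothesis is a location shift of 0 between the two Invoice distributions. The sketch below shows one way to turn the test object into a decision; the data are simulated, and `demo`, the lognormal parameters, and `alpha` are assumptions for illustration only.

```r
set.seed(7)  # reproducible simulated data

# Hypothetical prices for two origins (not the cars data)
demo <- data.frame(
  Origin  = rep(c("Asia", "Europe"), each = 60),
  Invoice = c(rlnorm(60, meanlog = 10.0, sdlog = 0.4),
              rlnorm(60, meanlog = 10.8, sdlog = 0.4))
)

# H0: the location shift between the two Invoice distributions is 0
# H1: the location shift is not 0 (two-sided)
wt <- wilcox.test(Invoice ~ Origin, data = demo, exact = FALSE)

alpha <- 0.05  # assumed significance level
decision <- if (wt$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```

The rank-sum test is used here rather than a t-test because it does not rely on the normality assumption.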
Which test should we perform, and why? Examine the QQ plot and perform the Shapiro-Wilk test as a normality check.
First, check for normality.
View(airquality)
summary(airquality)
## Ozone Solar.R Wind Temp
## Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00
## 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00
## Median : 31.50 Median :205.0 Median : 9.700 Median :79.00
## Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88
## 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00
## Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00
## NA's :37 NA's :7
## Month Day
## Min. :5.000 Min. : 1.0
## 1st Qu.:6.000 1st Qu.: 8.0
## Median :7.000 Median :16.0
## Mean :6.993 Mean :15.8
## 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :9.000 Max. :31.0
##
july_august <- filter(airquality, Month == 7 | Month == 8)
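# QQ plots for the two months being compared, one month at a time
qqnorm(july_august$Wind[july_august$Month == 7], main = "Normal Q-Q Plot: Wind, July")
qqline(july_august$Wind[july_august$Month == 7], col = "red")
qqnorm(july_august$Wind[july_august$Month == 8], main = "Normal Q-Q Plot: Wind, August")
qqline(july_august$Wind[july_august$Month == 8], col = "red")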
shapiro.test(airquality[airquality$Month==7, "Wind"])
##
## Shapiro-Wilk normality test
##
## data: airquality[airquality$Month == 7, "Wind"]
## W = 0.95003, p-value = 0.1564
shapiro.test(airquality[airquality$Month==8, "Wind"])
##
## Shapiro-Wilk normality test
##
## data: airquality[airquality$Month == 8, "Wind"]
## W = 0.98533, p-value = 0.937
var.test(Wind ~ Month, data = july_august, alternative = "two.sided")
##
## F test to compare two variances
##
## data: Wind by Month
## F = 0.8857, num df = 30, denom df = 30, p-value = 0.7418
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.4270624 1.8368992
## sample estimates:
## ratio of variances
## 0.8857035
t.test(Wind ~ Month, data = july_august, alternative = "two.sided", var.equal = TRUE)
##
## Two Sample t-test
##
## data: Wind by Month
## t = 0.1865, df = 60, p-value = 0.8527
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.443108 1.739883
## sample estimates:
## mean in group 7 mean in group 8
## 8.941935 8.793548
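The sequence of checks above (normality per group, then equality of variances, then the two-sample test) can be sketched as a single decision workflow. The 0.05 threshold `alpha` and the fallback to a rank-sum test are assumptions of this sketch. `subset()` from base R stands in for `filter()` so the block runs without extra packages.

```r
# airquality ships with base R (datasets package)
july_august <- subset(airquality, Month %in% c(7, 8))
alpha <- 0.05  # assumed significance level

# Step 1: Shapiro-Wilk normality check within each month
sw_p <- tapply(july_august$Wind, july_august$Month,
               function(x) shapiro.test(x)$p.value)
normal_ok <- all(sw_p > alpha)

# Step 2: F test for equal variances (relies on normality from step 1)
equal_var <- var.test(Wind ~ Month, data = july_august)$p.value > alpha

# Step 3: pooled or Welch t-test if normality is plausible,
# otherwise fall back to the Wilcoxon rank-sum test
result <- if (normal_ok) {
  t.test(Wind ~ Month, data = july_august, var.equal = equal_var)
} else {
  wilcox.test(Wind ~ Month, data = july_august, exact = FALSE)
}
result$p.value
```

On these data, neither Shapiro-Wilk test nor the F test rejects at the 0.05 level, so the sketch ends at the pooled two-sample t-test, matching the output above. Failing to reject normality does not prove it, so the QQ plots remain part of the justification.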
Specify null and alternative hypotheses.
State the conclusion based on the test result.