This exercise uses the airquality dataset. Descriptions of its variables are available in R (see ?airquality).
Perform a hypothesis test of whether the wind speed (mph) in July differs from the wind speed in August.
# Load dplyr, which provides %>% and filter()
library(dplyr)
# Filter the July and August data
wind.july <- airquality %>%
  filter(Month == 7)
wind.august <- airquality %>%
  filter(Month == 8)
# Split the plotting area into 1 row and 2 columns
par(mfrow=c(1,2))
# QQ Plot for wind, July
qqnorm(wind.july$Wind,main = "QQ Plot for Wind - July",ylab="MPH")
qqline(wind.july$Wind,col="red")
# QQ Plot for wind, August
qqnorm(wind.august$Wind,main = "QQ Plot for Wind - August",ylab = "MPH")
qqline(wind.august$Wind,col="red")
# Shapiro-Wilk test
# Print the p-value for July
wind.july$Wind %>%
shapiro.test() %>%
print()
##
## Shapiro-Wilk normality test
##
## data: .
## W = 0.95003, p-value = 0.1564
# Print the p-value for August
wind.august$Wind %>%
shapiro.test() %>%
print()
##
## Shapiro-Wilk normality test
##
## data: .
## W = 0.98533, p-value = 0.937
Both p-values (0.1564 for July, 0.937 for August) are well above a conventional significance level such as 0.05, so we do not reject normality for either month and a t-test is appropriate.
Null hypothesis (H0):
Mean wind speed in July = mean wind speed in August; that is, wind in July has the same mean speed (mph) as wind in August.
Alternative hypothesis (H1):
Mean wind speed in July != mean wind speed in August; that is, wind in July has a different mean speed (mph) from wind in August.
First, we must check whether the variances of the two samples are equal.
# Stack the July and August rows into one data frame.
# bind_rows() keeps every row, whereas union() would drop duplicated rows.
wind.july.august <- bind_rows(wind.july, wind.august)
# Compare the two variances with an F test
var.test(wind.july.august$Wind ~ wind.july.august$Month, alternative="two.sided")
##
## F test to compare two variances
##
## data: wind.july.august$Wind by wind.july.august$Month
## F = 0.8857, num df = 30, denom df = 30, p-value = 0.7418
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.4270624 1.8368992
## sample estimates:
## ratio of variances
## 0.8857035
The p-value (0.7418) is much larger than the significance level, so we do not have enough evidence to reject the hypothesis of equal variances. A pooled t-test is therefore appropriate.
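As a cross-check, the F statistic can be reproduced by hand. The following is a minimal sketch that uses only base R and the built-in airquality data; the names `july`, `august`, `f.manual` and `p.manual` are introduced here for illustration and are not part of the original analysis.

```r
# Sketch: reproduce the F statistic and two-sided p-value of var.test() by hand.
july   <- airquality$Wind[airquality$Month == 7]
august <- airquality$Wind[airquality$Month == 8]

# F is the ratio of the sample variances (July over August,
# because month 7 is the first group in the formula interface).
f.manual <- var(july) / var(august)

# Degrees of freedom are n - 1 for each sample.
df1 <- length(july) - 1
df2 <- length(august) - 1

# Two-sided p-value: twice the smaller tail of the F distribution.
p.lower  <- pf(f.manual, df1, df2)
p.manual <- 2 * min(p.lower, 1 - p.lower)

print(c(F = f.manual, p.value = p.manual))
```

The printed values should agree with the F = 0.8857 and p-value = 0.7418 reported by var.test() above.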
# Run the pooled (equal-variance) two-sample t-test
t.test(wind.july.august$Wind ~ wind.july.august$Month,alternative="two.sided",var.equal=TRUE)
##
## Two Sample t-test
##
## data: wind.july.august$Wind by wind.july.august$Month
## t = 0.1865, df = 60, p-value = 0.8527
## alternative hypothesis: true difference in means between group 7 and group 8 is not equal to 0
## 95 percent confidence interval:
## -1.443108 1.739883
## sample estimates:
## mean in group 7 mean in group 8
## 8.941935 8.793548
p-value >> significance level
The p-value (0.8527) is much larger than the significance level, so we do not have enough evidence to reject H0.
Result: there is no evidence that the mean wind speed in July differs from the mean wind speed in August; wind in July has a similar speed (mph) to wind in August.
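To make the pooled t-test less of a black box, its statistic can be computed directly from the sample means and variances. This is a minimal sketch in base R under the equal-variance assumption used above; the variable names (`sp2`, `se`, `t.manual`, `p.manual`) are illustrative and do not appear in the original analysis.

```r
# Sketch: compute the pooled two-sample t statistic by hand.
july   <- airquality$Wind[airquality$Month == 7]
august <- airquality$Wind[airquality$Month == 8]
n1 <- length(july)
n2 <- length(august)

# Pooled variance: the two sample variances weighted by their degrees of freedom.
sp2 <- ((n1 - 1) * var(july) + (n2 - 1) * var(august)) / (n1 + n2 - 2)

# Standard error of the difference in means under equal variances.
se <- sqrt(sp2 * (1 / n1 + 1 / n2))

# t statistic, degrees of freedom, and two-sided p-value.
t.manual <- (mean(july) - mean(august)) / se
df       <- n1 + n2 - 2
p.manual <- 2 * pt(-abs(t.manual), df)

print(c(t = t.manual, df = df, p.value = p.manual))
```

The result should match the t = 0.1865, df = 60 and p-value = 0.8527 reported by t.test() above.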
Comments:
In the QQ plots, most points lie close to the red line, and both Shapiro-Wilk p-values (0.1564 for July, 0.937 for August) are greater than the significance level.
We therefore do not have enough evidence to reject the null hypothesis of normality, and we treat both samples as approximately normally distributed.