The R datasets library contains data on air quality in New York (airquality). Conduct a hypothesis test to evaluate if ozone levels are a function of month. NOTE: dichotomize month. If that test were significant, what else would be required? Post your hypothesis test and R code with your discussion.
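This question confused me a little. My first instinct was to run an ANOVA of Ozone on Month (R code below). That gives a p-value of 0.0776, which is low but not significant at the 0.05 level. Because Month is stored as a number, the ANOVA uses a single degree of freedom, so it is really testing for a linear trend across months rather than comparing the five monthly means. A barplot of the means shows an upward trend, but a chi-square test on those means gives a p-value of 0.5142, which is also not significant. I would not lean on that last result, since the chi-square test is designed for counts rather than means.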
# Run an ANOVA of Ozone on Month (Month is numeric here, so it gets 1 df)
aq_aov <- aov(Ozone ~ Month, data = airquality)
summary(aq_aov)
##              Df Sum Sq Mean Sq F value Pr(>F)
## Month         1   3387    3387   3.171 0.0776 .
## Residuals   114 121756    1068
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 37 observations deleted due to missingness
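The Df of 1 for Month suggests it was treated as a number rather than as a grouping variable. As a minimal sketch, the same ANOVA with Month converted to a factor might look like this. The names aq_f, MonthF, and aq_aov_factor are my own for illustration, and I have not reported its output here.

```r
# Sketch only: treat Month as a categorical factor (my assumption about
# what a "compare the months" ANOVA would need)
aq_f <- transform(airquality, MonthF = factor(Month))

# One-way ANOVA; rows with missing Ozone are dropped by default
aq_aov_factor <- aov(Ozone ~ MonthF, data = aq_f)
summary(aq_aov_factor)  # MonthF now uses 4 degrees of freedom (5 months - 1)

# If the overall F test is significant, a post hoc comparison
# shows which pairs of months differ
TukeyHSD(aq_aov_factor)
```

This also speaks to the "what else would be required" part of the prompt: a significant overall test only says that some months differ, so a follow-up comparison such as TukeyHSD would be needed to say which ones.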
aq_tab <- model.tables(aq_aov, "means")
## Warning in replications(paste("~", xx), data = mf): non-factors ignored: Month
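# Plot the tabulated means by month
barplot(aq_tab$tables$Month)
# Chi-square goodness-of-fit test on the means (chisq.test is designed for counts)
chisq.test(aq_tab$tables$Month)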
##
## Chi-squared test for given probabilities
##
## data: aq_tab$tables$Month
## X-squared = 3.2669, df = 4, p-value = 0.5142
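The prompt asks for month to be dichotomized, which the code above does not do. Here is a sketch of one way to approach it, assuming a split of May-June versus July-September. The cut point is my assumption, not something the prompt specifies, and aq_d, Season, and tt are illustrative names. The hypotheses would be H0: mean ozone is the same in both groups, against H1: the means differ.

```r
# Keep only rows with an observed Ozone value
aq_d <- airquality[!is.na(airquality$Ozone), ]

# Hypothetical dichotomization: May-June ("early") vs. July-September ("late")
aq_d$Season <- factor(ifelse(aq_d$Month <= 6, "early", "late"))

# Welch two-sample t-test (does not assume equal variances)
tt <- t.test(Ozone ~ Season, data = aq_d)
tt

# Ozone is right-skewed, so a rank-based test is a reasonable cross-check;
# exact = FALSE avoids the warning about ties
wilcox.test(Ozone ~ Season, data = aq_d, exact = FALSE)
```

With only two groups there is nothing to follow up on pairwise, so if this test were significant I think the remaining work would be checking the assumptions (normality, independence of daily readings) and reporting an effect size or confidence interval for the difference.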