#James Lunga Discussion 5
The R datasets library contains data on air quality in New York (airquality). Conduct a hypothesis test to evaluate if ozone levels are a function of month. NOTE: dichotomize month. If that test were significant, what else would be required? Post your hypothesis test and R code with your discussion.
First I will take out the NA values.
airdata <- na.omit(airquality)
Next I will subset the data by month to test Ozone against month.
month5 <- subset(airdata, Month == 5)
month6 <- subset(airdata, Month == 6)
month7 <- subset(airdata, Month == 7)
month8 <- subset(airdata, Month == 8)
month9 <- subset(airdata, Month == 9)
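The subsets above split the data into five groups, while a two-sample t test needs exactly two. A minimal sketch of one way to dichotomize month follows. The cut point (May and June versus July through September) and the column name MonthGroup are my own assumptions for illustration; the assignment does not specify where to split.

```r
# Rebuild the cleaned data so this block runs on its own
airdata <- na.omit(datasets::airquality)

# Hypothetical split: months 5-6 are "early", months 7-9 are "late"
airdata$MonthGroup <- factor(ifelse(airdata$Month <= 6, "early", "late"))

# Count the observations that fall in each group
table(airdata$MonthGroup)
```

Any other two-way split (for example, one month against all the others) could be built the same way by changing the condition inside ifelse().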
Then I will run a t test to see whether the difference in means is significant.
t.test(airdata$Ozone[1:153],airdata$Month[1:153])
##
## Welch Two Sample t-test
##
## data: airdata$Ozone[1:153] and airdata$Month[1:153]
## t = 11.034, df = 110.43, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 28.61778 41.14798
## sample estimates:
## mean of x mean of y
## 42.099099 7.216216
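The t test gives a very small p-value (< 2.2e-16), so the difference between the two means is unlikely to be due to chance. Note, however, that this call compares the mean of Ozone (42.10) with the mean of the Month numbers (7.22), so on its own it does not show that Ozone levels differ by month; that requires comparing Ozone between groups of months. Below is a notched box plot of Ozone by month, where the differences between months can be seen visually.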
boxplot(airdata$Ozone ~ airdata$Month,
        xlab = "Month",
        ylab = "Ozone",
        col = "purple",
        border = "red",
        notch = TRUE
)
## Warning in bxp(list(stats = structure(c(1, 11, 18, 33, 45, 12, 20, 23, 37, :
## some notches went outside hinges ('box'): maybe set notch=FALSE
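To test Ozone against month more directly, Ozone can be compared between two groups of months instead of against the Month column itself. The sketch below is one possible version, not the only one: the split into May-June versus July-September and the name MonthGroup are hypothetical choices of mine. The hypotheses would be H0: mean Ozone is the same in both groups, and H1: the means differ.

```r
# Rebuild the cleaned data so this block runs on its own
airdata <- na.omit(datasets::airquality)

# Hypothetical dichotomization of month into two groups
airdata$MonthGroup <- factor(ifelse(airdata$Month <= 6, "early", "late"))

# Welch two-sample t test of Ozone between the two month groups
result <- t.test(Ozone ~ MonthGroup, data = airdata)
result

# Mean Ozone in each group, for reference
tapply(airdata$Ozone, airdata$MonthGroup, mean)
```

A significant result here would only say that the two groups differ on average; it would not say which individual months drive the difference.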