# James Lunga Discussion 5

The R datasets library contains data on air quality in New York (airquality). Conduct a hypothesis test to evaluate if ozone levels are a function of month. NOTE: dichotomize month. If that test were significant, what else would be required? Post your hypothesis test and R code with your discussion.

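First, I will remove every row that contains an NA value. Note that `na.omit()` drops a row if any column is missing, so rows with a missing Solar.R reading are removed as well as rows with a missing Ozone reading.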

```r
airdata <- na.omit(airquality)
```
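Because `na.omit()` works on complete cases, it can discard rows whose Ozone and Month values are both present. The base R sketch below compares the row counts for complete-case removal and for removing only rows with a missing Ozone value; the name `ozone_only` is illustrative and not part of the original analysis.

```r
# Compare complete-case removal with removing only rows where Ozone is missing.
data(airquality)                                        # built-in datasets package data

airdata    <- na.omit(airquality)                       # drops rows with NA in any column
ozone_only <- airquality[!is.na(airquality$Ozone), ]    # drops rows with NA in Ozone only

nrow(airquality)   # 153 rows in total
nrow(airdata)      # 111 complete cases
nrow(ozone_only)   # 116 rows with a recorded Ozone value
```

Keeping the five extra rows only matters if Solar.R is not used in the analysis, as is the case for an Ozone-by-month test.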

Next, I will split the data by month (May through September) so that Ozone can be compared across months.

```r
# One subset per month (5 = May through 9 = September)
month5 <- subset(airdata, Month == 5)
month6 <- subset(airdata, Month == 6)
month7 <- subset(airdata, Month == 7)
month8 <- subset(airdata, Month == 8)
month9 <- subset(airdata, Month == 9)
```
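The assignment asks for month to be dichotomized, that is, reduced to two groups. A minimal sketch of such a split is shown below. The cut point is an assumption made for illustration (July and August versus May, June, and September), and the column name `MonthGroup` is hypothetical.

```r
# Dichotomize Month into two groups (assumed split: Jul-Aug vs. the other months).
airdata <- na.omit(airquality)

airdata$MonthGroup <- factor(
  ifelse(airdata$Month %in% c(7, 8), "Jul-Aug", "Other"),
  levels = c("Jul-Aug", "Other")
)

# Number of observations in each of the two groups
table(airdata$MonthGroup)
```

Any other two-way split (for example, early versus late summer) would work the same way; the choice should be justified before running the test rather than after looking at the results.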

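Then I will run a Welch two-sample t test. As written, the call below compares the mean of the Ozone column with the mean of the Month column (the month numbers 5 to 9); it does not compare Ozone between groups of months.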

```r
# Welch two-sample t test of the Ozone column against the Month column
t.test(airdata$Ozone, airdata$Month)
## 
##  Welch Two Sample t-test
## 
## data:  airdata$Ozone and airdata$Month
## t = 11.034, df = 110.43, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  28.61778 41.14798
## sample estimates:
## mean of x mean of y 
## 42.099099  7.216216
```
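A test that matches the question compares Ozone between the two month groups instead of comparing the Ozone and Month columns with each other. The sketch below reuses the assumed Jul-Aug versus other-months split and the hypothetical `MonthGroup` column, runs a Welch t test through the formula interface of `t.test()`, and, because Ozone is right-skewed, adds a Wilcoxon rank-sum test via `wilcox.test()` as a rank-based check. Its output is not reproduced here.

```r
# H0: mean Ozone is the same in Jul-Aug and in the other months
# H1: mean Ozone differs between the two month groups
airdata <- na.omit(airquality)
airdata$MonthGroup <- factor(
  ifelse(airdata$Month %in% c(7, 8), "Jul-Aug", "Other"),
  levels = c("Jul-Aug", "Other")
)

# Welch two-sample t test of Ozone by month group (unequal variances allowed)
welch <- t.test(Ozone ~ MonthGroup, data = airdata)
welch

# Rank-based alternative that does not assume normally distributed Ozone;
# exact = FALSE uses the normal approximation because Ozone contains ties
wilcox <- wilcox.test(Ozone ~ MonthGroup, data = airdata, exact = FALSE)
wilcox
```

H0 would be rejected at the 5% level if the reported p-value is below 0.05.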

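The t test gives a very small p-value (< 2.2e-16), so the difference between the two means is unlikely to be due to chance. Because those means are mean Ozone (42.1) and the mean month number (7.2), however, the test does not by itself show that Ozone depends on month. Below is a notched box plot of Ozone by month, where the month-to-month differences can be seen visually.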

```r
boxplot(airdata$Ozone~airdata$Month,
        xlab = "Month",
        ylab = "Ozone",
        col = "purple",
        border = "red",
        notch = TRUE
)
## Warning in bxp(list(stats = structure(c(1, 11, 18, 33, 45, 12, 20, 23, 37, :
## some notches went outside hinges ('box'): maybe set notch=FALSE
```
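If a two-group test were significant, it would show only that the two groups differ, not which individual months differ. One possible follow-up, sketched below, is an omnibus test across all five months followed by pairwise comparisons with a multiple-comparison correction. The choice of `kruskal.test()` and Holm-adjusted `pairwise.wilcox.test()` is an assumption; a one-way ANOVA with `aov()` followed by `TukeyHSD()` is a common parametric alternative.

```r
airdata <- na.omit(airquality)
# Treat Month as a categorical variable with readable labels (May-Sep)
airdata$Month <- factor(airdata$Month, levels = 5:9, labels = month.abb[5:9])

# Omnibus test: does Ozone differ across any of the five months?
kw <- kruskal.test(Ozone ~ Month, data = airdata)
kw

# Pairwise month comparisons, Holm-adjusted for multiple testing
pw <- pairwise.wilcox.test(airdata$Ozone, airdata$Month,
                           p.adjust.method = "holm", exact = FALSE)
pw
```

The pairwise table lists one adjusted p-value per pair of months, which is what would identify the specific months driving a significant overall result.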