A: Plot your data first (as always). Be sure to include an informative figure caption.

plot(airquality$Ozone~airquality$Temp) #Both variables are continuous, so a scatterplot is used rather than a boxplot.
Figure 1: Scatterplot of ozone (ppb) against temperature (°F) in New York

hist(airquality$Ozone)
Figure 2: Histogram for ozone in New York

hist(airquality$Temp)
Figure 3: Histogram for temperature in New York
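
Before plotting, it can help to check how many values are missing in each variable, since missing ozone readings are dropped when the model is fitted. This is a minimal sketch using only base R and the built-in airquality data; the name na_counts is my own and is not part of the assignment.

```r
# Count missing values in the two variables used in this assignment.
na_counts <- colSums(is.na(airquality[, c("Ozone", "Temp")]))
print(na_counts)

# Basic summaries; summary() also reports the number of NAs per column.
summary(airquality[, c("Ozone", "Temp")])
```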

B: Are both of your continuous variables normally distributed? Provide two pieces of evidence that support your normality assessment for each variable.

qqnorm(airquality$Ozone) #This needs to be transformed.
qqline(airquality$Ozone)
Figure 4: Q-Q plot assessing normality of the ozone levels in New York

qqnorm(airquality$Temp) #Temperature is approximately normal.
qqline(airquality$Temp)
Figure 5: Q-Q plot showing normality for the temperature in New York

hist(airquality$Temp) #This shows the data is normal for temperature
Figure 6: Histogram showing normality for the temperature in New York
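
A formal test could serve as an additional piece of evidence alongside the histograms and Q-Q plots. The sketch below uses the Shapiro-Wilk test from base R; it is one possible complement, not something the assignment requires, and the names sw_ozone and sw_temp are my own.

```r
# Shapiro-Wilk test: the null hypothesis is that the sample comes from
# a normal distribution. shapiro.test() ignores missing values.
sw_ozone <- shapiro.test(airquality$Ozone)
sw_temp  <- shapiro.test(airquality$Temp)

print(sw_ozone)
print(sw_temp)
```

A small p-value (for example, below 0.05) is evidence against normality. With larger samples the test can flag minor departures, so it is best read together with the plots.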

C: Did you transform one or more of your variables? If so, state which transformation you used. Provide two pieces of evidence that your data more closely approximates a normal distribution. If not, state why you did not transform the data.

I transformed the ozone data because it was not normally distributed in its raw form. I used a log10 transformation because the histogram of the transformed values is closer to a normal distribution. I did not transform temperature because its histogram and Q-Q plot were already approximately normal.

airquality$OzoneLog<-log10(airquality$Ozone+0.0001)
hist(airquality$OzoneLog)
Figure 7: Histogram of the log10-transformed ozone levels in New York.

qqnorm(airquality$OzoneLog) #The Q-Q plot does not show a clear improvement in normality.
qqline(airquality$OzoneLog)
Figure 8: Q-Q plot of the log10-transformed ozone levels in New York.

hist(airquality$Temp) #This histogram is pretty normal.
Figure 9: Histogram of temperature.

qqnorm(airquality$Temp) #This also shows normality.
qqline(airquality$Temp)
Figure 10: Q-Q plot of Temperature.
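
To put a number on how the log10 transformation changed the shape of the ozone data, the sample skewness can be compared before and after. This is only a sketch: skew() is a small helper written here for illustration (it is not a base R function), and the 0.0001 offset simply mirrors the transformation used above.

```r
# Sample skewness (third standardized moment); values near 0 suggest symmetry.
# skew() is a helper defined for this sketch, not a base R function.
skew <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

ozone_log <- log10(airquality$Ozone + 0.0001)  # same transformation as above

skew_raw <- skew(airquality$Ozone)
skew_log <- skew(ozone_log)
print(c(raw = skew_raw, log10 = skew_log))

# Side-by-side histograms for a visual comparison.
old_par <- par(mfrow = c(1, 2))
hist(airquality$Ozone, main = "Raw ozone", xlab = "Ozone (ppb)")
hist(ozone_log, main = "log10 ozone", xlab = "log10(Ozone)")
par(old_par)
```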

D: Create a linear regression object with the appropriate data (raw or transformed). Be sure to place the variables on the correct axes. Present your code and output in your R Markdown file.

airquality.LM<-lm(airquality$Ozone~airquality$Temp)
airqualityTrans.LM <-lm(airquality$OzoneLog~airquality$Temp)
plot(airquality.LM) #It approximates homoscedasticity. 

plot(airqualityTrans.LM)
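
The same transformed model can also be written with the data argument of lm(), which keeps the coefficient names short, and the four default diagnostic plots can be drawn on one page. This is a sketch of an alternative style rather than a correction; fit_log is a name chosen here.

```r
# Same model as airqualityTrans.LM, written with the data argument.
# fit_log is a name chosen for this sketch.
fit_log <- lm(log10(Ozone + 0.0001) ~ Temp, data = airquality)
print(coef(fit_log))

# Show the four default diagnostic plots in a 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
plot(fit_log)
par(old_par)
```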

E: Did you accept or reject the null hypothesis? Are the results statistically significant? Provide and interpret two evidence graphs that the residuals meet the assumptions of the linear model.

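I rejected the null hypothesis of no relationship between log10 ozone and temperature. The results are statistically significant (slope t = 11.741, p < 2e-16; F = 137.8 on 1 and 114 DF).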

summary(airqualityTrans.LM) 
## 
## Call:
## lm(formula = airquality$OzoneLog ~ airquality$Temp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.93139 -0.14373  0.01286  0.15855  0.64893 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -0.798204   0.195865  -4.075 8.53e-05 ***
## airquality$Temp  0.029316   0.002497  11.741  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.254 on 114 degrees of freedom
##   (37 observations deleted due to missingness)
## Multiple R-squared:  0.5473, Adjusted R-squared:  0.5434 
## F-statistic: 137.8 on 1 and 114 DF,  p-value: < 2.2e-16
plot(airqualityTrans.LM)
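
The two evidence graphs can be requested one at a time with the which argument of plot(), and the key numbers can be pulled from the summary object instead of being read off the printout. This is a sketch under the assumption that the model is refit as below; the names fit_log, fit_summary, slope_p and r_squared are my own.

```r
fit_log <- lm(log10(Ozone + 0.0001) ~ Temp, data = airquality)

# Evidence graph 1: residuals vs fitted values
# (checks linearity and roughly constant variance).
plot(fit_log, which = 1)

# Evidence graph 2: normal Q-Q plot of the standardized residuals
# (checks normality of the residuals).
plot(fit_log, which = 2)

# Extract the slope's p-value and the R-squared from the summary object.
fit_summary <- summary(fit_log)
slope_p <- fit_summary$coefficients["Temp", "Pr(>|t|)"]
r_squared <- fit_summary$r.squared
print(c(slope_p = slope_p, r_squared = r_squared))
```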

F: Summarize your results in a paragraph similar to the example in the "Reporting Your Results" section. Be sure to also provide a final graph of your data including a best fit line.

There is a significant positive relationship between ozone concentration and temperature in New York (linear model: p < 0.001, multiple R-squared = 0.5473). Ozone was log10-transformed to better approximate normality of the residuals. Temperature was not transformed.

plot(airquality$OzoneLog~airquality$Temp)
abline(airqualityTrans.LM)
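
The final graph could also carry axis labels with units, and the fitted line can be used to back-transform a prediction to the original ozone scale. This is a sketch only: the 80 °F input is a hypothetical value picked for illustration, and fit_log, pred_log and pred_ppb are names chosen here.

```r
fit_log <- lm(log10(Ozone + 0.0001) ~ Temp, data = airquality)

# Final graph with labelled axes and the best fit line.
plot(log10(Ozone + 0.0001) ~ Temp, data = airquality,
     xlab = "Temperature (degrees F)", ylab = "log10(Ozone, ppb)")
abline(fit_log, col = "red", lwd = 2)

# Predicted log10 ozone at a hypothetical 80 degrees F,
# back-transformed to the original scale (ppb).
pred_log <- predict(fit_log, newdata = data.frame(Temp = 80))
pred_ppb <- 10^pred_log - 0.0001
print(pred_ppb)
```

Because the model was fitted on the log10 scale, the back-transformed value is better read as a typical (median-like) ozone level at that temperature than as a mean.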

Please turn in your homework via Sakai by saving and submitting an R Markdown PDF or HTML file from RPubs!