A: Plot your data first (as always). Be sure to include an informative figure caption.

plot(Ozone ~ Temp, data=airquality) #plot of ozone and temp

hist(airquality$Ozone) #histogram of ozone

hist(airquality$Temp)  #histogram of temp

qqnorm(airquality$Ozone)  #qq plots of Ozone
qqline(airquality$Ozone)

qqnorm(airquality$Temp)   #qq plots of Temp
qqline(airquality$Temp)

B: Are both of your continuous variables displaying normal distribution? Provide two pieces of evidence that display your normality assessment for each variable.

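No. The Ozone data do not appear to be normally distributed, but the Temp data do. The two pieces of evidence for each variable are the histograms from part A and the Q-Q plots below: Ozone is right-skewed and its Q-Q points curve away from the reference line, while the Temp points fall close to the line.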

qqnorm(airquality$Ozone)  #qq plot showing how the data is not normal for ozone
qqline(airquality$Ozone)

qqnorm(airquality$Temp)   #qq plot showing how the data is normal for temp
qqline(airquality$Temp)
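
Since the question asks for two pieces of evidence per variable, one option is to show the histogram and the Q-Q plot for each variable together in a single panel. This is a minimal sketch using base R graphics; the 2 x 2 layout, the titles, and the name `n_ozone` are illustrative choices, not part of the assignment.

```r
# Show a histogram and a normal Q-Q plot for each variable in one 2 x 2 panel.
# Layout and titles are illustrative choices.
old_par <- par(mfrow = c(2, 2))

hist(airquality$Ozone, main = "Histogram of Ozone", xlab = "Ozone (ppb)")
qqnorm(airquality$Ozone, main = "Normal Q-Q plot: Ozone")
qqline(airquality$Ozone)

hist(airquality$Temp, main = "Histogram of Temp", xlab = "Temp (degrees F)")
qqnorm(airquality$Temp, main = "Normal Q-Q plot: Temp")
qqline(airquality$Temp)

par(old_par)  # restore the previous plotting layout

# Ozone has missing values; the plotting functions above drop them.
n_ozone <- sum(!is.na(airquality$Ozone))
print(n_ozone)
```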

C: Did you transform one or more of your variables? If so, state which transformation you used. Provide two pieces of evidence that your data more closely approximates a normal distribution. If not, state why you did not transform the data.

Yes, I had to transform the Ozone data. I used a log10 transformation after adding 0.0001.

airquality$OzoneLog <- log10(airquality$Ozone + 0.0001)
hist(airquality$OzoneLog)

qqnorm(airquality$OzoneLog)  #qq plots and histogram showing that after the data transformation my data is more normal. 
qqline(airquality$OzoneLog)
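
The small constant is only needed if Ozone contains zeros, since log10(0) is -Inf. A quick check like the one sketched below can confirm whether that is the case before choosing an offset. It also places the raw and transformed histograms side by side for comparison. The names `ozone_raw`, `ozone_log`, and `min_ozone` are hypothetical and used only for this illustration.

```r
# Check whether an offset is needed before taking logs, then compare
# the raw and transformed distributions side by side.
ozone_raw <- airquality$Ozone

# log10() is -Inf at 0, so look at the smallest observed value first.
min_ozone <- min(ozone_raw, na.rm = TRUE)
print(min_ozone)

# Same transformation as above (offset of 0.0001).
ozone_log <- log10(ozone_raw + 0.0001)

old_par <- par(mfrow = c(1, 2))
hist(ozone_raw, main = "Raw Ozone", xlab = "Ozone (ppb)")
hist(ozone_log, main = "log10(Ozone + 0.0001)", xlab = "log10 Ozone")
par(old_par)  # restore the previous plotting layout
```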

D: Create a linear regression object with the appropriate data (raw or transformed). Be sure to place the variables on the correct axis. Present your code and output into R Markdown file.

airquality.LM <- lm(OzoneLog ~ Temp, data= airquality)
plot(airquality.LM)

E: Did you accept or reject the null hypothesis? Are the results statistically significant? Provide and interpret two evidence graphs that the residuals meet the assumptions of the linear model.

I rejected the null hypothesis because the p-value for Temp (< 2e-16) was less than 0.05. The results are statistically significant.

plot(airquality.LM)
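
Calling plot() on an lm object cycles through several diagnostic plots. To show only the two most often used as evidence for the residual assumptions, the `which` argument can select them: `which = 1` gives Residuals vs Fitted (linearity and constant variance) and `which = 2` gives the Normal Q-Q plot of the residuals (normality). The sketch below refits the model on a copy of the data so that it runs on its own; `aq` and `fit` are hypothetical names.

```r
# Refit the model on a copy of the data so this block is self-contained.
aq <- airquality
aq$OzoneLog <- log10(aq$Ozone + 0.0001)
fit <- lm(OzoneLog ~ Temp, data = aq)

old_par <- par(mfrow = c(1, 2))
plot(fit, which = 1)  # Residuals vs Fitted: look for no trend and an even spread
plot(fit, which = 2)  # Normal Q-Q: look for points close to the reference line
par(old_par)          # restore the previous plotting layout
```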

summary(airquality.LM)
## 
## Call:
## lm(formula = OzoneLog ~ Temp, data = airquality)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.93139 -0.14373  0.01286  0.15855  0.64893 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.798204   0.195865  -4.075 8.53e-05 ***
## Temp         0.029316   0.002497  11.741  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.254 on 114 degrees of freedom
##   (37 observations deleted due to missingness)
## Multiple R-squared:  0.5473, Adjusted R-squared:  0.5434 
## F-statistic: 137.8 on 1 and 114 DF,  p-value: < 2.2e-16
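
The p-values can also be pulled out of the summary object rather than read off the printed table, which helps avoid copying the wrong row (the intercept instead of Temp, for example). This is a minimal sketch; `aq`, `fit`, `fit_summary`, `slope_p`, and `model_p` are hypothetical names introduced here.

```r
# Refit the model on a copy of the data so this block is self-contained.
aq <- airquality
aq$OzoneLog <- log10(aq$Ozone + 0.0001)
fit <- lm(OzoneLog ~ Temp, data = aq)
fit_summary <- summary(fit)

# p-value for the Temp slope, taken from the coefficient table.
slope_p <- fit_summary$coefficients["Temp", "Pr(>|t|)"]

# Overall model p-value, computed from the F-statistic and its degrees of freedom.
f <- fit_summary$fstatistic
model_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

print(slope_p)
print(model_p)
print(slope_p < 0.05)  # TRUE means the null hypothesis is rejected at alpha = 0.05
```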

F: Summarize your results in a paragraph similar to the example in the "Reporting Your Results" section. Be sure to also provide a final graph of your data including a best fit line.

A statistically significant relationship exists between ozone and temperature. The linear model p-value was < 2.2e-16 (F = 137.8 on 1 and 114 DF, adjusted R-squared = 0.5434), which is less than 0.05, so I rejected the null hypothesis. A log10 transformation was needed for the Ozone data because the raw data were not normally distributed. No transformation was needed for Temperature because those data were approximately normal.
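
The final graph with a best-fit line could be produced along the following lines. This is a sketch using base R graphics: the axis labels, title, and line colour are illustrative choices, and `aq` and `fit` are hypothetical names so that the block runs on its own.

```r
# Refit the model on a copy of the data so this block is self-contained.
aq <- airquality
aq$OzoneLog <- log10(aq$Ozone + 0.0001)
fit <- lm(OzoneLog ~ Temp, data = aq)

# Scatter plot of the transformed response against temperature.
plot(OzoneLog ~ Temp, data = aq,
     xlab = "Temperature (degrees F)",
     ylab = "log10(Ozone + 0.0001), Ozone in ppb",
     main = "Ozone vs. temperature with fitted regression line")

# Best-fit line taken directly from the linear model.
abline(fit, col = "red", lwd = 2)
```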

Please turn in your homework via Sakai by saving and submitting an R Markdown PDF or HTML file from R Pubs!