days <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

Question 1: Regress Days on Index using simple linear regression. What are the estimates of the fitted regression line?

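model <- lm(days ~ index)   # least-squares fit of days = b0 + b1 * index
plot(index, days)           # scatterplot of the data
abline(model)               # overlay the fitted regression line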

coef(model)
## (Intercept)       index 
##  -192.98383    15.29637

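The fitted regression line is \(\hat{y} = -192.98383 + 15.29637x\), so the intercept is -192.98383 and the slope is 15.29637. Each one-unit increase in the meteorological index is associated with an estimated increase of about 15.3 in the number of days the ozone level exceeds 20ppm. The intercept has no practical interpretation, because an index of 0 lies far outside the observed range (16.0 to 18.2).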
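As a check on what `lm()` computes, the least-squares estimates can be reproduced from the closed-form formulas \(\hat\beta_1 = S_{xy}/S_{xx}\) and \(\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x}\). This is a minimal base-R sketch; the names `Sxx`, `Sxy`, `b0` and `b1` are introduced here for illustration only.

```r
days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

# Corrected sum of squares of x and cross-products of x and y
Sxx <- sum((index - mean(index))^2)
Sxy <- sum((index - mean(index)) * (days - mean(days)))

# Closed-form least-squares estimates
b1 <- Sxy / Sxx
b0 <- mean(days) - b1 * mean(index)

# Compare with the coefficients reported by lm()
model <- lm(days ~ index)
c(b0 = b0, b1 = b1)
all.equal(unname(coef(model)), c(b0, b1))
```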

Question 2: What is the value of R^2?

summary(model)$r.squared
## [1] 0.1584636

The value of \(R^2\) for this regression is 0.1584636, so the meteorological index explains only about 15.8% of the variability in the number of days.
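The \(R^2\) value is the fraction of the total variation in days accounted for by the regression, \(R^2 = SS_R/SS_T = 1 - SS_E/SS_T\). The sketch below recomputes it from the sums of squares in base R; the variable names are illustrative. In simple linear regression it also equals the squared sample correlation between the two variables.

```r
days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)
model <- lm(days ~ index)

sst <- sum((days - mean(days))^2)            # total sum of squares
sse <- sum(residuals(model)^2)               # error (residual) sum of squares
ssr <- sum((fitted(model) - mean(days))^2)   # regression sum of squares

r2 <- ssr / sst
c(r2 = r2, one_minus_sse = 1 - sse / sst, from_summary = summary(model)$r.squared)

# Squared correlation gives the same value for a single predictor
cor(index, days)^2
```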

Question 3: Test for the significance of the regression at the 0.05 level of significance, assuming the response is Normally distributed. What is your conclusion?

\(H_0: \beta_1=0\)

\(H_1: \beta_1\ne0\)

summary(model)
## 
## Call:
## lm(formula = days ~ index)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -41.70 -21.54   2.12  18.56  36.42 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984    163.503  -1.180    0.258
## index         15.296      9.421   1.624    0.127
## 
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared:  0.1585, Adjusted R-squared:  0.09835 
## F-statistic: 2.636 on 1 and 14 DF,  p-value: 0.1267

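From the model summary, the slope has a t statistic of 1.624 with a p-value of 0.127, which is larger than 0.05, so we fail to reject \(H_0\). The overall F test agrees (F = 2.636 on 1 and 14 degrees of freedom, p-value = 0.1267); in simple linear regression the two tests are equivalent because \(F = t^2\). We conclude that, at the 0.05 level, there is not sufficient evidence of a linear relationship between the meteorological index and the number of days the ozone level exceeds 20ppm.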
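The t test in the summary can be reproduced by hand: \(t_0 = \hat\beta_1 / se(\hat\beta_1)\) with \(se(\hat\beta_1) = s/\sqrt{S_{xx}}\), compared against a t distribution with \(n-2 = 14\) degrees of freedom. This is a minimal base-R sketch with illustrative variable names; it also checks the \(F = t^2\) relationship.

```r
days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)
model <- lm(days ~ index)
n     <- length(days)

# Standard error of the slope: s / sqrt(Sxx), with s the residual standard error
s     <- sqrt(sum(residuals(model)^2) / (n - 2))
se_b1 <- s / sqrt(sum((index - mean(index))^2))

b1     <- coef(model)[["index"]]
t_stat <- b1 / se_b1                         # t statistic for H0: beta1 = 0
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value
t_crit <- qt(0.975, df = n - 2)              # rejection cutoff for alpha = 0.05

c(t = t_stat, p = p_val, critical = t_crit)
abs(t_stat) > t_crit   # FALSE: the statistic is not in the rejection region

# In simple linear regression the overall F statistic equals t^2
f_stat <- summary(model)$fstatistic[["value"]]
all.equal(f_stat, t_stat^2)
```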

Question 4: Regardless of whether you concluded above that the regression is significant, make a scatterplot of the data showing the fitted regression line, the confidence interval, and the prediction interval.

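newindex <- seq(15, 20, by = 0.1)
newdata <- data.frame(index = newindex)
conf <- predict(model, newdata, interval = "confidence")
pred <- predict(model, newdata, interval = "prediction")
# Take the axis limits from the grid and the prediction limits so no band is clipped
plot(index, days, xlim = range(newindex), ylim = range(pred))
abline(model)
lines(newindex, conf[, "lwr"], lty = 2)   # 95% confidence band
lines(newindex, conf[, "upr"], lty = 2)
lines(newindex, pred[, "lwr"], lty = 3)   # 95% prediction band
lines(newindex, pred[, "upr"], lty = 3)
legend("topleft", legend = c("Fitted line", "95% confidence band", "95% prediction band"), lty = 1:3)

The plot shows the fitted line, the 95% confidence band (dashed) and the 95% prediction band (dotted). With the default axis limits of plot(index, days), the prediction band falls outside the graph, so here the y-axis limits are taken from the range of the prediction limits.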
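The curved bands follow the standard simple-linear-regression formulas: at a new index value \(x_0\), the confidence limits are \(\hat{y}_0 \pm t_{0.025,\,n-2}\, s\sqrt{1/n + (x_0-\bar{x})^2/S_{xx}}\), and the prediction limits add 1 under the square root. The sketch below recomputes both bands by hand in base R and compares them with predict(); the variable names are illustrative only.

```r
days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)
model <- lm(days ~ index)
newindex <- seq(15, 20, by = 0.1)

n     <- length(days)
s     <- sigma(model)                     # residual standard error
Sxx   <- sum((index - mean(index))^2)
tcrit <- qt(0.975, df = n - 2)
yhat  <- predict(model, data.frame(index = newindex))

# Standard error of the estimated mean response at each new index value
se_mean <- s * sqrt(1 / n + (newindex - mean(index))^2 / Sxx)
# Standard error for predicting one new observation at each new index value
se_pred <- s * sqrt(1 + 1 / n + (newindex - mean(index))^2 / Sxx)

conf_manual <- cbind(lwr = yhat - tcrit * se_mean, upr = yhat + tcrit * se_mean)
pred_manual <- cbind(lwr = yhat - tcrit * se_pred, upr = yhat + tcrit * se_pred)

# Compare with the limits computed by predict()
conf <- predict(model, data.frame(index = newindex), interval = "confidence")
pred <- predict(model, data.frame(index = newindex), interval = "prediction")
all.equal(unname(conf_manual), unname(conf[, c("lwr", "upr")]))
all.equal(unname(pred_manual), unname(pred[, c("lwr", "upr")]))
```

Because of the \((x_0-\bar{x})^2\) term, both bands are narrowest near \(\bar{x} \approx 17.34\) and widen toward the ends of the grid, which extends beyond the observed index range of 16.0 to 18.2.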

Question 5: Calculate a 95% confidence interval on the mean number of days the ozone level exceeds 20ppm when the meteorological index is 17.0. Comment on the meaning of this interval.

conf1 <- predict(model,data.frame(index=17),interval = "confidence")
conf1
##        fit      lwr      upr
## 1 67.05437 52.52748 81.58127

The estimated mean number of days is 67.05437, with a lower bound of 52.52748 and an upper bound of 81.58127. Based on this analysis, we are 95% confident that the mean number of days the ozone level exceeds 20ppm when the meteorological index is 17.0 lies between 52.52748 and 81.58127 days. In other words, if the sampling were repeated many times, about 95% of the intervals constructed this way would contain the true mean response at an index of 17.0.

Question 6: Calculate a 95% prediction interval on the number of days the ozone level exceeds 20ppm for a new observation when the meteorological index is 17.0. Comment on the meaning of this interval. Compare the width of the prediction interval to that of the confidence interval and comment.

pred1 <- predict(model,data.frame(index=17),interval = "prediction")
pred1
##        fit      lwr      upr
## 1 67.05437 13.99203 120.1167

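Based on these results, we are 95% confident that a single new observation of the number of days the ozone level exceeds 20ppm, made when the meteorological index is 17.0, will fall between 13.99203 and 120.1167 days. The prediction interval (width about 106.1 days) is roughly 3.65 times as wide as the confidence interval (width about 29.1 days). This is expected: the confidence interval reflects only the uncertainty in the estimated mean, while the prediction interval also includes the random variation of an individual observation around that mean.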
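The width comparison can be made precise. At the same \(x_0\), the squared half-width of the prediction interval exceeds that of the confidence interval by exactly \((t_{0.025,\,n-2}\, s)^2\), the contribution of the error variance of one new observation. This base-R sketch (variable names illustrative) computes both widths at an index of 17.0 and checks that identity.

```r
days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)
model <- lm(days ~ index)
x0    <- data.frame(index = 17)

ci17 <- predict(model, x0, interval = "confidence")
pi17 <- predict(model, x0, interval = "prediction")

# Interval widths at index = 17.0
ci_width <- ci17[1, "upr"] - ci17[1, "lwr"]
pi_width <- pi17[1, "upr"] - pi17[1, "lwr"]
c(ci_width = ci_width, pi_width = pi_width, ratio = pi_width / ci_width)

# Squared half-widths differ by (t * s)^2, the extra term for a new observation
tcrit <- qt(0.975, df = df.residual(model))
s     <- sigma(model)
all.equal((pi_width / 2)^2 - (ci_width / 2)^2, (tcrit * s)^2)
```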