Data: x is Index, y is Days
y <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
x <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16.0, 17.2, 18.0, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)
cbind(x,y)
## x y
## [1,] 16.7 91
## [2,] 17.1 105
## [3,] 18.2 106
## [4,] 18.1 108
## [5,] 17.2 88
## [6,] 18.2 91
## [7,] 16.0 58
## [8,] 17.2 82
## [9,] 18.0 81
## [10,] 17.2 65
## [11,] 16.9 61
## [12,] 17.1 48
## [13,] 18.2 61
## [14,] 17.3 43
## [15,] 17.5 33
## [16,] 16.6 36
Make a scatterplot of the data. Title the plot and label the axes appropriately.
#Answer
Plot Days vs Index
plot(x, y, main="Days vs Index", xlab="Index", ylab="Days")
model_1 is the object that represents the regression of Days on Index
model_1 <- lm(y~x)
Parameters of simple linear regression
Intercept : -193.0
Slope : 15.3
model_1
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept) x
## -193.0 15.3
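As a cross-check on the coefficients reported above, the least-squares estimates can be reproduced from the textbook formulas: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The following is a minimal sketch; the names `sxx_chk`, `sxy_chk`, `b0`, and `b1` are introduced here for illustration only and do not appear elsewhere in this document.

```r
# Data copied from the top of this document
y <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
x <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16.0, 17.2, 18.0, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

# Corrected sum of squares of x and corrected sum of cross-products
sxx_chk <- sum((x - mean(x))^2)
sxy_chk <- sum((x - mean(x)) * (y - mean(y)))

# Least-squares estimates of the slope and the intercept
b1 <- sxy_chk / sxx_chk
b0 <- mean(y) - b1 * mean(x)

# Hand-computed estimates next to the ones lm() reports
c(intercept = b0, slope = b1)
coef(lm(y ~ x))
```

Both lines should print the same pair of values, which round to the -193.0 and 15.3 shown in the model output.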
#plotting the graph
plot(x,y, main="Simple Linear Regression")
abline(model_1)
# min and max of the predictor variable Index
min(x)
max(x)
# Create a sequence of values that spans min to max Index with a spacing of 0.02
new_x <- seq(16, 18.2, 0.02)
# Create the confidence and prediction intervals at each of the new_x values
conf <- predict(model_1, data.frame(x=new_x), interval="confidence")
pred <- predict(model_1, data.frame(x=new_x), interval="prediction")
lines(new_x,conf[,2], col="blue")
lines(new_x,conf[,3], col="blue")
lines(new_x,pred[,2], col="orange")
lines(new_x,pred[,3], col="orange")
legend("topleft", inset=c(0, 0), legend=c("Linear regression line","Confidence interval", "Prediction interval"), lty=1, col=c("black","blue","orange"), cex=.8)
What is the value of R^2?
#Answer
summary(model_1)$r.squared
## [1] 0.1584636
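The R^2 value can also be reproduced from its definition, 1 - SS_Res / SS_T, which makes it explicit that the model explains only about 16% of the variation in Days. This is a minimal sketch; `fit`, `ss_res_chk`, `ss_tot`, and `r2` are illustrative names that are not used elsewhere in this document.

```r
# Data copied from the top of this document
y <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
x <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16.0, 17.2, 18.0, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

fit <- lm(y ~ x)

# Residual sum of squares and total (corrected) sum of squares
ss_res_chk <- sum(residuals(fit)^2)
ss_tot <- sum((y - mean(y))^2)

# Coefficient of determination from its definition
r2 <- 1 - ss_res_chk / ss_tot
r2

# In simple linear regression this equals the squared sample correlation
cor(x, y)^2
```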
summary(model_1)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.98383 163.503283 -1.180306 0.2575450
## x 15.29637 9.420975 1.623650 0.1267446
#Answer
Looking at the summary table above, the test statistic for the slope is t0 = 1.62365. For a two-sided test at the 5% significance level with n - 2 = 14 degrees of freedom, the critical value is t(0.025,14) = 2.145.
Since t0 lies between -t(0.025,14) and t(0.025,14), we fail to reject H0 (slope = 0). The data do not provide significant evidence of a linear relationship between “Days” and “Index” (p-value = 0.127).
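The significance test above can be carried out without a t table by asking R for the critical value and the p-value directly. The sketch below assumes a two-sided test at the 5% level; `fit`, `coefs`, `t0`, `df_res`, `t_crit`, `p_value`, and `reject_h0` are illustrative names.

```r
# Data copied from the top of this document
y <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
x <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16.0, 17.2, 18.0, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

fit <- lm(y ~ x)
coefs <- summary(fit)$coefficients

# Test statistic for H0: slope = 0
t0 <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]

# Residual degrees of freedom (n - 2) and the two-sided 5% critical value
df_res <- df.residual(fit)
t_crit <- qt(1 - 0.05 / 2, df_res)

# Two-sided p-value for the observed statistic
p_value <- 2 * pt(abs(t0), df_res, lower.tail = FALSE)

# TRUE would mean H0 is rejected at the 5% level
reject_h0 <- abs(t0) > t_crit
c(t0 = t0, t_crit = t_crit, p_value = p_value)
reject_h0
```

Using `qt()` instead of a tabulated value avoids rounding the critical value, although the conclusion here does not depend on it.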
Regardless of whether you conclude that the regression is significant above, make a scatterplot of the data showing the fitted regression line, confidence interval, and prediction interval.
#Answer
Calculate a 95% confidence interval on the mean number of days the ozone level exceeds 20ppm when the meteorological index is 17.0. Comment on the meaning of this interval.
ss_res <- sum((fitted(model_1)-y)^2) # residual sum of squares (SS_Res)
msr <- summary(model_1)$sigma^2 # residual mean square (MS_Res), the estimate of sigma^2
sxx <- sum(x^2)- sum(x)^2/length(x) # Sxx, the corrected sum of squares of x
t_0.025 <- 2.145 # critical value t(0.025,14) from tables
y_17 <- model_1$coefficients[1] + model_1$coefficients[2] *(17) # estimate of E(Y|x=17)
# simplify the equation (the parts under the square root)
s <- (17-mean(x))^2
v <- s/sxx
# get the confidence interval
lower_cI <- y_17 - t_0.025 * sqrt(msr * ((1/length(x))+ v))
upper_cI <- y_17 + t_0.025 * sqrt(msr * ((1/length(x))+ v))
#Answer
lower_cI
## (Intercept)
## 52.52604
upper_cI
## (Intercept)
## 81.58271
Calculate a 95% prediction interval on the mean number of days the ozone level exceeds 20ppm when the meteorological index is 17.0. Comment on the meaning of this interval. Compare the width of the prediction interval to that of the confidence interval and comment.
lower_pI <- y_17 - t_0.025 * sqrt(msr * (1+(1/length(x))+ v))
upper_pI <- y_17 + t_0.025 * sqrt(msr * (1+(1/length(x))+ v))
#Answer
lower_pI
## (Intercept)
## 13.98675
upper_pI
## (Intercept)
## 120.122
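Both hand-computed intervals can be checked against `predict()`, which returns the fitted value together with the lower and upper limits. This is a minimal sketch; `fit`, `new_point`, `ci`, `pred_int`, `ci_width`, and `pi_width` are illustrative names. Because `predict()` uses the exact t quantile rather than the tabulated 2.145, the limits may differ from the values above in the third decimal place.

```r
# Data copied from the top of this document
y <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
x <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16.0, 17.2, 18.0, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

fit <- lm(y ~ x)
new_point <- data.frame(x = 17)

# 95% confidence interval for the mean response at Index = 17
ci <- predict(fit, new_point, interval = "confidence", level = 0.95)

# 95% prediction interval for a single new observation at Index = 17
pred_int <- predict(fit, new_point, interval = "prediction", level = 0.95)

ci
pred_int

# Interval widths (upper limit minus lower limit)
ci_width <- ci[1, "upr"] - ci[1, "lwr"]
pi_width <- pred_int[1, "upr"] - pred_int[1, "lwr"]
c(confidence = ci_width, prediction = pi_width)
```

The prediction interval is wider because it adds the variance of an individual observation (the extra 1 under the square root) to the uncertainty in the estimated mean.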