Hossein Bahreinizad
Abdallah Alrawashdeh
Ahsan Masud Bappi


date: ‘2023-01-31’ output: html_document —

days <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61 ,43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

Question 1: Regress Days on Index using simple linear regression, what are the estimates of your fitted regression line?

model <- lm(days~index) #getting simple linear regression model
plot(index,days) #plotting the data provided
abline(model) #show the fitted line on the plot

coef(model) #returning the linear combinations of Yi which are the unbiased estimator of the true B1 and B0,The estimated variance will be found in Q3 (see Question 3).
## (Intercept)       index 
##  -192.98383    15.29637

The slope and intercept for the regression line are 15.29637 and -192.98383 respectively.

Question 2: What is the value of R^2?

summary(model)$r.square
## [1] 0.1584636

The value of R-square is for this regression is 0.1584636.

Question 3: Test for the signifiance of the regression at a 0.05 level of signficance assuming the response is Normally distributed, what is your conclusion?

H0:β1=0

H1:β1≠0

summary(model)
## 
## Call:
## lm(formula = days ~ index)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -41.70 -21.54   2.12  18.56  36.42 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984    163.503  -1.180    0.258
## index         15.296      9.421   1.624    0.127
## 
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared:  0.1585, Adjusted R-squared:  0.09835 
## F-statistic: 2.636 on 1 and 14 DF,  p-value: 0.1267

For the statistical summary of the model we can see that the p-value for the slope (0.127) is larger than 0.05. Therefore, we fail to reject the null hypothesis.

#Manual claculations:
Sx2<- sum(index*index) #sum of squared index
Sx1<- (sum(index)*sum(index))/length(index) #squared sum of index over number of observation 
Sxx<- Sx2-Sx1 #Sxx
v<- model$residuals #Getting the residuals to estimate MS(res)
variance<- sum(v*v)/(length(index)-2)#the estimated Yi variance :: MS(res) equal to [SS(res)/n-2], we lost 2 df(B0 and B1)
se_B1<- sqrt(variance/Sxx) #se of B1 this is identical to the model results (see model summary)
se_B1
## [1] 9.420975
t<- 15.296/se_B1 # t test value , compare with  t table value = 2.1448 , fail to reject Ho, alternate is H1: B1=0
t
## [1] 1.623611
t2<- sqrt(2.636) #This is prove that square of t value is equal to F test (see F test value in model summary) in simple linear regression
t2
## [1] 1.623576

conclusion : this mean there is no linear relationship between x and y or wwe can say that the observations available on x can’t explain the variablilty in y, if this is the case the alternate would be the mean of y.

Question 4: Regardless of whether you conclude that the regression is signficant above, make a scatterplot of the data showing the fitted regression line, confidence interval, and prediction interval

newindex <- seq(15,20,0.1)
conf <- predict(model,data.frame(index=newindex),interval = "confidence")
pred <- predict(model,data.frame(index=newindex),interval = "prediction")
plot(index,days)
abline(model)
lines(newindex,conf[,2])
lines(newindex,conf[,3])
lines(newindex,pred[,2])
lines(newindex,pred[,3])

Here, we can see the 95% confidence interval. However, the prediction interval falls outside the graph limits.

Question 4: Calculate a 95% confidence interval on the mean number of days the ozone level exceeds 20ppm when the meteorological index is 17.0. Comment on the meaning of this interval?

conf1 <- predict(model,data.frame(index=17),interval = "confidence")
conf1
##        fit      lwr      upr
## 1 67.05437 52.52748 81.58127

The predicted value of days is 67.05437, with a lower bound of 52.52748 and an upper bound of 81.58127. This means that based on these analysis, we are 95% confident that the number of days that number of days the ozone level exceeds 20ppm when the meteorological index is 17.0, is somewhere between 52.52748 and 81.58127 days.

Question 5: Calculate a 95% prediction interval on the mean number of days the ozone level exceeds 20ppm when the meterological index is 17.0. Comment on the meaning of this intervall? Compare the width of the prediction interval to that of the confidence interval and comment.

pred1 <- predict(model,data.frame(index=17),interval = "prediction")
pred1
##        fit      lwr      upr
## 1 67.05437 13.99203 120.1167

Based on these results our model can predict (with 95% confidence) that the number of days the ozone level exceeds 20ppm when the meteorological index is 17.0 is between 13.99203 and 120.1167. As expected the width of our prediction interval is larger than our confidence interval.