Hossein Bahreinizad
Abdallah Alrawashdeh
Ahsan Masud Bappi
date: ‘2023-01-31’ output: html_document —
days <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61 ,43, 33, 36)
index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16, 17.2, 18, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)
model <- lm(days~index) #getting simple linear regression model
plot(index,days) #plotting the data provided
abline(model) #show the fitted line on the plot
coef(model) #returning the linear combinations of Yi which are the unbiased estimator of the true B1 and B0,The estimated variance will be found in Q3 (see Question 3).
## (Intercept) index
## -192.98383 15.29637
The slope and intercept for the regression line are 15.29637 and -192.98383 respectively.
summary(model)$r.square
## [1] 0.1584636
The value of R-square is for this regression is 0.1584636.
H0:β1=0
H1:β1≠0
summary(model)
##
## Call:
## lm(formula = days ~ index)
##
## Residuals:
## Min 1Q Median 3Q Max
## -41.70 -21.54 2.12 18.56 36.42
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984 163.503 -1.180 0.258
## index 15.296 9.421 1.624 0.127
##
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared: 0.1585, Adjusted R-squared: 0.09835
## F-statistic: 2.636 on 1 and 14 DF, p-value: 0.1267
For the statistical summary of the model we can see that the p-value for the slope (0.127) is larger than 0.05. Therefore, we fail to reject the null hypothesis.
#Manual claculations:
Sx2<- sum(index*index) #sum of squared index
Sx1<- (sum(index)*sum(index))/length(index) #squared sum of index over number of observation
Sxx<- Sx2-Sx1 #Sxx
v<- model$residuals #Getting the residuals to estimate MS(res)
variance<- sum(v*v)/(length(index)-2)#the estimated Yi variance :: MS(res) equal to [SS(res)/n-2], we lost 2 df(B0 and B1)
se_B1<- sqrt(variance/Sxx) #se of B1 this is identical to the model results (see model summary)
se_B1
## [1] 9.420975
t<- 15.296/se_B1 # t test value , compare with t table value = 2.1448 , fail to reject Ho, alternate is H1: B1=0
t
## [1] 1.623611
t2<- sqrt(2.636) #This is prove that square of t value is equal to F test (see F test value in model summary) in simple linear regression
t2
## [1] 1.623576
conclusion : this mean there is no linear relationship between x and y or wwe can say that the observations available on x can’t explain the variablilty in y, if this is the case the alternate would be the mean of y.
newindex <- seq(15,20,0.1)
conf <- predict(model,data.frame(index=newindex),interval = "confidence")
pred <- predict(model,data.frame(index=newindex),interval = "prediction")
plot(index,days)
abline(model)
lines(newindex,conf[,2])
lines(newindex,conf[,3])
lines(newindex,pred[,2])
lines(newindex,pred[,3])
Here, we can see the 95% confidence interval. However, the prediction interval falls outside the graph limits.
conf1 <- predict(model,data.frame(index=17),interval = "confidence")
conf1
## fit lwr upr
## 1 67.05437 52.52748 81.58127
The predicted value of days is 67.05437, with a lower bound of 52.52748 and an upper bound of 81.58127. This means that based on these analysis, we are 95% confident that the number of days that number of days the ozone level exceeds 20ppm when the meteorological index is 17.0, is somewhere between 52.52748 and 81.58127 days.
pred1 <- predict(model,data.frame(index=17),interval = "prediction")
pred1
## fit lwr upr
## 1 67.05437 13.99203 120.1167
Based on these results our model can predict (with 95% confidence) that the number of days the ozone level exceeds 20ppm when the meteorological index is 17.0 is between 13.99203 and 120.1167. As expected the width of our prediction interval is larger than our confidence interval.