dat<-read.csv("C:\\Users\\18067\\Documents\\Fareeha Imam\\TTU R11767331\\Spring 2023\\SDA\\Assignment 11\\data-table-B21(2).csv")
head(dat)
## i y x_1 x_2 x_3 x_4
## 1 1 78.5 7 26 6 60
## 2 2 74.3 1 29 15 52
## 3 3 104.3 11 56 8 20
## 4 4 87.6 11 31 8 47
## 5 5 95.9 7 52 6 33
## 6 6 109.2 11 55 9 22
dat<-dat[,-1]
colnames(dat)<-c("y","x1","x2","x3","x4")
print(dat,row.names=FALSE)
## y x1 x2 x3 x4
## 78.5 7 26 6 60
## 74.3 1 29 15 52
## 104.3 11 56 8 20
## 87.6 11 31 8 47
## 95.9 7 52 6 33
## 109.2 11 55 9 22
## 102.7 3 71 17 6
## 72.5 1 31 22 44
## 93.1 2 54 18 22
## 115.9 21 47 4 26
## 83.8 1 40 23 34
## 113.3 11 66 9 12
## 109.4 10 68 8 12
When you regress y on all four predictors, what do you notice about the p-value for the F-statistic and the p-values for the t-tests on the individual regression coefficients?
--> To regress y on all four predictors, we use the lm() function:
model<-lm(y~x1+x2+x3+x4, data=dat)
--> We then check the significance of the coefficients with anova() and summary():
anova(model)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x1 1 1450.08 1450.08 242.3679 2.888e-07 ***
## x2 1 1207.78 1207.78 201.8705 5.863e-07 ***
## x3 1 9.79 9.79 1.6370 0.2366
## x4 1 0.25 0.25 0.0413 0.8441
## Residuals 8 47.86 5.98
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3 + x4, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1750 -1.6709 0.2508 1.3783 3.9254
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.4054 70.0710 0.891 0.3991
## x1 1.5511 0.7448 2.083 0.0708 .
## x2 0.5102 0.7238 0.705 0.5009
## x3 0.1019 0.7547 0.135 0.8959
## x4 -0.1441 0.7091 -0.203 0.8441
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.446 on 8 degrees of freedom
## Multiple R-squared: 0.9824, Adjusted R-squared: 0.9736
## F-statistic: 111.5 on 4 and 8 DF, p-value: 4.756e-07
--> The summary output shows that the overall model is highly significant: the F-statistic is 111.5 on 4 and 8 degrees of freedom, with a p-value of 4.756e-07. Yet none of the individual t-tests on the regression coefficients is significant at the 0.05 level; the smallest p-value is 0.0708, for x1. A highly significant overall F-test paired with non-significant individual t-tests is a classic symptom of multicollinearity among the predictors. (The anova() table does flag x1 and x2, but those are sequential F-tests and depend on the order in which the terms enter the model.)
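--> One quick way to see the collinearity behind this pattern is to inspect the pairwise correlations among the predictors (a minimal sketch using the dat object built above; output omitted):
# Pairwise correlations among the four predictors; large off-diagonal
# values are consistent with the F-test/t-test conflict noted above.
round(cor(dat[, c("x1", "x2", "x3", "x4")]), 2)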
What are the VIFs for the predictors in this model?
library(car)
vif(model)
## x1 x2 x3 x4
## 38.49621 254.42317 46.86839 282.51286
--> All four VIFs exceed the common rule-of-thumb cutoff of 10, indicating serious multicollinearity; x2 (254.4) and x4 (282.5) are especially large, pointing to strong collinearity between those two predictors in particular.
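--> As a check, a VIF can be computed by hand as 1/(1 - R^2) from regressing one predictor on the remaining ones (a sketch; it should reproduce the vif(model) value for x4):
# Regress x4 on the other predictors and invert 1 - R^2 to get its VIF.
r2_x4 <- summary(lm(x4 ~ x1 + x2 + x3, data = dat))$r.squared
1 / (1 - r2_x4)  # should match vif(model)["x4"], i.e. 282.51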
Which first-order model do you think describes the response with interpretable regression parameters the “best”, and why?
--> We fit several first-order models using different subsets of the predictors.
model1<-lm(y~x1+x3, data=dat)
summary(model1)
##
## Call:
## lm(formula = y ~ x1 + x3, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.142 -7.779 2.558 7.226 15.008
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.3490 17.0528 4.243 0.00171 **
## x1 2.3125 0.9598 2.409 0.03672 *
## x3 0.4945 0.8814 0.561 0.58717
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.08 on 10 degrees of freedom
## Multiple R-squared: 0.5482, Adjusted R-squared: 0.4578
## F-statistic: 6.066 on 2 and 10 DF, p-value: 0.01883
--> In model1, x1 is significant (p = 0.0367) but x3 is not (p = 0.587), and the fit is poor (adjusted R-squared = 0.458).
model2<-lm(y~x2+x3, data=dat)
summary(model2)
##
## Call:
## lm(formula = y ~ x2 + x3, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.1535 -4.1565 -0.3155 2.0330 13.4864
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.0747 7.3834 9.762 1.98e-06 ***
## x2 0.7313 0.1207 6.057 0.000123 ***
## x3 -1.0084 0.2934 -3.437 0.006358 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.445 on 10 degrees of freedom
## Multiple R-squared: 0.847, Adjusted R-squared: 0.8164
## F-statistic: 27.69 on 2 and 10 DF, p-value: 8.377e-05
--> In model2, both x2 (p = 0.000123) and x3 (p = 0.00636) are significant, and the fit improves substantially (adjusted R-squared = 0.816).
model3<-lm(y~x1+x2+x3, data=dat)
summary(model3)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2543 -1.4726 0.1755 1.5409 3.9711
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.19363 3.91330 12.315 6.17e-07 ***
## x1 1.69589 0.20458 8.290 1.66e-05 ***
## x2 0.65691 0.04423 14.851 1.23e-07 ***
## x3 0.25002 0.18471 1.354 0.209
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.312 on 9 degrees of freedom
## Multiple R-squared: 0.9823, Adjusted R-squared: 0.9764
## F-statistic: 166.3 on 3 and 9 DF, p-value: 3.367e-08
--> In model3, x1 and x2 are highly significant but x3 is not (p = 0.209); the fit is excellent (adjusted R-squared = 0.9764).
model4<-lm(y~x1+x3+x4, data=dat)
summary(model4)
##
## Call:
## lm(formula = y ~ x1 + x3 + x4, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9323 -1.8090 0.4806 1.1398 3.7771
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 111.68441 4.56248 24.479 1.52e-09 ***
## x1 1.05185 0.22368 4.702 0.00112 **
## x3 -0.41004 0.19923 -2.058 0.06969 .
## x4 -0.64280 0.04454 -14.431 1.58e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.377 on 9 degrees of freedom
## Multiple R-squared: 0.9813, Adjusted R-squared: 0.975
## F-statistic: 157.3 on 3 and 9 DF, p-value: 4.312e-08
--> In model4, x1 (p = 0.00112) and x4 (p = 1.58e-07) are significant while x3 is only marginal (p = 0.0697), with a comparable fit (adjusted R-squared = 0.975). Among the candidates, model3 gives the best fit, but since x3 contributes nothing significant there, its significant terms x1 and x2 carry the information. A first-order model in x1 and x2 alone should therefore describe the response “best” with interpretable parameters, while also avoiding the severe collinearity between x2 and x4 flagged by the VIFs.
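--> A compact way to compare the candidates side by side, and to fit the suggested reduced model, is sketched below (the reduced model was not fitted in the original output; results omitted):
# Adjusted R-squared and AIC for the four candidate models.
fits <- list(model1 = model1, model2 = model2, model3 = model3, model4 = model4)
sapply(fits, function(m) summary(m)$adj.r.squared)
AIC(model1, model2, model3, model4)
# Suggested reduced model keeping only the consistently significant predictors.
model5 <- lm(y ~ x1 + x2, data = dat)
summary(model5)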