First, we import the data into R.
dat <- read.csv("C:\\Users\\18067\\Documents\\Fareeha Imam\\TTU R11767331\\Spring 2023\\SDA\\Assignment 12\\data-table-B8(6).csv")
head(dat)
##   x1  x2    y
## 1  0  10  7.5
## 2  0  50 15.0
## 3  0  85 22.0
## 4  0 110 28.6
## 5  0 140 31.6
## 6  0 170 34.0

1 Part A:

Fit a first-order model and check the VIFs for the fitted regression parameters.

We fit a first-order model in which y is regressed on x1 and x2, using the data from the .csv file.
model <- lm(y ~ x1 + x2, data = dat)
summary(model)
## 
## Call:
## lm(formula = y ~ x1 + x2, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.7716 -4.1656  0.0802  3.8323  8.3349 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.109e+01  1.669e+00   6.642 1.48e-07 ***
## x1          3.501e+02  3.968e+01   8.823 3.38e-10 ***
## x2          1.089e-01  9.983e-03  10.912 1.74e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.782 on 33 degrees of freedom
## Multiple R-squared:  0.8415, Adjusted R-squared:  0.8319 
## F-statistic:  87.6 on 2 and 33 DF,  p-value: 6.316e-14
The multiple R-squared is 0.8415, so x1 and x2 together explain 84.15% of the variability in y.
Next, we check the VIFs for the fitted regression parameters. The VIF measures collinearity between the predictor variables: for predictor j, VIF_j = 1/(1 - R_j^2), where R_j^2 is the R-squared from regressing predictor j on the remaining predictors.
library(car)
vif(model)
##       x1       x2 
## 1.016535 1.016535
Both x1 and x2 have VIF values close to 1, indicating no meaningful collinearity between the predictor variables.
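We can reproduce these values by hand from the formula above. A minimal sketch, assuming the same dat as before:
r2 <- summary(lm(x1 ~ x2, data = dat))$r.squared  # R-squared from regressing x1 on the other predictor
1 / (1 - r2)                                      # should match vif(model)["x1"]
With only two predictors, the auxiliary R-squared is the same in both directions, which is why x1 and x2 show the identical VIF of 1.016535.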

2 Part B:

Fit a first-order model with second-order interactions and check the VIFs for the fitted regression parameters.


We now fit a second model, model1, which adds the interaction term between x1 and x2.
model1 <- lm(y ~ x1 + x2 + x1:x2, data = dat)
summary(model1)
## 
## Call:
## lm(formula = y ~ x1 + x2 + x1:x2, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.0753 -3.6781  0.4395  3.1321  8.8448 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.50128    1.89347   6.602 1.92e-07 ***
## x1          256.73740   73.72914   3.482  0.00146 ** 
## x2            0.09879    0.01193   8.281 1.84e-09 ***
## x1:x2         0.76127    0.51026   1.492  0.14551    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.696 on 32 degrees of freedom
## Multiple R-squared:  0.8518, Adjusted R-squared:  0.8379 
## F-statistic: 61.31 on 3 and 32 DF,  p-value: 2.318e-13

The result shows that the interaction term x1:x2 is not significant (p = 0.1455).
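As an optional cross-check, a partial F-test comparing the models with and without the interaction leads to the same conclusion; for a single added term, the F statistic is just the square of the t value above:
anova(model, model1)  # partial F-test for adding x1:x2; its p-value matches the t-test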

We use the vif() function to calculate the variance inflation factors for the terms in model1.
vif(model1)
##       x1       x2    x1:x2 
## 3.639435 1.505416 3.822936
vif(model1, type = "predictor")
##    GVIF Df GVIF^(1/(2*Df)) Interacts With Other Predictors
## x1    1  3               1             x2             --  
## x2    1  3               1             x1             --

The generalized variance inflation factors (GVIFs) at the predictor level show no inflation, so the regression coefficient estimates for x1 and x2 can still be considered reliable. Note, however, that the term-level VIFs for x1 and x1:x2 both exceed 3: an interaction term is structurally correlated with its component predictors.
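For reference, the term-level VIFs above can be reproduced directly from the model matrix: each VIF is a diagonal element of the inverse of the correlation matrix of the (intercept-free) columns. A minimal sketch, not the internal car code:
X <- model.matrix(model1)[, -1]  # columns x1, x2, and x1:x2
diag(solve(cor(X)))              # should match vif(model1)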

3 Part C:

Fit a first-order model with standardized predictor variables and check the VIFs for the fitted regression parameters.


To standardize the data we use the scale() function; the standardized data frame is named dat1.
dat1 <- scale(dat, center = TRUE, scale = TRUE)
dat1 <- as.data.frame(dat1)
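scale() subtracts each column's mean and divides by its standard deviation. A quick sanity check on x1 (optional):
all.equal(dat1$x1, (dat$x1 - mean(dat$x1)) / sd(dat$x1))  # should be TRUE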

Next, we fit a new model, model2, to the standardized data.
model2 <- lm(y ~ x1 + x2, data = dat1)
summary(model2)
## 
## Call:
## lm(formula = y ~ x1 + x2, data = dat1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83775 -0.35713  0.00688  0.32855  0.71458 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.737e-17  6.833e-02   0.000        1    
## x1          6.165e-01  6.987e-02   8.823 3.38e-10 ***
## x2          7.625e-01  6.987e-02  10.912 1.74e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.41 on 33 degrees of freedom
## Multiple R-squared:  0.8415, Adjusted R-squared:  0.8319 
## F-statistic:  87.6 on 2 and 33 DF,  p-value: 6.316e-14

The F-statistic (87.6) and p-value are identical to those in Part A. Standardizing the variables is a linear transformation, so it rescales the coefficients but does not change the overall fit or the significance of the model.
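The standardized slopes are simply the original slopes rescaled by sd(x)/sd(y). A quick check against the Part A fit (an optional sketch):
coef(model)["x1"] * sd(dat$x1) / sd(dat$y)  # should reproduce the x1 estimate above (about 0.6165)
coef(model)["x2"] * sd(dat$x2) / sd(dat$y)  # should reproduce the x2 estimate (about 0.7625)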

We use the vif() function to calculate the variance inflation factors for x1 and x2 in model2.
vif(model2)
##       x1       x2 
## 1.016535 1.016535

Both x1 and x2 have VIF values of 1.016535, the same as in Part A, indicating that there is essentially no multicollinearity between these two variables.
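Because the VIF depends only on the correlations among the predictors, it is invariant to centering and scaling. A one-line check (optional):
all.equal(vif(model), vif(model2))  # should be TRUE: standardization leaves the VIFs unchanged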

4 Part D:

Fit a first-order model with second-order interactions using standardized predictor variables and check the VIFs for the fitted regression parameters.


Finally, we fit model3, which adds the interaction term, to the standardized data dat1.
model3 <- lm(y ~ x1 + x2 + x1:x2, data = dat1)
summary(model3)
## 
## Call:
## lm(formula = y ~ x1 + x2 + x1:x2, data = dat1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.60658 -0.31533  0.03768  0.26852  0.75829 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.01357    0.06771   0.200    0.842    
## x1           0.61207    0.06868   8.912 3.51e-10 ***
## x2           0.78767    0.07066  11.147 1.49e-12 ***
## x1:x2        0.10943    0.07335   1.492    0.146    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4026 on 32 degrees of freedom
## Multiple R-squared:  0.8518, Adjusted R-squared:  0.8379 
## F-statistic: 61.31 on 3 and 32 DF,  p-value: 2.318e-13

The multiple R-squared is 0.8518, matching the unstandardized interaction model in Part B: the model explains 85.18% of the variation in the response. (The residual standard error is smaller than in Part B only because y itself is now on a standardized scale.)

Now we check the VIFs for the fitted regression parameters, which measure collinearity between the predictor variables.
vif(model3)
##       x1       x2    x1:x2 
## 1.018439 1.078223 1.066356
vif(model3, type = "predictor")
##    GVIF Df GVIF^(1/(2*Df)) Interacts With Other Predictors
## x1    1  3               1             x2             --  
## x2    1  3               1             x1             --

All VIF values are now close to 1. Centering the predictors before forming the interaction removes most of the correlation between x1, x2, and x1:x2 that we saw in Part B.
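The drop in the interaction-term VIF (from 3.82 in Part B to 1.07 here) is a centering effect: the product of centered predictors is far less correlated with the predictors themselves. A quick illustration (optional):
cor(dat$x1, dat$x1 * dat$x2)     # raw data: x1 is noticeably correlated with the product x1*x2
cor(dat1$x1, dat1$x1 * dat1$x2)  # centered data: this correlation shrinks toward 0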

5 Part E:

Comment on the VIFs for Parts A, B, C, and D in the context of standardized and non-standardized variables.


We have four different models.

x1 and x2 are the predictor variables in the first model. The VIFs for both variables are close to 1, so the model shows no evidence of multicollinearity.

In the second model, the VIFs for x1 and the interaction term x1:x2 both exceed 3, indicating multicollinearity introduced by the interaction.

In the third and fourth models, the data are standardized (centered and scaled) before the regression models are fitted.

There is no evidence of multicollinearity in the third model; its VIFs are identical to the first model's.

The fourth model includes the interaction of the standardized predictors; unlike the second model, its VIFs stay close to 1 because centering removes the correlation between the predictors and their product.

Conclusion:

The low VIF values for all terms in the standardized-data models indicate that standardizing, and in particular centering, the predictors reduces the correlation among the model terms. In the non-standardized data, the interaction term x1:x2 was correlated with its component predictors, as shown by the elevated VIFs in Part B; after standardization this nonessential multicollinearity disappears, while the VIFs of the first-order models (Parts A and C) are unchanged.