# Import data
setwd("C:/Users/Qiu J/Desktop/MSSP+DA 2021FALL/MSSP 897-002 Applied Linear Modeling/Assignment/Lab Assignment 7")
PISA <- read.csv("PISA.csv")  # path is relative to the working directory set above
Sys.setenv(LANGUAGE = "en")  # show R messages in English
# Regression model
lm <- lm(PISARead ~ GNI + GDP, data=PISA)
summary(lm)
Call:
lm(formula = PISARead ~ GNI + GDP, data = PISA)

Residuals:
    Min      1Q  Median      3Q     Max
-64.781 -12.074   5.929  16.787  44.816

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.950e+02  6.083e+00  81.381   <2e-16 ***
GNI          2.014e-04  9.587e-05   2.100   0.0430 *
GDP         -2.063e-04  9.833e-05  -2.098   0.0431 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 28.27 on 35 degrees of freedom
Multiple R-squared:  0.1122,  Adjusted R-squared:  0.06147
F-statistic: 2.212 on 2 and 35 DF,  p-value: 0.1246
# Test for multicollinearity using Variance Inflation Factor (VIF)
library(car)
vif(lm)
     GNI      GDP
4159.901 4159.901
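The VIF for both GNI and GDP is about 4159.9, far above the common rule-of-thumb cutoffs of 5 or 10, indicating severe multicollinearity. The nearly equal and opposite coefficients on GNI and GDP in the regression output are consistent with this.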
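As a rough illustration of what the VIF measures, the sketch below computes it by hand from an auxiliary regression of one predictor on the other, using VIF = 1 / (1 - R^2). The data are simulated for illustration only (the names x1 and x2 and all numeric settings are assumptions, not the PISA variables), so the resulting value will not match the output above.

```r
# Hypothetical simulated predictors; this is NOT the PISA data.
set.seed(1)
n  <- 38
x1 <- rnorm(n, mean = 30000, sd = 10000)
x2 <- x1 + rnorm(n, sd = 200)   # x2 is x1 plus a small amount of noise

# Auxiliary regression: regress one predictor on the other predictor(s)
aux_r2 <- summary(lm(x1 ~ x2))$r.squared

# VIF for x1: the closer aux_r2 is to 1, the larger the VIF
vif_manual <- 1 / (1 - aux_r2)
vif_manual
```

With more than two predictors, the same idea applies: each predictor is regressed on all of the others in turn.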
# Test for multicollinearity using correlation matrix
cor(PISA[,c("GNI","GDP")],use="complete.obs")
          GNI       GDP
GNI 1.0000000 0.9998798
GDP 0.9998798 1.0000000
The correlation between GNI and GDP is 0.99988, which is nearly perfect. The two independent variables are therefore almost collinear, which confirms the multicollinearity problem.
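The VIF and the correlation are two views of the same problem. With exactly two predictors, the VIF reduces to 1 / (1 - r^2), where r is the correlation between them. A minimal check, assuming the rounded correlation printed above:

```r
# Correlation between GNI and GDP as printed in the matrix above (rounded)
r <- 0.9998798

# Two-predictor identity: VIF = 1 / (1 - r^2)
vif_from_r <- 1 / (1 - r^2)
vif_from_r   # close to the vif() result of about 4159.9
```

Any small gap between this value and the vif() output can come from rounding in the printed correlation, or from cor() and lm() dropping different rows when values are missing.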