Lab Assignment Week 7

For each question, provide your code and the answer.

Q1:

Estimate a regression model where Reading performance on the PISA (PISARead) is regressed on gross national income (GNI) and gross domestic product (GDP). Is there multicollinearity between the independent variables?

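# Read the PISA data (header = TRUE and sep = "," are the read.csv defaults)
PISA <- read.csv("/Users/YanfeiQin/Desktop/Fall 2021/897-002 Applied Linear Modeling/Lab 7/PISA.csv", header=TRUE, sep=",")
# Keep only the three model variables and drop rows with missing values
PISA2 <- na.omit(PISA[, c("PISARead", "GNI", "GDP")])
# Store the fit under a name that does not mask the lm() function
fit <- lm(PISARead ~ GNI + GDP, data = PISA2)
library(car)

## Loading required package: carData

# Variance inflation factor for each predictor
vif(fit)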

##      GNI      GDP 
## 4159.901 4159.901

The variance inflation factor (VIF) is 4159.901 for both GNI and GDP. This is far above the common rule-of-thumb cutoff of VIF >= 10, which suggests that the model suffers from severe multicollinearity. The tolerance criterion leads to the same conclusion. Tolerance is the reciprocal of the VIF, so for both GNI and GDP it equals 1 / 4159.901 = 0.00024. A tolerance less than or equal to 0.10 is conventionally taken as a sign of multicollinearity among the independent variables, and 0.00024 is well below that threshold.
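The tolerance arithmetic above can be reproduced in a few lines of R. This is only a minimal sketch: the VIF values are typed in by hand from the vif() output rather than read from the fitted model, and the names vif_values, tolerance, and check are illustrative, not part of the original lab code.

```r
# VIF values copied by hand from the vif() output above
vif_values <- c(GNI = 4159.901, GDP = 4159.901)

# Tolerance is the reciprocal of the VIF
tolerance <- 1 / vif_values

# Compare each predictor with the rule-of-thumb cutoffs used in the text
check <- data.frame(
  vif            = vif_values,
  tolerance      = round(tolerance, 5),
  vif_flag       = vif_values >= 10,    # TRUE means VIF is at or above 10
  tolerance_flag = tolerance <= 0.10    # TRUE means tolerance is at or below 0.10
)
print(check)
```

Both flag columns come out TRUE for GNI and GDP, which matches the conclusion drawn from the two criteria.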

cor(PISA2[, c("GNI", "GDP")], use = "complete.obs")

##           GNI       GDP
## GNI 1.0000000 0.9998798
## GDP 0.9998798 1.0000000

As shown in the correlation matrix above, the correlation between the two IVs is approximately 0.9999 (r = 0.9998798), so GNI and GDP are almost perfectly linearly related. This confirms the earlier conclusion: the IVs suffer from multicollinearity.
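The correlation and the VIF are two views of the same quantity here. In a model with exactly two predictors, the R-squared from regressing one predictor on the other equals the squared correlation, so VIF = 1 / (1 - r^2). The sketch below checks this using the rounded correlation typed in from the cor() output; the variable names are illustrative and not part of the original lab code.

```r
# Correlation between GNI and GDP, copied by hand from the cor() output above
r <- 0.9998798

# With two predictors, the auxiliary R-squared is r^2,
# so tolerance = 1 - r^2 and VIF = 1 / (1 - r^2)
tolerance_from_r <- 1 - r^2
vif_from_r <- 1 / tolerance_from_r

# Values reconstructed from the rounded correlation
print(tolerance_from_r)
print(vif_from_r)
```

The result is roughly 4160, which agrees with the 4159.901 reported by vif(). The small gap is expected because r is printed to only seven decimal places.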

Yanfei Qin

11/3/2021

Answer the following questions using the PISA dataset and the PISA Codebook.
