SS154 6.1

Identify the typical signs of multicollinearity in a regression.

The easiest way to detect multicollinearity is to examine the correlation between each pair of explanatory variables. If two of the variables are highly correlated, this may be a source of multicollinearity. However, a high pairwise correlation between explanatory variables is a sufficient but not a necessary condition for multicollinearity: three or more regressors can be jointly near-collinear even when every pairwise correlation is modest. The second easy way to detect multicollinearity is to estimate the multiple regression and examine the output carefully. A rule-of-thumb warning sign is a very high \(R^2\) while most of the coefficients are individually insignificant according to their p-values. However, this is not a definitive ("acid") test for multicollinearity; it only gives a rough indication that multicollinearity may be present.[11]
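Both warning signs can be reproduced on simulated data. The sketch below is illustrative only: the variables x1, x2 and y, the seed, and the coefficients are made up. It builds x2 as a near-copy of x1, then shows a pairwise correlation close to 1, a high \(R^2\), and large standard errors on the individual slopes. With exactly two regressors, each variance inflation factor (VIF) reduces to \(1/(1 - r^2)\).

```r
# Hypothetical simulated data: x2 is almost an exact copy of x1
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)
y  <- 1 + x1 + x2 + rnorm(n, sd = 0.5)

# Sign 1: a pairwise correlation close to +/-1
r <- cor(x1, x2)

# Sign 2: a high R^2 but large standard errors on the individual slopes
fit <- lm(y ~ x1 + x2)
r2  <- summary(fit)$r.squared
print(coef(summary(fit)))

# With only two regressors, each variance inflation factor is 1 / (1 - r^2)
vif <- 1 / (1 - r^2)
print(c(correlation = r, R2 = r2, VIF = vif))
```

The slopes on x1 and x2 typically come out individually insignificant even though together they explain most of the variation in y.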

Explain what might give rise to multicollinearity and provide real-life examples of such situations.

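Multicollinearity arises when two or more regressors are highly, or even perfectly, linearly correlated, so the data cannot separate their individual effects. For example, suppose we want to check how height explains grades and we also include income as a regressor. The two regressors are correlated, since across the globe height is correlated with income. The wage data used below is another example: for a given level of education, a worker's experience rises almost one-for-one with age, so EXPERIENCE, AGE and EDUCATION are nearly collinear.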
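A definitional identity is a common real-life cause of perfect collinearity. The sketch below assumes a hypothetical construction of experience as "potential experience" (age minus years of schooling minus 6). This construction is an assumption made for illustration and is not taken from the wage file. All data are simulated. Because exper is an exact linear combination of the intercept, educ and age, lm() cannot estimate a separate coefficient for it.

```r
set.seed(42)
n     <- 200
age   <- round(runif(n, 24, 64))
educ  <- round(runif(n, 8, 16))
exper <- age - educ - 6          # hypothetical definitional identity
logwage <- 0.5 + 0.08 * educ + 0.01 * exper + rnorm(n, sd = 0.4)

fit <- lm(logwage ~ educ + age + exper)
# exper is reported as NA: it is an exact linear combination of
# the intercept, educ and age, so its effect cannot be separated
print(coef(fit))
```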

Correct for multicollinearity in a regression.

# Install the packages this script uses if you don't have them already
# install.packages("ggplot2")
#install.packages("readxl")
#install.packages("GGally")
#install.packages("corpcor")
#install.packages("mctest")
#install.packages("ppcor")
# Clear workspace
rm(list=ls()) 
# Call packages you use
library(ggplot2)
library(readxl)
library(GGally)
library(corpcor)
library(mctest)
library(ppcor)

Loading required package: MASS

# Path to the data file 6-1-data.csv; change it to the file's location on your machine
wagesmicrodata <- read.csv(file="/Users/oba2311/Desktop/Minerva/Junior/SS154/6/6-1-data.csv")
# To view the dataframe
# View(wagesmicrodata)
# Attach the data frame to the R search path so that its columns (WAGE, AGE, ...)
# can be referred to by name, as the lm() calls below do.
attach(wagesmicrodata)

# Fit the full model first, ignoring possible multicollinearity for now
fit1 <- lm(log(WAGE)~OCCUPATION+SECTOR+UNION+EDUCATION+EXPERIENCE+AGE+SEX+MARRSTAT+RACE+SOUTH)
# To get model summary
summary(fit1)


Call:
lm(formula = log(WAGE) ~ OCCUPATION + SECTOR + UNION + EDUCATION + 
    EXPERIENCE + AGE + SEX + MARRSTAT + RACE + SOUTH)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.16246 -0.29163 -0.00469  0.29981  1.98248 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.078596   0.687514   1.569 0.117291    
OCCUPATION  -0.007417   0.013109  -0.566 0.571761    
SECTOR       0.091458   0.038736   2.361 0.018589 *  
UNION        0.200483   0.052475   3.821 0.000149 ***
EDUCATION    0.179366   0.110756   1.619 0.105949    
EXPERIENCE   0.095822   0.110799   0.865 0.387531    
AGE         -0.085444   0.110730  -0.772 0.440671    
SEX         -0.221997   0.039907  -5.563 4.24e-08 ***
MARRSTAT     0.076611   0.041931   1.827 0.068259 .  
RACE         0.050406   0.028531   1.767 0.077865 .  
SOUTH       -0.102360   0.042823  -2.390 0.017187 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4398 on 523 degrees of freedom
Multiple R-squared:  0.3185,    Adjusted R-squared:  0.3054 
F-statistic: 24.44 on 10 and 523 DF,  p-value: < 2.2e-16

# Make plots of residuals
par(mfrow=c(2,2))
plot(fit1)

not plotting observations with leverage one:
  444

# Create X with explanatory variables
X<-wagesmicrodata[,2:11]
# pair-wise correlation of explanatory variables
ggpairs(X)


# Partial correlation matrix of the regressors (rows and columns follow the column order of X)
cor2pcor(cov(X))

              [,1]         [,2]         [,3]        [,4]         [,5]        [,6]         [,7]
 [1,]  1.000000000 -0.031750193  0.051510483 -0.99756187 -0.007479144  0.99726160  0.017230877
 [2,] -0.031750193  1.000000000 -0.030152499 -0.02231360 -0.097548621  0.02152507 -0.111197596
 [3,]  0.051510483 -0.030152499  1.000000000  0.05497703 -0.120087577 -0.05369785  0.020017315
 [4,] -0.997561873 -0.022313605  0.054977034  1.00000000 -0.010244447  0.99987574  0.010888486
 [5,] -0.007479144 -0.097548621 -0.120087577 -0.01024445  1.000000000  0.01223890 -0.107706183
 [6,]  0.997261601  0.021525073 -0.053697851  0.99987574  0.012238897  1.00000000 -0.010803310
 [7,]  0.017230877 -0.111197596  0.020017315  0.01088849 -0.107706183 -0.01080331  1.000000000
 [8,]  0.029436911  0.008430595 -0.142750864  0.04205856  0.212996388 -0.04414029  0.057539374
 [9,] -0.021253493 -0.021518760 -0.112146760 -0.01326166 -0.013531482  0.01456575  0.006412099
[10,] -0.040302967  0.030418218  0.004163264 -0.04097664  0.068918496  0.04509033  0.055645964
              [,8]         [,9]        [,10]
 [1,]  0.029436911 -0.021253493 -0.040302967
 [2,]  0.008430595 -0.021518760  0.030418218
 [3,] -0.142750864 -0.112146760  0.004163264
 [4,]  0.042058560 -0.013261665 -0.040976643
 [5,]  0.212996388 -0.013531482  0.068918496
 [6,] -0.044140293  0.014565751  0.045090327
 [7,]  0.057539374  0.006412099  0.055645964
 [8,]  1.000000000  0.314746868 -0.018580965
 [9,]  0.314746868  1.000000000  0.036495494
[10,] -0.018580965  0.036495494  1.000000000
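Each entry above is the correlation between two regressors after holding all the others fixed. A minimal base-R sketch of the standard identity behind such a matrix is shown below. It does not call corpcor, and the data and names are hypothetical. Invert the covariance matrix to get the precision matrix \(P\), then set \(\rho_{ij} = -P_{ij} / \sqrt{P_{ii} P_{jj}}\). The result is cross-checked against correlating regression residuals.

```r
# Partial correlations from the precision (inverse covariance) matrix
partial_cor <- function(X) {
  P  <- solve(cov(X))
  d  <- sqrt(diag(P))
  pc <- -P / outer(d, d)
  diag(pc) <- 1
  pc
}

# Hypothetical data: a and b are correlated only through z
set.seed(7)
n <- 300
z <- rnorm(n)
a <- z + rnorm(n)
b <- z + rnorm(n)
X <- cbind(a, b, z)

pc <- partial_cor(X)
# Cross-check: correlate the residuals of a and b after regressing each on z
r_resid <- cor(resid(lm(a ~ z)), resid(lm(b ~ z)))
print(c(matrix_method = pc["a", "b"], residual_method = r_resid))
```

Both methods give the same number, and it is near 0 even though a and b are clearly correlated pairwise.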

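# Individual (per-regressor) multicollinearity diagnostics: VIF, TOL and related measures
# This is the older mctest interface; newer mctest releases take a fitted model instead, e.g. imcdiag(fit1)
imcdiag(X,WAGE)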


Call:
imcdiag(x = X, y = WAGE)


All Individual Multicollinearity Diagnostics Result

                 VIF    TOL          Wi          Fi Leamer      CVIF Klein
EDUCATION   231.1956 0.0043  13402.4982  15106.5849 0.0658  236.4725     1
SOUTH         1.0468 0.9553      2.7264      3.0731 0.9774    1.0707     0
SEX           1.0916 0.9161      5.3351      6.0135 0.9571    1.1165     0
EXPERIENCE 5184.0939 0.0002 301771.2445 340140.5368 0.0139 5302.4188     1
UNION         1.1209 0.8922      7.0368      7.9315 0.9445    1.1464     0
AGE        4645.6650 0.0002 270422.7164 304806.1391 0.0147 4751.7005     1
RACE          1.0371 0.9642      2.1622      2.4372 0.9819    1.0608     0
OCCUPATION    1.2982 0.7703     17.3637     19.5715 0.8777    1.3279     0
SECTOR        1.1987 0.8343     11.5670     13.0378 0.9134    1.2260     0
MARRSTAT      1.0961 0.9123      5.5969      6.3085 0.9551    1.1211     0

1 --> COLLINEARITY is detected by the test 
0 --> COLLINEARITY is not detected by the test

EDUCATION , SOUTH , EXPERIENCE , AGE , RACE , OCCUPATION , SECTOR , MARRSTAT , coefficient(s) are non-significant may be due to multicollinearity

R-square of y on all x: 0.2805 

* use method argument to check which regressors may be the reason of collinearity
===================================
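The VIF and TOL columns follow directly from auxiliary regressions. \(\mathrm{VIF}_j = 1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing regressor \(j\) on all the other regressors, and \(\mathrm{TOL}_j = 1/\mathrm{VIF}_j\). For instance, EDUCATION's VIF of 231.2 above corresponds to a TOL of about 0.0043. The sketch below computes both by hand on hypothetical simulated data that mimic the age, education and experience pattern. The helper vif_manual() is illustrative and is not part of mctest.

```r
# VIF of each column of X, from the R^2 of its auxiliary regression
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
    1 / (1 - summary(aux)$r.squared)
  })
}

set.seed(3)
n     <- 200
age   <- runif(n, 24, 64)
educ  <- round(runif(n, 8, 16))
exper <- age - educ - 6 + rnorm(n, sd = 0.5)   # near (not exact) identity
south <- rbinom(n, 1, 0.3)                     # unrelated regressor
X <- data.frame(age, educ, exper, south)

vif <- vif_manual(X)
tol <- 1 / vif
print(round(cbind(VIF = vif, TOL = tol), 4))
```

The three related regressors get large VIFs while south stays near 1, which is the same split that the Klein column above flags.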

# Partial correlations, with t-statistics and p-values testing each one against zero
pcor(X, method = "pearson")

$estimate
              EDUCATION        SOUTH          SEX  EXPERIENCE        UNION         AGE
EDUCATION   1.000000000 -0.031750193  0.051510483 -0.99756187 -0.007479144  0.99726160
SOUTH      -0.031750193  1.000000000 -0.030152499 -0.02231360 -0.097548621  0.02152507
SEX         0.051510483 -0.030152499  1.000000000  0.05497703 -0.120087577 -0.05369785
EXPERIENCE -0.997561873 -0.022313605  0.054977034  1.00000000 -0.010244447  0.99987574
UNION      -0.007479144 -0.097548621 -0.120087577 -0.01024445  1.000000000  0.01223890
AGE         0.997261601  0.021525073 -0.053697851  0.99987574  0.012238897  1.00000000
RACE        0.017230877 -0.111197596  0.020017315  0.01088849 -0.107706183 -0.01080331
OCCUPATION  0.029436911  0.008430595 -0.142750864  0.04205856  0.212996388 -0.04414029
SECTOR     -0.021253493 -0.021518760 -0.112146760 -0.01326166 -0.013531482  0.01456575
MARRSTAT   -0.040302967  0.030418218  0.004163264 -0.04097664  0.068918496  0.04509033
                   RACE   OCCUPATION       SECTOR     MARRSTAT
EDUCATION   0.017230877  0.029436911 -0.021253493 -0.040302967
SOUTH      -0.111197596  0.008430595 -0.021518760  0.030418218
SEX         0.020017315 -0.142750864 -0.112146760  0.004163264
EXPERIENCE  0.010888486  0.042058560 -0.013261665 -0.040976643
UNION      -0.107706183  0.212996388 -0.013531482  0.068918496
AGE        -0.010803310 -0.044140293  0.014565751  0.045090327
RACE        1.000000000  0.057539374  0.006412099  0.055645964
OCCUPATION  0.057539374  1.000000000  0.314746868 -0.018580965
SECTOR      0.006412099  0.314746868  1.000000000  0.036495494
MARRSTAT    0.055645964 -0.018580965  0.036495494  1.000000000

$p.value
           EDUCATION      SOUTH         SEX EXPERIENCE        UNION       AGE       RACE
EDUCATION  0.0000000 0.46745162 0.238259049  0.0000000 8.641246e-01 0.0000000 0.69337880
SOUTH      0.4674516 0.00000000 0.490162786  0.6096300 2.526916e-02 0.6223281 0.01070652
SEX        0.2382590 0.49016279 0.000000000  0.2080904 5.822656e-03 0.2188841 0.64692038
EXPERIENCE 0.0000000 0.60962999 0.208090393  0.0000000 8.146741e-01 0.0000000 0.80325456
UNION      0.8641246 0.02526916 0.005822656  0.8146741 0.000000e+00 0.7794483 0.01345383
AGE        0.0000000 0.62232811 0.218884070  0.0000000 7.794483e-01 0.0000000 0.80476248
RACE       0.6933788 0.01070652 0.646920379  0.8032546 1.345383e-02 0.8047625 0.00000000
OCCUPATION 0.5005235 0.84704000 0.001027137  0.3356824 8.220095e-07 0.3122902 0.18763758
SECTOR     0.6267278 0.62243025 0.010051378  0.7615531 7.568528e-01 0.7389200 0.88336002
MARRSTAT   0.3562616 0.48634504 0.924111163  0.3482728 1.143954e-01 0.3019796 0.20260170
             OCCUPATION       SECTOR  MARRSTAT
EDUCATION  5.005235e-01 6.267278e-01 0.3562616
SOUTH      8.470400e-01 6.224302e-01 0.4863450
SEX        1.027137e-03 1.005138e-02 0.9241112
EXPERIENCE 3.356824e-01 7.615531e-01 0.3482728
UNION      8.220095e-07 7.568528e-01 0.1143954
AGE        3.122902e-01 7.389200e-01 0.3019796
RACE       1.876376e-01 8.833600e-01 0.2026017
OCCUPATION 0.000000e+00 1.467261e-13 0.6707116
SECTOR     1.467261e-13 0.000000e+00 0.4035489
MARRSTAT   6.707116e-01 4.035489e-01 0.0000000

$statistic
              EDUCATION      SOUTH         SEX   EXPERIENCE      UNION          AGE       RACE
EDUCATION     0.0000000 -0.7271618  1.18069629 -327.2105031 -0.1712102  308.6803174  0.3944914
SOUTH        -0.7271618  0.0000000 -0.69053623   -0.5109090 -2.2436907    0.4928456 -2.5613138
SEX           1.1806963 -0.6905362  0.00000000    1.2603880 -2.7689685   -1.2309760  0.4583091
EXPERIENCE -327.2105031 -0.5109090  1.26038801    0.0000000 -0.2345184 1451.9092015  0.2492636
UNION        -0.1712102 -2.2436907 -2.76896848   -0.2345184  0.0000000    0.2801822 -2.4799336
AGE         308.6803174  0.4928456 -1.23097601 1451.9092015  0.2801822    0.0000000 -0.2473135
RACE          0.3944914 -2.5613138  0.45830912    0.2492636 -2.4799336   -0.2473135  0.0000000
OCCUPATION    0.6741338  0.1929920 -3.30152873    0.9636171  4.9902208   -1.0114033  1.3193223
SECTOR       -0.4866246 -0.4927010 -2.58345399   -0.3036001 -0.3097781    0.3334607  0.1467827
MARRSTAT     -0.9233273  0.6966272  0.09530228   -0.9387867  1.5813765    1.0332156  1.2757711
           OCCUPATION     SECTOR    MARRSTAT
EDUCATION   0.6741338 -0.4866246 -0.92332727
SOUTH       0.1929920 -0.4927010  0.69662719
SEX        -3.3015287 -2.5834540  0.09530228
EXPERIENCE  0.9636171 -0.3036001 -0.93878671
UNION       4.9902208 -0.3097781  1.58137652
AGE        -1.0114033  0.3334607  1.03321563
RACE        1.3193223  0.1467827  1.27577106
OCCUPATION  0.0000000  7.5906763 -0.42541117
SECTOR      7.5906763  0.0000000  0.83597695
MARRSTAT   -0.4254112  0.8359769  0.00000000

$n
[1] 534

$gp
[1] 8

$method
[1] "pearson"

# Drop EXPERIENCE (VIF above 5000, nearly a linear function of AGE and EDUCATION) and refit
fit2<- lm(log(WAGE)~OCCUPATION+SECTOR+UNION+EDUCATION+AGE+SEX+MARRSTAT+RACE+SOUTH)
summary(fit2)


Call:
lm(formula = log(WAGE) ~ OCCUPATION + SECTOR + UNION + EDUCATION + 
    AGE + SEX + MARRSTAT + RACE + SOUTH)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.16018 -0.29085 -0.00513  0.29985  1.97932 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.501358   0.164794   3.042 0.002465 ** 
OCCUPATION  -0.006941   0.013095  -0.530 0.596309    
SECTOR       0.091013   0.038723   2.350 0.019125 *  
UNION        0.200018   0.052459   3.813 0.000154 ***
EDUCATION    0.083815   0.007728  10.846  < 2e-16 ***
AGE          0.010305   0.001745   5.905 6.34e-09 ***
SEX         -0.220100   0.039837  -5.525 5.20e-08 ***
MARRSTAT     0.075125   0.041886   1.794 0.073458 .  
RACE         0.050674   0.028523   1.777 0.076210 .  
SOUTH       -0.103186   0.042802  -2.411 0.016261 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4397 on 524 degrees of freedom
Multiple R-squared:  0.3175,    Adjusted R-squared:  0.3058 
F-statistic: 27.09 on 9 and 524 DF,  p-value: < 2.2e-16
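Dropping EXPERIENCE barely changes the fit (adjusted \(R^2\) 0.3054 vs 0.3058). However, the EDUCATION standard error falls from 0.1108 to 0.0077. The sketch below reproduces that pattern on hypothetical simulated data, where exper is assumed to be a near-identity in age and educ. It compares the standard error of the educ coefficient with and without the redundant regressor.

```r
set.seed(2024)
n     <- 534
age   <- runif(n, 24, 64)
educ  <- round(runif(n, 8, 16))
exper <- age - educ - 6 + rnorm(n, sd = 0.2)   # hypothetical near-identity
logwage <- 0.5 + 0.08 * educ + 0.01 * exper + rnorm(n, sd = 0.44)

full    <- lm(logwage ~ educ + age + exper)
reduced <- lm(logwage ~ educ + age)            # redundant regressor dropped

se_full    <- coef(summary(full))["educ", "Std. Error"]
se_reduced <- coef(summary(reduced))["educ", "Std. Error"]
print(c(se_full = se_full, se_reduced = se_reduced))
print(coef(reduced))
```

The price is a change in interpretation. In the reduced model the educ coefficient is close to 0.08 - 0.01 = 0.07, because with age held fixed an extra year of schooling also means one year less of experience.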

Please bring a short paragraph, open in a browser tab, explaining which method for identifying multicollinearity you found most useful and why.

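I liked the blog's workflow. It starts with a visual inspection of the pairwise correlations, then goes into depth with the full Farrar-Glauber (FG) procedure of three tests: a chi-square test for whether multicollinearity is present, F-tests to locate the regressors involved, and t-tests on the partial correlations to find the pattern. I found the chi-square test particularly useful. I would go through the three tests, starting with the visual inspection.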
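The FG chi-square test asks whether the correlation matrix \(R\) of the \(k\) regressors differs from the identity. Its statistic is \(\chi^2 = -\left[n - 1 - \tfrac{2k + 5}{6}\right]\ln\lvert R\rvert\) with \(k(k-1)/2\) degrees of freedom. A minimal sketch is shown below, on hypothetical simulated regressors. The helper fg_chisq() is illustrative and is not mctest's implementation.

```r
# Farrar-Glauber chi-square test for the presence of multicollinearity
fg_chisq <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  k <- ncol(X)
  R <- cor(X)
  stat <- -(n - 1 - (2 * k + 5) / 6) * log(det(R))
  df   <- k * (k - 1) / 2
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

set.seed(5)
n     <- 500
age   <- runif(n, 24, 64)
educ  <- round(runif(n, 8, 16))
exper <- age - educ - 6 + rnorm(n, sd = 0.5)   # hypothetical near-identity
south <- rbinom(n, 1, 0.3)

fg_collinear   <- fg_chisq(cbind(age, educ, exper, south))
fg_independent <- fg_chisq(cbind(age, educ, south))   # exper left out
print(rbind(fg_collinear, fg_independent))
```

A determinant of \(R\) near 0 makes \(\ln\lvert R\rvert\) very negative. As a result, the collinear set produces a huge statistic, while the set without exper does not.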

Simulate 1,000 data points in R from each of the following four models and save the 1,000 data points \((Y_i, X_i, W_i)\) for each model. Then use the simulated data to estimate \((b_0, b_1, b_2)\) for each model.

\(Y_i = b_0 + b_1 X_i + b_2 W_i + e_i\)

Y<- runif(1000)

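Where \((b_0, b_1, b_2) = (1, 2, 3)\). Note that the code above draws Y as pure U(0,1) noise, independently of \(X_i\), \(W_i\) and \(e_i\). The error draw norm_e below is also never added to Y. The simulated Y therefore does not follow this model, which is why every regression below returns slopes that are statistically indistinguishable from 0 instead of recovering \((1, 2, 3)\).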

Where \(X_i \sim N(1, 2)\); the code draws it with rnorm(1000, 1, 2), whose third argument is the standard deviation.

norm_x <- rnorm(1000,1,2)
norm_w <- rnorm(1000,3,2)
model1 <- lm(Y~norm_x+norm_w)
summary(model1)


Call:
lm(formula = Y ~ norm_x + norm_w)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.50429 -0.25404  0.00223  0.25235  0.51092 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.5137616  0.0172251  29.826   <2e-16 ***
norm_x      -0.0008997  0.0046467  -0.194    0.847    
norm_w      -0.0055364  0.0045966  -1.204    0.229    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2954 on 997 degrees of freedom
Multiple R-squared:  0.001486,  Adjusted R-squared:  -0.0005166 
F-statistic: 0.7421 on 2 and 997 DF,  p-value: 0.4764

Where \(e_i \sim N(0, 1)\)

norm_e <-rnorm(1000,0,1)

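Where \(W_i\) comes from one of the following:

1. Let \(W_i = 2 X_i\). What happens when you try to estimate the model? (\(W_i\) and \(X_i\) are perfectly collinear, so \(X'X\) is singular and cannot be inverted. R does not stop with an error, though: lm() drops norm_w and reports its coefficient as NA, "1 not defined because of singularities".)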

norm_w<- 2*norm_x
model2 <- lm(Y~norm_x+norm_w)
summary(model2)


Call:
lm(formula = Y ~ norm_x + norm_w)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.49677 -0.25782  0.00199  0.25514  0.50677 

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.4974106  0.0106048  46.904   <2e-16 ***
norm_x      -0.0008503  0.0046476  -0.183    0.855    
norm_w              NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2955 on 998 degrees of freedom
Multiple R-squared:  3.354e-05, Adjusted R-squared:  -0.0009684 
F-statistic: 0.03347 on 1 and 998 DF,  p-value: 0.8549
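The sketch below repeats case 1 with Y actually generated from the model. It assumes, as the original code does, that the second parameter of each normal distribution is its standard deviation. The seed and variable names are arbitrary. Substituting \(W_i = 2X_i\) gives \(Y_i = b_0 + (b_1 + 2b_2)X_i + e_i\), so only the combination \(b_1 + 2b_2 = 8\) is identified, not \(b_1\) and \(b_2\) separately.

```r
set.seed(154)
n <- 1000
x <- rnorm(n, mean = 1, sd = 2)
w <- 2 * x                              # exact linear dependence
e <- rnorm(n, mean = 0, sd = 1)
y <- 1 + 2 * x + 3 * w + e              # true (b0, b1, b2) = (1, 2, 3)
sim1 <- data.frame(y, x, w)             # save the simulated (Y, X, W)

fit <- lm(y ~ x + w, data = sim1)
# w is dropped (NA); the x coefficient estimates b1 + 2 * b2 = 8
print(coef(fit))
```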

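2. Let \(W_i = \log(10 + X_i)\). What happens when you try to estimate the model? (The model can be estimated, because \(W_i\) is a nonlinear function of \(X_i\). Over the range of the data, however, it is almost linear in \(X_i\), so the two regressors are highly correlated and the estimates of \(b_1\) and \(b_2\) are imprecise. In the output below their standard errors are roughly 8 and 80 times those in model1, so the estimates can land far from the true \((b_1, b_2) = (2, 3)\).)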

norm_w = log(10+norm_x)
model3 <- lm(Y~norm_x+norm_w)
summary(model3)


Call:
lm(formula = Y ~ norm_x + norm_w)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.50536 -0.25742  0.00025  0.25593  0.50581 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.06498    0.88008   1.210    0.227
norm_x       0.02213    0.03593   0.616    0.538
norm_w      -0.24807    0.38463  -0.645    0.519

Residual standard error: 0.2955 on 997 degrees of freedom
Multiple R-squared:  0.0004506, Adjusted R-squared:  -0.001555 
F-statistic: 0.2247 on 2 and 997 DF,  p-value: 0.7988
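The sketch below repeats case 2 with Y generated from the model, under the same assumptions as before (standard-deviation parameterization, arbitrary seed). The estimates are unbiased but noisy: \(\log(10 + X_i)\) is nearly linear in \(X_i\) here, so the correlation between the regressors is close to 1 and the standard error on w is large.

```r
set.seed(154)
n <- 1000
x <- rnorm(n, mean = 1, sd = 2)
w <- log(10 + x)                        # nonlinear, but nearly linear in x here
e <- rnorm(n, mean = 0, sd = 1)
y <- 1 + 2 * x + 3 * w + e              # true (b0, b1, b2) = (1, 2, 3)
sim2 <- data.frame(y, x, w)             # save the simulated (Y, X, W)

fit <- lm(y ~ x + w, data = sim2)
print(coef(summary(fit)))
print(cor(x, w))                        # close to 1
```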

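3. Let \(W_i = X_i + u_i\), where \(u_i \sim N(1, 0.1)\). (The model can be estimated, but \(W_i\) differs from \(X_i\) only by low-variance noise. The regressors are therefore highly correlated and the estimates of \(b_1\) and \(b_2\) are imprecise: their standard errors below are about 20 times those in model1, so the estimates can be far from the true \((b_1, b_2) = (2, 3)\).)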

u_i <- rnorm(1000,1,0.1)
norm_w = norm_x+ u_i
model4 <- lm(Y~norm_x+norm_w)
summary(model4)


Call:
lm(formula = Y ~ norm_x + norm_w)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.49725 -0.25819  0.00227  0.25665  0.50467 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.46289    0.09604   4.820 1.66e-06 ***
norm_x      -0.03532    0.09542  -0.370    0.711    
norm_w       0.03450    0.09538   0.362    0.718    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2956 on 997 degrees of freedom
Multiple R-squared:  0.0001647, Adjusted R-squared:  -0.001841 
F-statistic: 0.08213 on 2 and 997 DF,  p-value: 0.9212
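The sketch below repeats case 3 with Y generated from the model, under the same assumptions (standard-deviation parameterization, arbitrary seed). Because w differs from x only by \(u_i\), whose standard deviation is 0.1, the standard errors on both slopes are far larger than in the independent case.

```r
set.seed(154)
n <- 1000
x <- rnorm(n, mean = 1, sd = 2)
u <- rnorm(n, mean = 1, sd = 0.1)
w <- x + u                              # w is x plus low-variance noise
e <- rnorm(n, mean = 0, sd = 1)
y <- 1 + 2 * x + 3 * w + e              # true (b0, b1, b2) = (1, 2, 3)
sim3 <- data.frame(y, x, w)             # save the simulated (Y, X, W)

fit <- lm(y ~ x + w, data = sim3)
print(coef(summary(fit)))
```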

4. Let \(W_i \sim N(-1, 2)\). What happens when you try to estimate the model? (The estimates are precise because \(X_i\) and \(W_i\) are drawn independently, so there is no multicollinearity.)

norm_w <- rnorm(1000,-1,2)
model5 <- lm(Y~norm_x+norm_w)
summary(model5)


Call:
lm(formula = Y ~ norm_x + norm_w)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.49651 -0.25762  0.00172  0.25580  0.51046 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.4959764  0.0114634  43.266   <2e-16 ***
norm_x      -0.0008748  0.0046503  -0.188    0.851    
norm_w      -0.0015144  0.0045839  -0.330    0.741    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2956 on 997 degrees of freedom
Multiple R-squared:  0.000143,  Adjusted R-squared:  -0.001863 
F-statistic: 0.07129 on 2 and 997 DF,  p-value: 0.9312
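The sketch below repeats case 4 with Y generated from the model, under the same assumptions (standard-deviation parameterization, arbitrary seed). With independent regressors, OLS recovers \((b_0, b_1, b_2) = (1, 2, 3)\) closely.

```r
set.seed(154)
n <- 1000
x <- rnorm(n, mean = 1, sd = 2)
w <- rnorm(n, mean = -1, sd = 2)        # drawn independently of x
e <- rnorm(n, mean = 0, sd = 1)
y <- 1 + 2 * x + 3 * w + e              # true (b0, b1, b2) = (1, 2, 3)
sim4 <- data.frame(y, x, w)             # save the simulated (Y, X, W)

fit <- lm(y ~ x + w, data = sim4)
print(coef(summary(fit)))
```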
