Introduction

The text is motivated by Marcin Rutecki and Manimala, both of the Kaggle project.

Multicollinearity - definition

Multicollinearity means the situation, where the explanatory variables are highly pairwise correlated.

Consequences of Multicollinearity (according to Rutecki)

Multicollinearity could result in significant problems during model fitting. It can reduce the overall performance of regression and classification models:

The result is that the coefficient estimates are unstable and difficult to interpret. Multicollinearity saps the statistical power of the analysis, can cause the coefficients to switch signs, and makes it more difficult to specify the correct model.

Remedies to Multicollinearity problem (according to Rutecki)

  1. Increase the sample size. Enlarging the sample will introduce more variation in the data series, which reduces the effect of sampling error and helps increase precision when estimating various properties of the data. Increased sample sizes can reduce either the presence or the impact of multicollinearity, or both.

  2. Make some transformation of the data (for example, replace level data by their differences)

  3. Remove some of the highly correlated features.

  1. Use regularization methods such as RIDGE and LASSO or Bayesian regression. (I’ll skip it in this study)

Boston House Prices

Each record in the database describes a Boston suburb or town. The data was drawn from the Boston Standard Metropolitan Statistical Area (SMSA) in 1970. The attributes are defined as follows (taken from the UCI Machine Learning Repository1): CRIM: per capita crime rate by town

We can see that the input attributes have a mixture of units.

Computations

rm(list=ls())
library(lmtest)
library(car)
library(glmnet)
column_names <- c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV")
df <- read.table("adot/housing.data", header = FALSE, sep = "", col.names = column_names)
### Plotting the correlation between various columns
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
    par(usr = c(0, 1, 0, 1))
    r <- abs(cor(x, y))
    txt <- format(c(r, 0.123456789), digits = digits)[1]
    txt <- paste0(prefix, txt)
    if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
    text(0.5, 0.5, txt, cex = cex.cor * r)
}
pairs(df, lower.panel = panel.smooth, upper.panel = panel.cor,
      gap=0, row1attop=FALSE)

Overparametrized model

model_all <- lm(MEDV ~ ., data=df)  # with all the independent variables in the dataframe

summary(model_all)

Call:
lm(formula = MEDV ~ ., data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.595  -2.730  -0.518   1.777  26.199 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
CRIM        -1.080e-01  3.286e-02  -3.287 0.001087 ** 
ZN           4.642e-02  1.373e-02   3.382 0.000778 ***
INDUS        2.056e-02  6.150e-02   0.334 0.738288    
CHAS         2.687e+00  8.616e-01   3.118 0.001925 ** 
NOX         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
RM           3.810e+00  4.179e-01   9.116  < 2e-16 ***
AGE          6.922e-04  1.321e-02   0.052 0.958229    
DIS         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
RAD          3.060e-01  6.635e-02   4.613 5.07e-06 ***
TAX         -1.233e-02  3.760e-03  -3.280 0.001112 ** 
PTRATIO     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
B            9.312e-03  2.686e-03   3.467 0.000573 ***
LSTAT       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.745 on 492 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7338 
F-statistic: 108.1 on 13 and 492 DF,  p-value: < 2.2e-16
print(paste("AIC = ", AIC(model_all)))
[1] "AIC =  3027.60859407555"
resettest(model_all)

    RESET test

data:  model_all
RESET = 97.853, df1 = 2, df2 = 490, p-value < 2.2e-16
model <<- lm(MEDV ~ +1 +CRIM  +  ZN + INDUS + CHAS + NOX + RM + AGE + DIS  + RAD + TAX + PTRATIO + B  +   LSTAT    )
summary(model)

Call:
lm(formula = MEDV ~ +1 + CRIM + ZN + INDUS + CHAS + NOX + RM + 
    AGE + DIS + RAD + TAX + PTRATIO + B + LSTAT)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.595  -2.730  -0.518   1.777  26.199 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
CRIM        -1.080e-01  3.286e-02  -3.287 0.001087 ** 
ZN           4.642e-02  1.373e-02   3.382 0.000778 ***
INDUS        2.056e-02  6.150e-02   0.334 0.738288    
CHAS         2.687e+00  8.616e-01   3.118 0.001925 ** 
NOX         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
RM           3.810e+00  4.179e-01   9.116  < 2e-16 ***
AGE          6.922e-04  1.321e-02   0.052 0.958229    
DIS         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
RAD          3.060e-01  6.635e-02   4.613 5.07e-06 ***
TAX         -1.233e-02  3.760e-03  -3.280 0.001112 ** 
PTRATIO     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
B            9.312e-03  2.686e-03   3.467 0.000573 ***
LSTAT       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.745 on 492 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7338 
F-statistic: 108.1 on 13 and 492 DF,  p-value: < 2.2e-16
print(paste("AIC = ", AIC(model_all)))
[1] "AIC =  3027.60859407555"
print(paste("Ramsey Reset Test = ",resettest(model)))
[1] "Ramsey Reset Test =  c(RESET = 97.8531762010485)"
[2] "Ramsey Reset Test =  c(df1 = 2, df2 = 490)"      
[3] "Ramsey Reset Test =  RESET test"                 
[4] "Ramsey Reset Test =  1.7546369513621e-36"        
[5] "Ramsey Reset Test =  model"                      
model <<- lm(MEDV ~ +1 +CRIM  +  ZN + INDUS + CHAS + NOX + RM  + DIS  + RAD + TAX + PTRATIO + B  +   LSTAT    )
summary(model)

Call:
lm(formula = MEDV ~ +1 + CRIM + ZN + INDUS + CHAS + NOX + RM + 
    DIS + RAD + TAX + PTRATIO + B + LSTAT)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.6054  -2.7313  -0.5188   1.7601  26.2243 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.436927   5.080119   7.172 2.72e-12 ***
CRIM         -0.108006   0.032832  -3.290 0.001075 ** 
ZN            0.046334   0.013613   3.404 0.000719 ***
INDUS         0.020562   0.061433   0.335 0.737989    
CHAS          2.689026   0.859598   3.128 0.001863 ** 
NOX         -17.713540   3.679308  -4.814 1.97e-06 ***
RM            3.814394   0.408480   9.338  < 2e-16 ***
DIS          -1.478612   0.190611  -7.757 5.03e-14 ***
RAD           0.305786   0.066089   4.627 4.75e-06 ***
TAX          -0.012329   0.003755  -3.283 0.001099 ** 
PTRATIO      -0.952211   0.130294  -7.308 1.10e-12 ***
B             0.009321   0.002678   3.481 0.000544 ***
LSTAT        -0.523852   0.047625 -10.999  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.74 on 493 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7343 
F-statistic: 117.3 on 12 and 493 DF,  p-value: < 2.2e-16
print(paste("AIC = ", AIC(model)))
[1] "AIC =  3025.61141822068"
print(paste("Ramsey Reset Test = ",resettest(model)))
[1] "Ramsey Reset Test =  c(RESET = 97.7759346558843)"
[2] "Ramsey Reset Test =  c(df1 = 2, df2 = 491)"      
[3] "Ramsey Reset Test =  RESET test"                 
[4] "Ramsey Reset Test =  1.80799788716649e-36"       
[5] "Ramsey Reset Test =  model"                      
model <<- lm(MEDV ~ +1 +CRIM  +  ZN + CHAS + NOX + RM + DIS  + RAD + TAX + PTRATIO + B  +   LSTAT    )
summary(model)

Call:
lm(formula = MEDV ~ +1 + CRIM + ZN + CHAS + NOX + RM + DIS + 
    RAD + TAX + PTRATIO + B + LSTAT)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5984  -2.7386  -0.5046   1.7273  26.2373 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.341145   5.067492   7.171 2.73e-12 ***
CRIM         -0.108413   0.032779  -3.307 0.001010 ** 
ZN            0.045845   0.013523   3.390 0.000754 ***
CHAS          2.718716   0.854240   3.183 0.001551 ** 
NOX         -17.376023   3.535243  -4.915 1.21e-06 ***
RM            3.801579   0.406316   9.356  < 2e-16 ***
DIS          -1.492711   0.185731  -8.037 6.84e-15 ***
RAD           0.299608   0.063402   4.726 3.00e-06 ***
TAX          -0.011778   0.003372  -3.493 0.000521 ***
PTRATIO      -0.946525   0.129066  -7.334 9.24e-13 ***
B             0.009291   0.002674   3.475 0.000557 ***
LSTAT        -0.522553   0.047424 -11.019  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.736 on 494 degrees of freedom
Multiple R-squared:  0.7406,    Adjusted R-squared:  0.7348 
F-statistic: 128.2 on 11 and 494 DF,  p-value: < 2.2e-16
print(paste("AIC = ", AIC(model)))
[1] "AIC =  3023.72638782506"
print(paste("Ramsey Reset Test = ",resettest(model)))
[1] "Ramsey Reset Test =  c(RESET = 95.8321519471369)"
[2] "Ramsey Reset Test =  c(df1 = 2, df2 = 492)"      
[3] "Ramsey Reset Test =  RESET test"                 
[4] "Ramsey Reset Test =  7.11312960358601e-36"       
[5] "Ramsey Reset Test =  model"                      
model <<- lm(I(log(MEDV)) ~ +1 +CRIM  +  ZN + CHAS + NOX + RM + DIS  + RAD + TAX + PTRATIO + B  +   LSTAT    )
summary(model)

Call:
lm(formula = I(log(MEDV)) ~ +1 + CRIM + ZN + CHAS + NOX + RM + 
    DIS + RAD + TAX + PTRATIO + B + LSTAT)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.73400 -0.09460 -0.01771  0.09782  0.86290 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.0836823  0.2030491  20.112  < 2e-16 ***
CRIM        -0.0103187  0.0013134  -7.856 2.49e-14 ***
ZN           0.0010874  0.0005418   2.007 0.045308 *  
CHAS         0.1051484  0.0342285   3.072 0.002244 ** 
NOX         -0.7217440  0.1416535  -5.095 4.97e-07 ***
RM           0.0906728  0.0162807   5.569 4.20e-08 ***
DIS         -0.0517059  0.0074420  -6.948 1.18e-11 ***
RAD          0.0134457  0.0025405   5.293 1.82e-07 ***
TAX         -0.0005579  0.0001351  -4.129 4.28e-05 ***
PTRATIO     -0.0374259  0.0051715  -7.237 1.77e-12 ***
B            0.0004127  0.0001071   3.852 0.000133 ***
LSTAT       -0.0286039  0.0019002 -15.053  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1898 on 494 degrees of freedom
Multiple R-squared:  0.7891,    Adjusted R-squared:  0.7844 
F-statistic: 168.1 on 11 and 494 DF,  p-value: < 2.2e-16
print(paste("AIC = ", AIC(model)))
[1] "AIC =  -232.032859586876"
print(paste("Ramsey Reset Test = ",resettest(model)))
[1] "Ramsey Reset Test =  c(RESET = 14.4764626320061)"
[2] "Ramsey Reset Test =  c(df1 = 2, df2 = 492)"      
[3] "Ramsey Reset Test =  RESET test"                 
[4] "Ramsey Reset Test =  7.78016750024865e-07"       
[5] "Ramsey Reset Test =  model"                      
model <<- lm(log(MEDV) ~ +1 +CRIM  +  ZN + CHAS + I(NOX^2) + NOX + RM + DIS  + RAD + TAX + PTRATIO + B  +   LSTAT    )
summary(model)

Call:
lm(formula = log(MEDV) ~ +1 + CRIM + ZN + CHAS + I(NOX^2) + NOX + 
    RM + DIS + RAD + TAX + PTRATIO + B + LSTAT)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.73285 -0.09675 -0.01589  0.09603  0.86221 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.9009807  0.3531997  11.045  < 2e-16 ***
CRIM        -0.0103083  0.0013143  -7.843 2.75e-14 ***
ZN           0.0011882  0.0005651   2.103 0.036009 *  
CHAS         0.1073962  0.0344333   3.119 0.001921 ** 
I(NOX^2)    -0.4926811  0.7790968  -0.632 0.527435    
NOX         -0.0771932  1.0290625  -0.075 0.940235    
RM           0.0893266  0.0164291   5.437 8.53e-08 ***
DIS         -0.0492330  0.0084110  -5.853 8.79e-09 ***
RAD          0.0133104  0.0025510   5.218 2.67e-07 ***
TAX         -0.0005623  0.0001354  -4.153 3.86e-05 ***
PTRATIO     -0.0381504  0.0053000  -7.198 2.29e-12 ***
B            0.0004087  0.0001074   3.805 0.000159 ***
LSTAT       -0.0286811  0.0019053 -15.053  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1899 on 493 degrees of freedom
Multiple R-squared:  0.7893,    Adjusted R-squared:  0.7842 
F-statistic: 153.9 on 12 and 493 DF,  p-value: < 2.2e-16
print(paste("AIC = ", AIC(model)))
[1] "AIC =  -230.443135964678"
print(paste("Ramsey Reset Test = ",resettest(model)))
[1] "Ramsey Reset Test =  c(RESET = 14.6768475258417)"
[2] "Ramsey Reset Test =  c(df1 = 2, df2 = 491)"      
[3] "Ramsey Reset Test =  RESET test"                 
[4] "Ramsey Reset Test =  6.44450398199586e-07"       
[5] "Ramsey Reset Test =  model"                      

Multicollinearity testing

VIF

\[VIF_i = \frac{1}{1-R^2_i}\] niekedy sa hovorí, že tento ukazovateľ pre danú premennú nesmie byť väčší, ako 5, niekedy sa hovorí o hranici 10.

vif(model_all)
    CRIM       ZN    INDUS     CHAS      NOX       RM      AGE      DIS      RAD 
1.792192 2.298758 3.991596 1.073995 4.393720 1.933744 3.100826 3.955945 7.484496 
     TAX  PTRATIO        B    LSTAT 
9.008554 1.799084 1.348521 2.941491 

Vizualizácia VIF

vif_values <- vif(model_all)           #create vector of VIF values

barplot(vif_values, main = "VIF Values", horiz = TRUE, col = "steelblue") #create horizontal bar chart to display each VIF value

abline(v = 5, lwd = 3, lty = 2)    #add vertical line at 5 as after 5 there is severe correlation

NA
NA
df_9_10 <- df[,-c(9,10)]
model_9_10 <- lm(MEDV ~., data=df_9_10)
summary(model_9_10)

Call:
lm(formula = MEDV ~ ., data = df_9_10)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.7734  -2.7928  -0.6504   1.5477  27.9833 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  29.210710   4.893167   5.970 4.54e-09 ***
CRIM         -0.061435   0.030434  -2.019 0.044066 *  
ZN            0.041516   0.013551   3.064 0.002306 ** 
INDUS        -0.045831   0.055961  -0.819 0.413190    
CHAS          3.085016   0.871868   3.538 0.000441 ***
NOX         -14.594752   3.670109  -3.977 8.03e-05 ***
RM            4.134272   0.419341   9.859  < 2e-16 ***
AGE          -0.004295   0.013415  -0.320 0.748983    
DIS          -1.488851   0.203337  -7.322 9.98e-13 ***
PTRATIO      -0.811272   0.121647  -6.669 6.91e-11 ***
B             0.008205   0.002706   3.032 0.002558 ** 
LSTAT        -0.515812   0.051668  -9.983  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.839 on 494 degrees of freedom
Multiple R-squared:  0.7293,    Adjusted R-squared:  0.7232 
F-statistic:   121 on 11 and 494 DF,  p-value: < 2.2e-16
vif(model_9_10)
    CRIM       ZN    INDUS     CHAS      NOX       RM      AGE      DIS  PTRATIO 
1.478206 2.154483 3.179166 1.057805 3.901348 1.872532 3.075755 3.954443 1.496077 
       B    LSTAT 
1.316559 2.936487 

Is heteroskedasticity present in the new model?

df_9_10 <- df[,-c(9,10)]
model_9_10 <- lm(MEDV ~., data=df_9_10)
bptest(model_9_10)

    studentized Breusch-Pagan test

data:  model_9_10
BP = 50.806, df = 11, p-value = 4.481e-07

Yes, it is…

LS0tCnRpdGxlOiAiTXVsdGljb2xsaW5lYXJpdHkgYW5kIFNwZWNpZmljYXRpb24gRXJyb3IiCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KIyMgSW50cm9kdWN0aW9uCgpUaGUgdGV4dCBpcyBtb3RpdmF0ZWQgYnkgW01hcmNpbiBSdXRlY2tpXShodHRwczovL3d3dy5rYWdnbGUuY29tL2NvZGUvbWFyY2lucnV0ZWNraS9tdWx0aWNvbGxpbmVhcml0eS1kZXRlY3Rpb24tYW5kLXJlbWVkaWVzKSBhbmQgW01hbmltYWxhXShodHRwczovL3d3dy5rYWdnbGUuY29tL2RhdGFzZXRzL3Zpa3Jpc2huYW4vYm9zdG9uLWhvdXNlLXByaWNlcyksIGJvdGggb2YgdGhlIFtLYWdnbGUgcHJvamVjdF0oaHR0cHM6Ly93d3cua2FnZ2xlLmNvbS8pLgoKIyMgTXVsdGljb2xsaW5lYXJpdHkgLSBkZWZpbml0aW9uCk11bHRpY29sbGluZWFyaXR5IG1lYW5zIHRoZSBzaXR1YXRpb24sIHdoZXJlIHRoZSBleHBsYW5hdG9yeSB2YXJpYWJsZXMgYXJlIGhpZ2hseSBwYWlyd2lzZSBjb3JyZWxhdGVkLgoKIyMgIENvbnNlcXVlbmNlcyBvZiBNdWx0aWNvbGxpbmVhcml0eSAoYWNjb3JkaW5nIHRvIFJ1dGVja2kpCgpNdWx0aWNvbGxpbmVhcml0eSBjb3VsZCByZXN1bHQgaW4gc2lnbmlmaWNhbnQgcHJvYmxlbXMgZHVyaW5nIG1vZGVsIGZpdHRpbmcuIEl0IGNhbiByZWR1Y2UgdGhlIG92ZXJhbGwgcGVyZm9ybWFuY2Ugb2YgcmVncmVzc2lvbiBhbmQgY2xhc3NpZmljYXRpb24gbW9kZWxzOgoKLSBkb2VzIG5vdCBpbmNyZWFzZSBiaWFzLCBidXQgaXQgY2FuIGluY3JlYXNlIHZhcmlhbmNlIChvdmVyZml0dGluZyk7Ci0gbWFrZSB0aGUgZXN0aW1hdGVzIHZlcnkgc2Vuc2l0aXZlIHRvIG1pbm9yIGNoYW5nZXMgaW4gdGhlIG1vZGVsOwotIGRvZXNuJ3QgYWZmZWN0IHRoZSBwcmVkaWN0aXZlIHBvd2VyLCBidXQgaW5kaXZpZHVhbCBwcmVkaWN0b3IgdmFyaWFibGUncyBpbXBhY3Qgb24gdGhlIHJlc3BvbnNlIHZhcmlhYmxlIGNvdWxkIGJlIGNhbGN1bGF0ZWQgd3JvbmdseS4KClRoZSByZXN1bHQgaXMgdGhhdCB0aGUgY29lZmZpY2llbnQgZXN0aW1hdGVzIGFyZSB1bnN0YWJsZSBhbmQgZGlmZmljdWx0IHRvIGludGVycHJldC4gTXVsdGljb2xsaW5lYXJpdHkgc2FwcyB0aGUgc3RhdGlzdGljYWwgcG93ZXIgb2YgdGhlIGFuYWx5c2lzLCBjYW4gY2F1c2UgdGhlIGNvZWZmaWNpZW50cyB0byBzd2l0Y2ggc2lnbnMsIGFuZCBtYWtlcyBpdCBtb3JlIGRpZmZpY3VsdCB0byBzcGVjaWZ5IHRoZSBjb3JyZWN0IG1vZGVsLgoKIyMgUmVtZWRpZXMgdG8gTXVsdGljb2xsaW5lYXJpdHkgcHJvYmxlbSAoYWNjb3JkaW5nIHRvIFJ1dGVja2kpCjEuIEluY3JlYXNlIHRoZSBzYW1wbGUgc2l6ZS4gRW5sYXJnaW5nIHRoZSBzYW1wbGUgd2lsbCBpbnRyb2R1Y2UgbW9yZSB2YXJpYXRpb24gaW4gdGhlIGRhdGEgc2VyaWVzLCB3aGljaCByZWR1Y2VzIHRoZSBlZmZlY3Qgb2Ygc2FtcGxpbmcgZXJyb3IgYW5kIGhlbHBzIGluY3JlYXNlIHByZWNpc2lvbiB3aGVuIGVzdGltYXRpbmcgdmFyaW91cyBwcm9wZXJ0aWVzIG9mIHRoZSBkYXRhLiBJbmNyZWFzZWQgc2FtcGxlIHNpemVzIGNhbiByZWR1Y2UgZWl0aGVyIHRoZSBwcmVzZW5jZSBvciB0aGUgaW1wYWN0IG9mIG11bHRpY29sbGluZWFyaXR5LCBvciBib3RoLgoKMi4gTWFrZSBzb21lIHRyYW5zZm9ybWF0aW9uIG9mIHRoZSBkYXRhIChmb3IgZXhhbXBsZSwgcmVwbGFjZSBsZXZlbCBkYXRhIGJ5IHRoZWlyIGRpZmZlcmVuY2VzKQoKMy4gUmVtb3ZlIHNvbWUgb2YgdGhlIGhpZ2hseSBjb3JyZWxhdGVkIGZlYXR1cmVzLgoKLSBNYW51YWwgTWV0aG9kIC0gVmFyaWFuY2UgSW5mbGF0aW9uIEZhY3RvciAoVklGKQotIEF1dG9tYXRpYyBNZXRob2QgLSBSZWN1cnNpdmUgRmVhdHVyZSBFbGltaW5hdGlvbiAoUkZFKSAoSSdsbCBza2lwIGl0IGluIHRoaXMgc3R1ZHkpCi0gRmVhdHVyZSBFbG1pbmF0aW9uIHVzaW5nIFBDQSBEZWNvbXBvc2l0aW9uIChJJ2xsIHNraXAgaXQgaW4gdGhpcyBzdHVkeSkKLSBSZXBsYWNlIGhpZ2hseSBjb3JyZWxhdGVkIHJlZ3Jlc3NvcnMgd2l0aCBhIGxpbmVhciBjb21iaW5hdGlvbiBvZiB0aGVtLiAoSSdsbCBza2lwIGl0IGluIHRoaXMgc3R1ZHkpCgozLiBVc2UgcmVndWxhcml6YXRpb24gbWV0aG9kcyBzdWNoIGFzIFJJREdFIGFuZCBMQVNTTyBvciBCYXllc2lhbiByZWdyZXNzaW9uLiAoSSdsbCBza2lwIGl0IGluIHRoaXMgc3R1ZHkpCgojIyBCb3N0b24gSG91c2UgUHJpY2VzCgpFYWNoIHJlY29yZCBpbiB0aGUgZGF0YWJhc2UgZGVzY3JpYmVzIGEgQm9zdG9uIHN1YnVyYiBvciB0b3duLiBUaGUgZGF0YSB3YXMgZHJhd24gZnJvbSB0aGUgQm9zdG9uIFN0YW5kYXJkIE1ldHJvcG9saXRhbiBTdGF0aXN0aWNhbCBBcmVhIChTTVNBKSBpbiAxOTcwLiBUaGUgYXR0cmlidXRlcyBhcmUgZGXvrIFuZWQgYXMgZm9sbG93cyAodGFrZW4gZnJvbSB0aGUgVUNJIE1hY2hpbmUgTGVhcm5pbmcgUmVwb3NpdG9yeTEpOiBDUklNOiBwZXIgY2FwaXRhIGNyaW1lIHJhdGUgYnkgdG93bgoKLSBaTjogcHJvcG9ydGlvbiBvZiByZXNpZGVudGlhbCBsYW5kIHpvbmVkIGZvciBsb3RzIG92ZXIgMjUsMDAwIHNxLmZ0LgotIElORFVTOiBwcm9wb3J0aW9uIG9mIG5vbi1yZXRhaWwgYnVzaW5lc3MgYWNyZXMgcGVyIHRvd24KLSBDSEFTOiBDaGFybGVzIFJpdmVyIGR1bW15IHZhcmlhYmxlICg9IDEgaWYgdHJhY3QgYm91bmRzIHJpdmVyOyAwIG90aGVyd2lzZSkKLSBOT1g6IG5pdHJpYyBveGlkZXMgY29uY2VudHJhdGlvbiAocGFydHMgcGVyIDEwIG1pbGxpb24pCi0gUk06IGF2ZXJhZ2UgbnVtYmVyIG9mIHJvb21zIHBlciBkd2VsbGluZwotIEFHRTogcHJvcG9ydGlvbiBvZiBvd25lci1vY2N1cGllZCB1bml0cyBidWlsdCBwcmlvciB0byAxOTQwCi0gRElTOiB3ZWlnaHRlZCBkaXN0YW5jZXMgdG8g76yBdmUgQm9zdG9uIGVtcGxveW1lbnQgY2VudGVycwotIFJBRDogaW5kZXggb2YgYWNjZXNzaWJpbGl0eSB0byByYWRpYWwgaGlnaHdheXMKLSBUQVg6IGZ1bGwtdmFsdWUgcHJvcGVydHktdGF4IHJhdGUgcGVyICQxMCwwMDAKLSBQVFJBVElPOiBwdXBpbC10ZWFjaGVyIHJhdGlvIGJ5IHRvd24gMTIuIEI6IDEwMDAoQmviiJIwLjYzKTIgd2hlcmUgQmsgaXMgdGhlIHByb3BvcnRpb24gb2YgYmxhY2tzIGJ5IHRvd24gMTMuIExTVEFUOiAlIGxvd2VyIHN0YXR1cyBvZiB0aGUgcG9wdWxhdGlvbgotIE1FRFY6IE1lZGlhbiB2YWx1ZSBvZiBvd25lci1vY2N1cGllZCBob21lcyBpbiAkMTAwMHMKCldlIGNhbiBzZWUgdGhhdCB0aGUgaW5wdXQgYXR0cmlidXRlcyBoYXZlIGEgbWl4dHVyZSBvZiB1bml0cy4KCiMjIENvbXB1dGF0aW9ucwoKYGBge3IgbGlicmFyaWVzfQpybShsaXN0PWxzKCkpCmxpYnJhcnkobG10ZXN0KQpsaWJyYXJ5KGNhcikKbGlicmFyeShnbG1uZXQpCmBgYAoKCgpgYGB7cn0KY29sdW1uX25hbWVzIDwtIGMoIkNSSU0iLCAiWk4iLCAiSU5EVVMiLCAiQ0hBUyIsICJOT1giLCAiUk0iLCAiQUdFIiwgIkRJUyIsICJSQUQiLCAiVEFYIiwgIlBUUkFUSU8iLCAiQiIsICJMU1RBVCIsICJNRURWIikKZGYgPC0gcmVhZC50YWJsZSgiYWRvdC9ob3VzaW5nLmRhdGEiLCBoZWFkZXIgPSBGQUxTRSwgc2VwID0gIiIsIGNvbC5uYW1lcyA9IGNvbHVtbl9uYW1lcykKCmBgYAoKCgpgYGB7cn0KIyMjIFBsb3R0aW5nIHRoZSBjb3JyZWxhdGlvbiBiZXR3ZWVuIHZhcmlvdXMgY29sdW1ucwpwYW5lbC5jb3IgPC0gZnVuY3Rpb24oeCwgeSwgZGlnaXRzID0gMiwgcHJlZml4ID0gIiIsIGNleC5jb3IsIC4uLikKewogICAgcGFyKHVzciA9IGMoMCwgMSwgMCwgMSkpCiAgICByIDwtIGFicyhjb3IoeCwgeSkpCiAgICB0eHQgPC0gZm9ybWF0KGMociwgMC4xMjM0NTY3ODkpLCBkaWdpdHMgPSBkaWdpdHMpWzFdCiAgICB0eHQgPC0gcGFzdGUwKHByZWZpeCwgdHh0KQogICAgaWYobWlzc2luZyhjZXguY29yKSkgY2V4LmNvciA8LSAwLjgvc3Ryd2lkdGgodHh0KQogICAgdGV4dCgwLjUsIDAuNSwgdHh0LCBjZXggPSBjZXguY29yICogcikKfQpwYWlycyhkZiwgbG93ZXIucGFuZWwgPSBwYW5lbC5zbW9vdGgsIHVwcGVyLnBhbmVsID0gcGFuZWwuY29yLAogICAgICBnYXA9MCwgcm93MWF0dG9wPUZBTFNFKQpgYGAKCgojIyBPdmVycGFyYW1ldHJpemVkIG1vZGVsCgpgYGB7cn0KbW9kZWxfYWxsIDwtIGxtKE1FRFYgfiAuLCBkYXRhPWRmKSAgIyB3aXRoIGFsbCB0aGUgaW5kZXBlbmRlbnQgdmFyaWFibGVzIGluIHRoZSBkYXRhZnJhbWUKCnN1bW1hcnkobW9kZWxfYWxsKQoKcHJpbnQocGFzdGUoIkFJQyA9ICIsIEFJQyhtb2RlbF9hbGwpKSkKcmVzZXR0ZXN0KG1vZGVsX2FsbCkKYGBgCgoKYGBge3J9Cm1vZGVsIDw8LSBsbShNRURWIH4gKzEgK0NSSU0gICsgIFpOICsgSU5EVVMgKyBDSEFTICsgTk9YICsgUk0gKyBBR0UgKyBESVMgICsgUkFEICsgVEFYICsgUFRSQVRJTyArIEIgICsgICBMU1RBVCAgICApCnN1bW1hcnkobW9kZWwpCnByaW50KHBhc3RlKCJBSUMgPSAiLCBBSUMobW9kZWxfYWxsKSkpCnByaW50KHBhc3RlKCJSYW1zZXkgUmVzZXQgVGVzdCA9ICIscmVzZXR0ZXN0KG1vZGVsKSkpCmBgYAoKYGBge3J9Cm1vZGVsIDw8LSBsbShNRURWIH4gKzEgK0NSSU0gICsgIFpOICsgSU5EVVMgKyBDSEFTICsgTk9YICsgUk0gICsgRElTICArIFJBRCArIFRBWCArIFBUUkFUSU8gKyBCICArICAgTFNUQVQgICAgKQpzdW1tYXJ5KG1vZGVsKQpwcmludChwYXN0ZSgiQUlDID0gIiwgQUlDKG1vZGVsKSkpCnByaW50KHBhc3RlKCJSYW1zZXkgUmVzZXQgVGVzdCA9ICIscmVzZXR0ZXN0KG1vZGVsKSkpCmBgYAoKYGBge3J9Cm1vZGVsIDw8LSBsbShNRURWIH4gKzEgK0NSSU0gICsgIFpOICsgQ0hBUyArIE5PWCArIFJNICsgRElTICArIFJBRCArIFRBWCArIFBUUkFUSU8gKyBCICArICAgTFNUQVQgICAgKQpzdW1tYXJ5KG1vZGVsKQpwcmludChwYXN0ZSgiQUlDID0gIiwgQUlDKG1vZGVsKSkpCnByaW50KHBhc3RlKCJSYW1zZXkgUmVzZXQgVGVzdCA9ICIscmVzZXR0ZXN0KG1vZGVsKSkpCmBgYAoKCmBgYHtyfQptb2RlbCA8PC0gbG0oSShsb2coTUVEVikpIH4gKzEgK0NSSU0gICsgIFpOICsgQ0hBUyArIE5PWCArIFJNICsgRElTICArIFJBRCArIFRBWCArIFBUUkFUSU8gKyBCICArICAgTFNUQVQgICAgKQpzdW1tYXJ5KG1vZGVsKQpwcmludChwYXN0ZSgiQUlDID0gIiwgQUlDKG1vZGVsKSkpCnByaW50KHBhc3RlKCJSYW1zZXkgUmVzZXQgVGVzdCA9ICIscmVzZXR0ZXN0KG1vZGVsKSkpCmBgYAoKYGBge3J9Cm1vZGVsIDw8LSBsbShsb2coTUVEVikgfiArMSArQ1JJTSAgKyAgWk4gKyBDSEFTICsgSShOT1heMikgKyBOT1ggKyBSTSArIERJUyAgKyBSQUQgKyBUQVggKyBQVFJBVElPICsgQiAgKyAgIExTVEFUICAgICkKc3VtbWFyeShtb2RlbCkKcHJpbnQocGFzdGUoIkFJQyA9ICIsIEFJQyhtb2RlbCkpKQpwcmludChwYXN0ZSgiUmFtc2V5IFJlc2V0IFRlc3QgPSAiLHJlc2V0dGVzdChtb2RlbCkpKQpgYGAKCgoKIyMgTXVsdGljb2xsaW5lYXJpdHkgdGVzdGluZwoKIyMjIFZJRgoKJCRWSUZfaSA9IFxmcmFjezF9ezEtUl4yX2l9JCQKbmlla2VkeSBzYSBob3ZvcsOtLCDFvmUgdGVudG8gdWthem92YXRlxL4gcHJlIGRhbsO6IHByZW1lbm7DuiBuZXNtaWUgYnnFpSB2w6TEjcWhw60sIGFrbyA1LCBuaWVrZWR5IHNhIGhvdm9yw60gbyBocmFuaWNpIDEwLgoKYGBge3J9CnZpZihtb2RlbF9hbGwpCmBgYAoKVml6dWFsaXrDoWNpYSBWSUYKCmBgYHtyfQp2aWZfdmFsdWVzIDwtIHZpZihtb2RlbF9hbGwpICAgICAgICAgICAjY3JlYXRlIHZlY3RvciBvZiBWSUYgdmFsdWVzCgpiYXJwbG90KHZpZl92YWx1ZXMsIG1haW4gPSAiVklGIFZhbHVlcyIsIGhvcml6ID0gVFJVRSwgY29sID0gInN0ZWVsYmx1ZSIpICNjcmVhdGUgaG9yaXpvbnRhbCBiYXIgY2hhcnQgdG8gZGlzcGxheSBlYWNoIFZJRiB2YWx1ZQoKYWJsaW5lKHYgPSA1LCBsd2QgPSAzLCBsdHkgPSAyKSAgICAjYWRkIHZlcnRpY2FsIGxpbmUgYXQgNSBhcyBhZnRlciA1IHRoZXJlIGlzIHNldmVyZSBjb3JyZWxhdGlvbgoKCmBgYAoKYGBge3J9CmRmXzlfMTAgPC0gZGZbLC1jKDksMTApXQptb2RlbF85XzEwIDwtIGxtKE1FRFYgfi4sIGRhdGE9ZGZfOV8xMCkKc3VtbWFyeShtb2RlbF85XzEwKQpgYGAKCmBgYHtyfQp2aWYobW9kZWxfOV8xMCkKYGBgCgoKIyMgSXMgaGV0ZXJvc2tlZGFzdGljaXR5IHByZXNlbnQgaW4gdGhlIG5ldyBtb2RlbD8KCgpgYGB7cn0KZGZfOV8xMCA8LSBkZlssLWMoOSwxMCldCm1vZGVsXzlfMTAgPC0gbG0oTUVEViB+LiwgZGF0YT1kZl85XzEwKQpicHRlc3QobW9kZWxfOV8xMCkKYGBgCgpZZXMsIGl0IGlzLi4uCgo=