Comparison of Standard LRM with BMA-LRM of dataset = Boston-price

Standard LRM

1. Import data

setwd('D:/Rml')
dat = read.csv('Boston-price.csv')
attach(dat)
head(dat)

##   X    crim zn indus chas   nox    rm  age    dis rad tax ptratio  black lstat
## 1 1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90  4.98
## 2 2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90  9.14
## 3 3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83  4.03
## 4 4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63  2.94
## 5 5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90  5.33
## 6 6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12  5.21
##   medv
## 1 24.0
## 2 21.6
## 3 34.7
## 4 33.4
## 5 36.2
## 6 28.7

2. Create Standard Linear Regression Model of dataset = dat

M = lm(medv~crim+zn+indus+chas+nox+rm+age+dis+rad+tax+ptratio+black+lstat)
summary(M)

## 
## Call:
## lm(formula = medv ~ crim + zn + indus + chas + nox + rm + age + 
##     dis + rad + tax + ptratio + black + lstat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.595  -2.730  -0.518   1.777  26.199 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
## crim        -1.080e-01  3.286e-02  -3.287 0.001087 ** 
## zn           4.642e-02  1.373e-02   3.382 0.000778 ***
## indus        2.056e-02  6.150e-02   0.334 0.738288    
## chas         2.687e+00  8.616e-01   3.118 0.001925 ** 
## nox         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
## rm           3.810e+00  4.179e-01   9.116  < 2e-16 ***
## age          6.922e-04  1.321e-02   0.052 0.958229    
## dis         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
## rad          3.060e-01  6.635e-02   4.613 5.07e-06 ***
## tax         -1.233e-02  3.760e-03  -3.280 0.001112 ** 
## ptratio     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
## black        9.312e-03  2.686e-03   3.467 0.000573 ***
## lstat       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.745 on 492 degrees of freedom
## Multiple R-squared:  0.7406, Adjusted R-squared:  0.7338 
## F-statistic: 108.1 on 13 and 492 DF,  p-value: < 2.2e-16

From the result above, we can see:

Only ‘crim’, ‘zn’, ‘chas’, ‘nox’, ‘rm’, ‘dis’, ‘rad’, ‘tax’, ‘ptratio’, ‘black’ & ‘lstat’ have the greatest impact on the outcome - ‘medv’ (because their Pr values are all < 0.05 respectively)

=> Model: medv = 36.46 - 0.108.crim + 0.0464.zn + 2.687.chas - 17.77.nox + 3.81.rm - 1.476.dis + 0.306.rad - 0.0123.tax - 0.953.ptratio + 0.0093.black - 0.525.lstat

R-squared ~ 74%

=> the model can explain 74% of the dataset in reality.

EXAMPLE:

With the first observation, medv = 29.9091374
With the second observation, medv = 24.8201609

BMA-LRM

library('BMA')

## Loading required package: survival

## Loading required package: leaps

## Loading required package: robustbase

## 
## Attaching package: 'robustbase'

## The following object is masked from 'package:survival':
## 
##     heart

## Loading required package: inline

## Loading required package: rrcov

## Scalable Robust Estimators with High Breakdown Point (version 1.5-2)

x = dat[,2:14]
y = medv
N = bicreg(x,y, strict=FALSE, OR=20)
summary(N)

## 
## Call:
## bicreg(x = x, y = y, strict = FALSE, OR = 20)
## 
## 
##   5  models were selected
##  Best  5  models (cumulative posterior probability =  1 ): 
## 
##            p!=0    EV         SD        model 1     model 2     model 3   
## Intercept  100.0   36.520642  5.174370   3.634e+01   3.662e+01   3.523e+01
## crim        93.4   -0.101775  0.041894  -1.084e-01  -1.141e-01       .    
## zn          95.0    0.043273  0.016564   4.584e-02   4.574e-02   4.173e-02
## indus        0.0    0.000000  0.000000       .           .           .    
## chas        90.1    2.465386  1.152744   2.719e+00       .       2.872e+00
## nox        100.0  -17.325300  3.574303  -1.738e+01  -1.647e+01  -1.651e+01
## rm         100.0    3.813179  0.410411   3.802e+00   3.845e+00   3.832e+00
## age          0.0    0.000000  0.000000       .           .           .    
## dis        100.0   -1.476279  0.198410  -1.493e+00  -1.526e+00  -1.420e+00
## rad        100.0    0.295811  0.065488   2.996e-01   3.155e-01   2.389e-01
## tax        100.0   -0.011751  0.003427  -1.178e-02  -1.267e-02  -1.143e-02
## ptratio    100.0   -0.955902  0.133379  -9.465e-01  -9.784e-01  -9.355e-01
## black       96.3    0.009058  0.003188   9.291e-03   9.730e-03   1.032e-02
## lstat      100.0   -0.525639  0.048080  -5.226e-01  -5.281e-01  -5.479e-01
##                                                                           
## nVar                                       11          10          10     
## r2                                       0.741       0.735       0.735    
## BIC                                     -6.143e+02  -6.102e+02  -6.094e+02
## post prob                                0.747       0.099       0.066    
##            model 4     model 5   
## Intercept   3.703e+01   4.145e+01
## crim       -9.819e-02  -1.217e-01
## zn              .       4.619e-02
## indus           .           .    
## chas        2.712e+00   2.872e+00
## nox        -1.863e+01  -1.826e+01
## rm          4.003e+00   3.673e+00
## age             .           .    
## dis        -1.178e+00  -1.516e+00
## rad         2.844e-01   2.839e-01
## tax        -9.548e-03  -1.229e-02
## ptratio    -1.097e+00  -9.310e-01
## black       9.358e-03       .    
## lstat      -5.219e-01  -5.465e-01
##                                  
## nVar          10          10     
## r2          0.735       0.734    
## BIC        -6.089e+02  -6.083e+02
## post prob   0.050       0.037

The best model is model 1 (because its BIC value is the least in comparison with those of the remaining models).
From the result in model 1 above, we can see:

Intercept = 36.34
crim = -0.108
zn = 0.0458
chas = 2.719
nox = -17.38
rm = 3.802
dis = -1.493
rad = 0.299
tax = -0.0118
ptratio = -0.947
black = 0.0093
lstat = -0.523

=> Model 1: medv= 36.34 - 0.108.crim + 0.0458.zn + 2.719.chas - 17.38.nox + 3.802.rm - 1.493.dis + 0.299.rad - 0.0118.tax - 0.947.ptratio + 0.0093.black - 0.523.lstat

Only ‘crim’, ‘zn’, ‘chas’, ‘nox’, ‘rm’, ‘dis’, ‘rad’, ‘tax’, ‘ptratio’, ‘black’ & ‘lstat’ have the greatest impact on the outcome - ‘medv’.
Among all the variables, ‘rm’ has the greatest impact on ‘medv’.
R-squared ~ 74%.

EXAMPLE:

With the first observation, medv = 30.1087874
With the second observation, medv = 24.9793422

CONCLUSION

In both cases (Standard LRM & BMA LRM), the R-squared values equals to 74% respectively, which means both models can explain 74% of the dataset in reality.

Comparison-LRM-Boston-price

Thong

May 20, 2020

Comparison of Standard LRM with BMA-LRM of dataset = Boston-price

Standard LRM

BMA-LRM

CONCLUSION