1. Import data
setwd('D:/Rml')
dat = read.csv('Boston-price.csv')
attach(dat)
head(dat)
## X crim zn indus chas nox rm age dis rad tax ptratio black lstat
## 1 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98
## 2 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14
## 3 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03
## 4 4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94
## 5 5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33
## 6 6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21
## medv
## 1 24.0
## 2 21.6
## 3 34.7
## 4 33.4
## 5 36.2
## 6 28.7
2. Create Standard Linear Regression Model of dataset = dat
M = lm(medv~crim+zn+indus+chas+nox+rm+age+dis+rad+tax+ptratio+black+lstat)
summary(M)
##
## Call:
## lm(formula = medv ~ crim + zn + indus + chas + nox + rm + age +
## dis + rad + tax + ptratio + black + lstat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.595 -2.730 -0.518 1.777 26.199
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.646e+01 5.103e+00 7.144 3.28e-12 ***
## crim -1.080e-01 3.286e-02 -3.287 0.001087 **
## zn 4.642e-02 1.373e-02 3.382 0.000778 ***
## indus 2.056e-02 6.150e-02 0.334 0.738288
## chas 2.687e+00 8.616e-01 3.118 0.001925 **
## nox -1.777e+01 3.820e+00 -4.651 4.25e-06 ***
## rm 3.810e+00 4.179e-01 9.116 < 2e-16 ***
## age 6.922e-04 1.321e-02 0.052 0.958229
## dis -1.476e+00 1.995e-01 -7.398 6.01e-13 ***
## rad 3.060e-01 6.635e-02 4.613 5.07e-06 ***
## tax -1.233e-02 3.760e-03 -3.280 0.001112 **
## ptratio -9.527e-01 1.308e-01 -7.283 1.31e-12 ***
## black 9.312e-03 2.686e-03 3.467 0.000573 ***
## lstat -5.248e-01 5.072e-02 -10.347 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.745 on 492 degrees of freedom
## Multiple R-squared: 0.7406, Adjusted R-squared: 0.7338
## F-statistic: 108.1 on 13 and 492 DF, p-value: < 2.2e-16
From the result above, we can see:
=> Model: medv = 36.46 - 0.108.crim + 0.0464.zn + 2.687.chas - 17.77.nox + 3.81.rm - 1.476.dis + 0.306.rad - 0.0123.tax - 0.953.ptratio + 0.0093.black - 0.525.lstat
=> the model can explain 74% of the dataset in reality.
EXAMPLE:
With the first observation, medv = 29.9091374
With the second observation, medv = 24.8201609
library('BMA')
## Loading required package: survival
## Loading required package: leaps
## Loading required package: robustbase
##
## Attaching package: 'robustbase'
## The following object is masked from 'package:survival':
##
## heart
## Loading required package: inline
## Loading required package: rrcov
## Scalable Robust Estimators with High Breakdown Point (version 1.5-2)
x = dat[,2:14]
y = medv
N = bicreg(x,y, strict=FALSE, OR=20)
summary(N)
##
## Call:
## bicreg(x = x, y = y, strict = FALSE, OR = 20)
##
##
## 5 models were selected
## Best 5 models (cumulative posterior probability = 1 ):
##
## p!=0 EV SD model 1 model 2 model 3
## Intercept 100.0 36.520642 5.174370 3.634e+01 3.662e+01 3.523e+01
## crim 93.4 -0.101775 0.041894 -1.084e-01 -1.141e-01 .
## zn 95.0 0.043273 0.016564 4.584e-02 4.574e-02 4.173e-02
## indus 0.0 0.000000 0.000000 . . .
## chas 90.1 2.465386 1.152744 2.719e+00 . 2.872e+00
## nox 100.0 -17.325300 3.574303 -1.738e+01 -1.647e+01 -1.651e+01
## rm 100.0 3.813179 0.410411 3.802e+00 3.845e+00 3.832e+00
## age 0.0 0.000000 0.000000 . . .
## dis 100.0 -1.476279 0.198410 -1.493e+00 -1.526e+00 -1.420e+00
## rad 100.0 0.295811 0.065488 2.996e-01 3.155e-01 2.389e-01
## tax 100.0 -0.011751 0.003427 -1.178e-02 -1.267e-02 -1.143e-02
## ptratio 100.0 -0.955902 0.133379 -9.465e-01 -9.784e-01 -9.355e-01
## black 96.3 0.009058 0.003188 9.291e-03 9.730e-03 1.032e-02
## lstat 100.0 -0.525639 0.048080 -5.226e-01 -5.281e-01 -5.479e-01
##
## nVar 11 10 10
## r2 0.741 0.735 0.735
## BIC -6.143e+02 -6.102e+02 -6.094e+02
## post prob 0.747 0.099 0.066
## model 4 model 5
## Intercept 3.703e+01 4.145e+01
## crim -9.819e-02 -1.217e-01
## zn . 4.619e-02
## indus . .
## chas 2.712e+00 2.872e+00
## nox -1.863e+01 -1.826e+01
## rm 4.003e+00 3.673e+00
## age . .
## dis -1.178e+00 -1.516e+00
## rad 2.844e-01 2.839e-01
## tax -9.548e-03 -1.229e-02
## ptratio -1.097e+00 -9.310e-01
## black 9.358e-03 .
## lstat -5.219e-01 -5.465e-01
##
## nVar 10 10
## r2 0.735 0.734
## BIC -6.089e+02 -6.083e+02
## post prob 0.050 0.037
The best model is model 1 (because its BIC value is the least in comparison with those of the remaining models).
From the result in model 1 above, we can see:
=> Model 1: medv= 36.34 - 0.108.crim + 0.0458.zn + 2.719.chas - 17.38.nox + 3.802.rm - 1.493.dis + 0.299.rad - 0.0118.tax - 0.947.ptratio + 0.0093.black - 0.523.lstat
Only ‘crim’, ‘zn’, ‘chas’, ‘nox’, ‘rm’, ‘dis’, ‘rad’, ‘tax’, ‘ptratio’, ‘black’ & ‘lstat’ have the greatest impact on the outcome - ‘medv’.
Among all the variables, ‘rm’ has the greatest impact on ‘medv’.
R-squared ~ 74%.
EXAMPLE:
With the first observation, medv = 30.1087874
With the second observation, medv = 24.9793422
In both cases (Standard LRM & BMA LRM), the R-squared values equals to 74% respectively, which means both models can explain 74% of the dataset in reality.