Project: BostonHousing

Description

Housing data for 506 census tracts of Boston from the 1970 census. The data frame BostonHousing contains the original data by Harrison and Rubinfeld (1979), and the data frame BostonHousing2 contains the corrected version with additional spatial information.

Format

The original data are 506 observations on 14 variables, medv being the target variable:

crim per capita crime rate by town

zn proportion of residential land zoned for lots over 25,000 sq. ft.

indus proportion of non-retail business acres per town

chas Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

nox nitric oxides concentration (parts per 10 million)

rm average number of rooms per dwelling

age proportion of owner-occupied units built prior to 1940

dis weighted distances to five Boston employment centres

rad index of accessibility to radial highways

tax full-value property-tax rate per USD 10,000

ptratio pupil-teacher ratio by town

b 1000(B - 0.63)^2 where B is the proportion of blacks by town

lstat percentage of lower status of the population

medv median value of owner-occupied homes in USD 1000’s

Call Packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
library(mlbench)

Load Data

Boston Housing Data - BostonHousing

data("BostonHousing")
glimpse(BostonHousing)
## Rows: 506
## Columns: 14
## $ crim    <dbl> 0.00632, 0.02731, 0.02729, 0.03237, 0.06905, 0.02985, 0.08829,…
## $ zn      <dbl> 18.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 1…
## $ indus   <dbl> 2.31, 7.07, 7.07, 2.18, 2.18, 2.18, 7.87, 7.87, 7.87, 7.87, 7.…
## $ chas    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ nox     <dbl> 0.538, 0.469, 0.469, 0.458, 0.458, 0.458, 0.524, 0.524, 0.524,…
## $ rm      <dbl> 6.575, 6.421, 7.185, 6.998, 7.147, 6.430, 6.012, 6.172, 5.631,…
## $ age     <dbl> 65.2, 78.9, 61.1, 45.8, 54.2, 58.7, 66.6, 96.1, 100.0, 85.9, 9…
## $ dis     <dbl> 4.0900, 4.9671, 4.9671, 6.0622, 6.0622, 6.0622, 5.5605, 5.9505…
## $ rad     <dbl> 1, 2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ tax     <dbl> 296, 242, 242, 222, 222, 222, 311, 311, 311, 311, 311, 311, 31…
## $ ptratio <dbl> 15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2, 15.2, 15…
## $ b       <dbl> 396.90, 396.90, 392.83, 394.63, 396.90, 394.12, 395.60, 396.90…
## $ lstat   <dbl> 4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93, 17.10…
## $ medv    <dbl> 24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15…

Check Data

if (all(complete.cases(BostonHousing))) "data complete" else "data needs cleaning"
## [1] "data complete"
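
If the check ever reports incomplete data, a per-column count shows where the gaps are. The following is a minimal sketch in base R; the helper name `missing_per_column` and the toy data frame are made up for illustration and are not part of the original analysis.

```r
# Hypothetical helper: number of missing values in each column
missing_per_column <- function(df) colSums(is.na(df))

# Toy data frame (made up for illustration) with one missing value in `rm`
toy <- data.frame(crim = c(0.006, 0.027, 0.027),
                  rm   = c(6.575, NA, 7.185),
                  medv = c(24.0, 21.6, 34.7))

missing_per_column(toy)  # one count per column
anyNA(toy)               # TRUE if any value at all is missing
```

Applied to this project, the call would be `missing_per_column(BostonHousing)`.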

Prepare Data

  • Split the data (n = 506) into 80% training data and 20% test data.
set.seed(55)

id <- createDataPartition(y = BostonHousing$medv, p = 0.8, list = FALSE)
train_data <- BostonHousing[id, ] 
test_data <- BostonHousing[-id, ] 
  
nrow(train_data)
## [1] 407
nrow(test_data)
## [1] 99
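
For a numeric outcome, createDataPartition samples within groups based on percentiles of y, which may explain why the training set has 407 rows rather than exactly 80% of 506. A quick way to check that the two subsets cover a similar range of medv is to compare their summaries. This is a sketch that repeats the split above so that it runs on its own; it assumes the caret and mlbench packages are installed.

```r
library(caret)
library(mlbench)

data("BostonHousing")

# Same split as above, repeated so the block is self-contained
set.seed(55)
id <- createDataPartition(y = BostonHousing$medv, p = 0.8, list = FALSE)
train_data <- BostonHousing[id, ]
test_data  <- BostonHousing[-id, ]

# Compare the distribution of the target variable in both subsets
rbind(train = summary(train_data$medv),
      test  = summary(test_data$medv))
```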

Train model

  1. Linear Regression

  2. K-nearest neighbors (knn)

  3. Random Forest

Linear Regression

set.seed(55)

control <- trainControl(method = "repeatedcv",
                        repeats = 5,
                        number = 5,
                        verboseIter = TRUE)

lm_model <- train(medv ~. ,
                  data = train_data,
                  method = "lm",
                  trControl = control)
## + Fold1.Rep1: intercept=TRUE 
## - Fold1.Rep1: intercept=TRUE 
## + Fold2.Rep1: intercept=TRUE 
## - Fold2.Rep1: intercept=TRUE 
## + Fold3.Rep1: intercept=TRUE 
## - Fold3.Rep1: intercept=TRUE 
## + Fold4.Rep1: intercept=TRUE 
## - Fold4.Rep1: intercept=TRUE 
## + Fold5.Rep1: intercept=TRUE 
## - Fold5.Rep1: intercept=TRUE 
## + Fold1.Rep2: intercept=TRUE 
## - Fold1.Rep2: intercept=TRUE 
## + Fold2.Rep2: intercept=TRUE 
## - Fold2.Rep2: intercept=TRUE 
## + Fold3.Rep2: intercept=TRUE 
## - Fold3.Rep2: intercept=TRUE 
## + Fold4.Rep2: intercept=TRUE 
## - Fold4.Rep2: intercept=TRUE 
## + Fold5.Rep2: intercept=TRUE 
## - Fold5.Rep2: intercept=TRUE 
## + Fold1.Rep3: intercept=TRUE 
## - Fold1.Rep3: intercept=TRUE 
## + Fold2.Rep3: intercept=TRUE 
## - Fold2.Rep3: intercept=TRUE 
## + Fold3.Rep3: intercept=TRUE 
## - Fold3.Rep3: intercept=TRUE 
## + Fold4.Rep3: intercept=TRUE 
## - Fold4.Rep3: intercept=TRUE 
## + Fold5.Rep3: intercept=TRUE 
## - Fold5.Rep3: intercept=TRUE 
## + Fold1.Rep4: intercept=TRUE 
## - Fold1.Rep4: intercept=TRUE 
## + Fold2.Rep4: intercept=TRUE 
## - Fold2.Rep4: intercept=TRUE 
## + Fold3.Rep4: intercept=TRUE 
## - Fold3.Rep4: intercept=TRUE 
## + Fold4.Rep4: intercept=TRUE 
## - Fold4.Rep4: intercept=TRUE 
## + Fold5.Rep4: intercept=TRUE 
## - Fold5.Rep4: intercept=TRUE 
## + Fold1.Rep5: intercept=TRUE 
## - Fold1.Rep5: intercept=TRUE 
## + Fold2.Rep5: intercept=TRUE 
## - Fold2.Rep5: intercept=TRUE 
## + Fold3.Rep5: intercept=TRUE 
## - Fold3.Rep5: intercept=TRUE 
## + Fold4.Rep5: intercept=TRUE 
## - Fold4.Rep5: intercept=TRUE 
## + Fold5.Rep5: intercept=TRUE 
## - Fold5.Rep5: intercept=TRUE 
## Aggregating results
## Fitting final model on full training set

K-nearest neighbors (knn)

set.seed(55)

control <- trainControl(method = "repeatedcv",
                        repeats = 5,
                        number = 5,
                        verboseIter = TRUE)

knn_model <- train(medv ~. ,
                  data = train_data,
                  method = "knn",
                  trControl = control)
## + Fold1.Rep1: k=5 
## - Fold1.Rep1: k=5 
## + Fold1.Rep1: k=7 
## - Fold1.Rep1: k=7 
## + Fold1.Rep1: k=9 
## - Fold1.Rep1: k=9 
## + Fold2.Rep1: k=5 
## - Fold2.Rep1: k=5 
## + Fold2.Rep1: k=7 
## - Fold2.Rep1: k=7 
## + Fold2.Rep1: k=9 
## - Fold2.Rep1: k=9 
## + Fold3.Rep1: k=5 
## - Fold3.Rep1: k=5 
## + Fold3.Rep1: k=7 
## - Fold3.Rep1: k=7 
## + Fold3.Rep1: k=9 
## - Fold3.Rep1: k=9 
## + Fold4.Rep1: k=5 
## - Fold4.Rep1: k=5 
## + Fold4.Rep1: k=7 
## - Fold4.Rep1: k=7 
## + Fold4.Rep1: k=9 
## - Fold4.Rep1: k=9 
## + Fold5.Rep1: k=5 
## - Fold5.Rep1: k=5 
## + Fold5.Rep1: k=7 
## - Fold5.Rep1: k=7 
## + Fold5.Rep1: k=9 
## - Fold5.Rep1: k=9 
## + Fold1.Rep2: k=5 
## - Fold1.Rep2: k=5 
## + Fold1.Rep2: k=7 
## - Fold1.Rep2: k=7 
## + Fold1.Rep2: k=9 
## - Fold1.Rep2: k=9 
## + Fold2.Rep2: k=5 
## - Fold2.Rep2: k=5 
## + Fold2.Rep2: k=7 
## - Fold2.Rep2: k=7 
## + Fold2.Rep2: k=9 
## - Fold2.Rep2: k=9 
## + Fold3.Rep2: k=5 
## - Fold3.Rep2: k=5 
## + Fold3.Rep2: k=7 
## - Fold3.Rep2: k=7 
## + Fold3.Rep2: k=9 
## - Fold3.Rep2: k=9 
## + Fold4.Rep2: k=5 
## - Fold4.Rep2: k=5 
## + Fold4.Rep2: k=7 
## - Fold4.Rep2: k=7 
## + Fold4.Rep2: k=9 
## - Fold4.Rep2: k=9 
## + Fold5.Rep2: k=5 
## - Fold5.Rep2: k=5 
## + Fold5.Rep2: k=7 
## - Fold5.Rep2: k=7 
## + Fold5.Rep2: k=9 
## - Fold5.Rep2: k=9 
## + Fold1.Rep3: k=5 
## - Fold1.Rep3: k=5 
## + Fold1.Rep3: k=7 
## - Fold1.Rep3: k=7 
## + Fold1.Rep3: k=9 
## - Fold1.Rep3: k=9 
## + Fold2.Rep3: k=5 
## - Fold2.Rep3: k=5 
## + Fold2.Rep3: k=7 
## - Fold2.Rep3: k=7 
## + Fold2.Rep3: k=9 
## - Fold2.Rep3: k=9 
## + Fold3.Rep3: k=5 
## - Fold3.Rep3: k=5 
## + Fold3.Rep3: k=7 
## - Fold3.Rep3: k=7 
## + Fold3.Rep3: k=9 
## - Fold3.Rep3: k=9 
## + Fold4.Rep3: k=5 
## - Fold4.Rep3: k=5 
## + Fold4.Rep3: k=7 
## - Fold4.Rep3: k=7 
## + Fold4.Rep3: k=9 
## - Fold4.Rep3: k=9 
## + Fold5.Rep3: k=5 
## - Fold5.Rep3: k=5 
## + Fold5.Rep3: k=7 
## - Fold5.Rep3: k=7 
## + Fold5.Rep3: k=9 
## - Fold5.Rep3: k=9 
## + Fold1.Rep4: k=5 
## - Fold1.Rep4: k=5 
## + Fold1.Rep4: k=7 
## - Fold1.Rep4: k=7 
## + Fold1.Rep4: k=9 
## - Fold1.Rep4: k=9 
## + Fold2.Rep4: k=5 
## - Fold2.Rep4: k=5 
## + Fold2.Rep4: k=7 
## - Fold2.Rep4: k=7 
## + Fold2.Rep4: k=9 
## - Fold2.Rep4: k=9 
## + Fold3.Rep4: k=5 
## - Fold3.Rep4: k=5 
## + Fold3.Rep4: k=7 
## - Fold3.Rep4: k=7 
## + Fold3.Rep4: k=9 
## - Fold3.Rep4: k=9 
## + Fold4.Rep4: k=5 
## - Fold4.Rep4: k=5 
## + Fold4.Rep4: k=7 
## - Fold4.Rep4: k=7 
## + Fold4.Rep4: k=9 
## - Fold4.Rep4: k=9 
## + Fold5.Rep4: k=5 
## - Fold5.Rep4: k=5 
## + Fold5.Rep4: k=7 
## - Fold5.Rep4: k=7 
## + Fold5.Rep4: k=9 
## - Fold5.Rep4: k=9 
## + Fold1.Rep5: k=5 
## - Fold1.Rep5: k=5 
## + Fold1.Rep5: k=7 
## - Fold1.Rep5: k=7 
## + Fold1.Rep5: k=9 
## - Fold1.Rep5: k=9 
## + Fold2.Rep5: k=5 
## - Fold2.Rep5: k=5 
## + Fold2.Rep5: k=7 
## - Fold2.Rep5: k=7 
## + Fold2.Rep5: k=9 
## - Fold2.Rep5: k=9 
## + Fold3.Rep5: k=5 
## - Fold3.Rep5: k=5 
## + Fold3.Rep5: k=7 
## - Fold3.Rep5: k=7 
## + Fold3.Rep5: k=9 
## - Fold3.Rep5: k=9 
## + Fold4.Rep5: k=5 
## - Fold4.Rep5: k=5 
## + Fold4.Rep5: k=7 
## - Fold4.Rep5: k=7 
## + Fold4.Rep5: k=9 
## - Fold4.Rep5: k=9 
## + Fold5.Rep5: k=5 
## - Fold5.Rep5: k=5 
## + Fold5.Rep5: k=7 
## - Fold5.Rep5: k=7 
## + Fold5.Rep5: k=9 
## - Fold5.Rep5: k=9 
## Aggregating results
## Selecting tuning parameters
## Fitting k = 5 on full training set
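
The knn model above is trained on the raw predictors. Because knn relies on distances, predictors on large scales (such as tax) can dominate those on small scales (such as nox). One option worth trying is to center and scale the predictors through the preProcess argument of train(). The sketch below is a variation, not part of the original analysis: the object name `knn_scaled` and the grid of k values are assumptions chosen for illustration, and whether the RMSE improves has to be checked by running it.

```r
library(caret)
library(mlbench)

data("BostonHousing")

# Same split as above, repeated so the block is self-contained
set.seed(55)
id <- createDataPartition(y = BostonHousing$medv, p = 0.8, list = FALSE)
train_data <- BostonHousing[id, ]

set.seed(55)
knn_scaled <- train(medv ~ .,
                    data = train_data,
                    method = "knn",
                    preProcess = c("center", "scale"),              # standardise predictors
                    tuneGrid = data.frame(k = seq(3, 15, by = 2)),  # hypothetical grid of k
                    trControl = trainControl(method = "repeatedcv",
                                             number = 5,
                                             repeats = 5))

# Cross-validated RMSE for each candidate k
knn_scaled$results[, c("k", "RMSE")]
```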

Random Forest

set.seed(55)

control <- trainControl(method = "repeatedcv",
                        repeats = 5,
                        number = 5,
                        verboseIter = TRUE)

rf_model <- train(medv ~. ,
                  data = train_data,
                  method = "rf",
                  trControl = control)
## + Fold1.Rep1: mtry= 2 
## - Fold1.Rep1: mtry= 2 
## + Fold1.Rep1: mtry= 7 
## - Fold1.Rep1: mtry= 7 
## + Fold1.Rep1: mtry=13 
## - Fold1.Rep1: mtry=13 
## + Fold2.Rep1: mtry= 2 
## - Fold2.Rep1: mtry= 2 
## + Fold2.Rep1: mtry= 7 
## - Fold2.Rep1: mtry= 7 
## + Fold2.Rep1: mtry=13 
## - Fold2.Rep1: mtry=13 
## + Fold3.Rep1: mtry= 2 
## - Fold3.Rep1: mtry= 2 
## + Fold3.Rep1: mtry= 7 
## - Fold3.Rep1: mtry= 7 
## + Fold3.Rep1: mtry=13 
## - Fold3.Rep1: mtry=13 
## + Fold4.Rep1: mtry= 2 
## - Fold4.Rep1: mtry= 2 
## + Fold4.Rep1: mtry= 7 
## - Fold4.Rep1: mtry= 7 
## + Fold4.Rep1: mtry=13 
## - Fold4.Rep1: mtry=13 
## + Fold5.Rep1: mtry= 2 
## - Fold5.Rep1: mtry= 2 
## + Fold5.Rep1: mtry= 7 
## - Fold5.Rep1: mtry= 7 
## + Fold5.Rep1: mtry=13 
## - Fold5.Rep1: mtry=13 
## + Fold1.Rep2: mtry= 2 
## - Fold1.Rep2: mtry= 2 
## + Fold1.Rep2: mtry= 7 
## - Fold1.Rep2: mtry= 7 
## + Fold1.Rep2: mtry=13 
## - Fold1.Rep2: mtry=13 
## + Fold2.Rep2: mtry= 2 
## - Fold2.Rep2: mtry= 2 
## + Fold2.Rep2: mtry= 7 
## - Fold2.Rep2: mtry= 7 
## + Fold2.Rep2: mtry=13 
## - Fold2.Rep2: mtry=13 
## + Fold3.Rep2: mtry= 2 
## - Fold3.Rep2: mtry= 2 
## + Fold3.Rep2: mtry= 7 
## - Fold3.Rep2: mtry= 7 
## + Fold3.Rep2: mtry=13 
## - Fold3.Rep2: mtry=13 
## + Fold4.Rep2: mtry= 2 
## - Fold4.Rep2: mtry= 2 
## + Fold4.Rep2: mtry= 7 
## - Fold4.Rep2: mtry= 7 
## + Fold4.Rep2: mtry=13 
## - Fold4.Rep2: mtry=13 
## + Fold5.Rep2: mtry= 2 
## - Fold5.Rep2: mtry= 2 
## + Fold5.Rep2: mtry= 7 
## - Fold5.Rep2: mtry= 7 
## + Fold5.Rep2: mtry=13 
## - Fold5.Rep2: mtry=13 
## + Fold1.Rep3: mtry= 2 
## - Fold1.Rep3: mtry= 2 
## + Fold1.Rep3: mtry= 7 
## - Fold1.Rep3: mtry= 7 
## + Fold1.Rep3: mtry=13 
## - Fold1.Rep3: mtry=13 
## + Fold2.Rep3: mtry= 2 
## - Fold2.Rep3: mtry= 2 
## + Fold2.Rep3: mtry= 7 
## - Fold2.Rep3: mtry= 7 
## + Fold2.Rep3: mtry=13 
## - Fold2.Rep3: mtry=13 
## + Fold3.Rep3: mtry= 2 
## - Fold3.Rep3: mtry= 2 
## + Fold3.Rep3: mtry= 7 
## - Fold3.Rep3: mtry= 7 
## + Fold3.Rep3: mtry=13 
## - Fold3.Rep3: mtry=13 
## + Fold4.Rep3: mtry= 2 
## - Fold4.Rep3: mtry= 2 
## + Fold4.Rep3: mtry= 7 
## - Fold4.Rep3: mtry= 7 
## + Fold4.Rep3: mtry=13 
## - Fold4.Rep3: mtry=13 
## + Fold5.Rep3: mtry= 2 
## - Fold5.Rep3: mtry= 2 
## + Fold5.Rep3: mtry= 7 
## - Fold5.Rep3: mtry= 7 
## + Fold5.Rep3: mtry=13 
## - Fold5.Rep3: mtry=13 
## + Fold1.Rep4: mtry= 2 
## - Fold1.Rep4: mtry= 2 
## + Fold1.Rep4: mtry= 7 
## - Fold1.Rep4: mtry= 7 
## + Fold1.Rep4: mtry=13 
## - Fold1.Rep4: mtry=13 
## + Fold2.Rep4: mtry= 2 
## - Fold2.Rep4: mtry= 2 
## + Fold2.Rep4: mtry= 7 
## - Fold2.Rep4: mtry= 7 
## + Fold2.Rep4: mtry=13 
## - Fold2.Rep4: mtry=13 
## + Fold3.Rep4: mtry= 2 
## - Fold3.Rep4: mtry= 2 
## + Fold3.Rep4: mtry= 7 
## - Fold3.Rep4: mtry= 7 
## + Fold3.Rep4: mtry=13 
## - Fold3.Rep4: mtry=13 
## + Fold4.Rep4: mtry= 2 
## - Fold4.Rep4: mtry= 2 
## + Fold4.Rep4: mtry= 7 
## - Fold4.Rep4: mtry= 7 
## + Fold4.Rep4: mtry=13 
## - Fold4.Rep4: mtry=13 
## + Fold5.Rep4: mtry= 2 
## - Fold5.Rep4: mtry= 2 
## + Fold5.Rep4: mtry= 7 
## - Fold5.Rep4: mtry= 7 
## + Fold5.Rep4: mtry=13 
## - Fold5.Rep4: mtry=13 
## + Fold1.Rep5: mtry= 2 
## - Fold1.Rep5: mtry= 2 
## + Fold1.Rep5: mtry= 7 
## - Fold1.Rep5: mtry= 7 
## + Fold1.Rep5: mtry=13 
## - Fold1.Rep5: mtry=13 
## + Fold2.Rep5: mtry= 2 
## - Fold2.Rep5: mtry= 2 
## + Fold2.Rep5: mtry= 7 
## - Fold2.Rep5: mtry= 7 
## + Fold2.Rep5: mtry=13 
## - Fold2.Rep5: mtry=13 
## + Fold3.Rep5: mtry= 2 
## - Fold3.Rep5: mtry= 2 
## + Fold3.Rep5: mtry= 7 
## - Fold3.Rep5: mtry= 7 
## + Fold3.Rep5: mtry=13 
## - Fold3.Rep5: mtry=13 
## + Fold4.Rep5: mtry= 2 
## - Fold4.Rep5: mtry= 2 
## + Fold4.Rep5: mtry= 7 
## - Fold4.Rep5: mtry= 7 
## + Fold4.Rep5: mtry=13 
## - Fold4.Rep5: mtry=13 
## + Fold5.Rep5: mtry= 2 
## - Fold5.Rep5: mtry= 2 
## + Fold5.Rep5: mtry= 7 
## - Fold5.Rep5: mtry= 7 
## + Fold5.Rep5: mtry=13 
## - Fold5.Rep5: mtry=13 
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 7 on full training set

Compare Models

# Best (lowest) cross-validated RMSE of each model across its tuning grid
rmse_models <- data.frame(model = c("lm", "knn", "rf"),
                          RMSE = c(min(lm_model$results$RMSE),
                                   min(knn_model$results$RMSE),
                                   min(rf_model$results$RMSE)))
rmse_models
##   model     RMSE
## 1    lm 4.728522
## 2   knn 6.471621
## 3    rf 3.139809
rf_model
## Random Forest 
## 
## 407 samples
##  13 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 326, 326, 325, 326, 325, 325, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE      Rsquared   MAE     
##    2    3.474057  0.8759803  2.351505
##    7    3.139809  0.8848565  2.142029
##   13    3.288765  0.8704464  2.220037
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 7.
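
The RMSE table compares a single averaged value per model. caret also provides resamples(), which collects the per-fold metrics of several models so their spread can be compared. The sketch below refits the three models with one shared control object. The helper `fit_one` is hypothetical, re-seeding before each call is assumed to give every model the same folds, and method = "rf" needs the randomForest package to be installed.

```r
library(caret)
library(mlbench)

data("BostonHousing")
set.seed(55)
id <- createDataPartition(y = BostonHousing$medv, p = 0.8, list = FALSE)
train_data <- BostonHousing[id, ]

# One shared control object; verboseIter is left off to keep the log short
control <- trainControl(method = "repeatedcv", number = 5, repeats = 5)

# Hypothetical helper: re-seed before each fit so the folds should match
fit_one <- function(method) {
  set.seed(55)
  train(medv ~ ., data = train_data, method = method, trControl = control)
}

models <- list(lm  = fit_one("lm"),
               knn = fit_one("knn"),
               rf  = fit_one("rf"))

# Collect the per-resample metrics and summarise them side by side
cv_results <- resamples(models)
summary(cv_results, metric = "RMSE")
```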

Scoring test data (unseen data)

  • rf_model is used to predict medv for the unseen test data.
p_medv <- predict(rf_model, newdata = test_data)
p_medv
##        8       10       12       16       17       19       23       25 
## 17.86384 18.43599 20.59960 20.07259 21.05760 18.57372 16.69547 16.88583 
##       28       31       33       40       41       43       44       46 
## 15.31268 14.78848 15.21248 28.40669 35.04628 24.59973 24.36465 19.94655 
##       49       54       76       79       84       85       91      102 
## 18.06048 20.97901 22.79911 21.05017 23.64261 22.67881 22.84964 25.39752 
##      104      105      106      113      125      126      130      135 
## 20.07225 20.24468 18.21924 19.36375 18.16320 19.70678 16.10115 15.39009 
##      138      141      145      151      154      155      158      171 
## 18.77348 15.54499 14.92889 19.43213 16.06445 16.66335 32.18671 20.50567 
##      174      181      184      185      187      189      201      217 
## 23.16568 37.88763 29.90304 23.62329 39.75825 28.31180 34.56370 21.68226 
##      226      238      242      252      254      255      258      262 
## 40.52779 33.06231 21.17975 28.50388 39.77825 22.85665 44.22772 40.93354 
##      272      280      283      290      303      307      322      325 
## 25.07377 31.33112 44.88021 23.53066 23.46055 34.58801 23.55078 23.65016 
##      334      341      346      348      353      354      359      367 
## 22.74199 20.02289 20.12858 24.35466 21.14097 30.02515 20.99503 18.38193 
##      368      373      376      382      392      394      396      398 
## 21.61542 31.40763 25.35163 11.79328 15.60912 14.95341 13.94374 12.60148 
##      402      406      407      411      413      428      430      440 
## 11.21567  9.57179 16.52086 26.26092 13.44742 15.41916 11.76756 12.35799 
##      442      462      467      472      473      476      486      495 
## 13.79481 19.68410 14.98616 20.48169 20.15240 15.09049 21.91237 20.80043 
##      496      500      501 
## 19.64760 19.12090 19.93675

Model evaluation (rf_model)

  • RMSE (root mean square error) is used to evaluate the model.
test_rmse <- sqrt(mean((test_data$medv - p_medv)^2))
test_rmse
## [1] 3.846461
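
The RMSE calculation can be wrapped in a small function that also reports the mean absolute error (MAE), which is less sensitive to a few large misses. This is a minimal sketch: the helper name `regression_errors` is hypothetical, and the two example tracts (rows 8 and 10) use the observed medv values from the glimpse() output and the predictions printed in p_medv above.

```r
# Hypothetical helper: RMSE and MAE for paired observed and predicted values
regression_errors <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  residual <- observed - predicted
  c(RMSE = sqrt(mean(residual^2)),
    MAE  = mean(abs(residual)))
}

# Tracts 8 and 10: observed medv and the random forest predictions
observed  <- c(27.1, 18.9)
predicted <- c(17.86384, 18.43599)

regression_errors(observed, predicted)
```

Calling `regression_errors(test_data$medv, p_medv)` on the full test set should reproduce the test RMSE reported above.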

Sum Up

Three models were compared:

  1. Linear Regression

  2. K-nearest neighbors (knn)

  3. Random Forest

The Random Forest model performed best on the Boston Housing data, with the lowest cross-validated RMSE of the three models. Its RMSE was 3.139809 in cross-validation on the training data and 3.846461 on the test data.