This project is a study of machine learning algorithms, covering classical methods such as linear regression, regularized regression, decision trees, and random forests. The goal is to predict house prices from features such as the number of bathrooms, number of bedrooms, living area, and number of views. The project also covers exporting the trained model for further use.
This dataset contains information about the prices of houses in India. It includes several features that affect the price of a house, including living area, number of views, grade of the house, number of bedrooms, number of bathrooms, number of floors, built year, latitude, longitude, and others. You can access the source of the data from this link: House Price in India
| No. | Attribute | Type of Data |
|---|---|---|
| 1 | id | Integer |
| 2 | Date | Integer |
| 3 | No. of bedrooms | Integer |
| 4 | No. of bathrooms | Decimal |
| 5 | Living area | Integer |
| 6 | Lot area | Integer |
| 7 | No. of Floors | Decimal |
| 8 | Waterfront present | Integer |
| 9 | No. of views | Integer |
| 10 | Condition of the house | Integer |
| 11 | Grade of the house | Integer |
| 12 | Area of the house(excluding basement) | Integer |
| 13 | Area of the basement | Integer |
| 14 | Built Year | Integer |
| 15 | Renovation Year | Integer |
| 16 | Postal Code | Integer |
| 17 | Lattitude | Decimal |
| 18 | Longitude | Decimal |
| 19 | Living_area_renov | Integer |
| 20 | Lot_area_renov | Integer |
| 21 | Number of schools nearby | Integer |
| 22 | Distance from the airport | Integer |
| 23 | Price | Integer |
library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.3.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
library(rpart)
## Warning: package 'rpart' was built under R version 4.3.1
library(rpart.plot)
Data Preparation / Data Cleansing
(0-1) Read the "House Price India" Excel file into a data frame named "dataset" and preview it.
# Read the Excel file (read_excel already returns a tibble)
dataset <- read_excel('/Users/j.nrup/Documents/Data Project/House Price India.xlsx')
head(dataset)
## # A tibble: 6 × 23
## id Date `number of bedrooms` `number of bathrooms` `living area`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 6762810145 42491 5 2.5 3650
## 2 6762810635 42491 4 2.5 2920
## 3 6762810998 42491 5 2.75 2910
## 4 6762812605 42491 4 2.5 3310
## 5 6762812919 42491 3 2 2710
## 6 6762813105 42491 3 2.5 2600
## # ℹ 18 more variables: `lot area` <dbl>, `number of floors` <dbl>,
## # `waterfront present` <dbl>, `number of views` <dbl>,
## # `condition of the house` <dbl>, `grade of the house` <dbl>,
## # `Area of the house(excluding basement)` <dbl>,
## # `Area of the basement` <dbl>, `Built Year` <dbl>, `Renovation Year` <dbl>,
## # `Postal Code` <dbl>, Lattitude <dbl>, Longitude <dbl>,
## # living_area_renov <dbl>, lot_area_renov <dbl>, …
(0-2) Check for null values and remove them if present.
if (mean(complete.cases(dataset)) != 1) {
  # Drop rows containing missing values
  clean_data <- drop_na(dataset)
  print("Removed nulls completely!")
  mean(complete.cases(clean_data))
} else {
  clean_data <- dataset
  print("Data was clean!")
}
## [1] "Data was clean!"
(0-3) Remove columns that will not be used as features or independent variables: identifiers, dates, and location-related columns.
clean_data <- clean_data[, !(names(clean_data) %in% c("id","Date","Built Year","Renovation Year","Postal Code","Lattitude","Longitude"))]
clean_data
## # A tibble: 14,620 × 16
## `number of bedrooms` `number of bathrooms` `living area` `lot area`
## <dbl> <dbl> <dbl> <dbl>
## 1 5 2.5 3650 9050
## 2 4 2.5 2920 4000
## 3 5 2.75 2910 9480
## 4 4 2.5 3310 42998
## 5 3 2 2710 4500
## 6 3 2.5 2600 4750
## 7 5 3.25 3660 11995
## 8 3 1.75 2240 10578
## 9 3 2.5 2390 6550
## 10 4 2.25 2200 11250
## # ℹ 14,610 more rows
## # ℹ 12 more variables: `number of floors` <dbl>, `waterfront present` <dbl>,
## # `number of views` <dbl>, `condition of the house` <dbl>,
## # `grade of the house` <dbl>, `Area of the house(excluding basement)` <dbl>,
## # `Area of the basement` <dbl>, living_area_renov <dbl>,
## # lot_area_renov <dbl>, `Number of schools nearby` <dbl>,
## # `Distance from the airport` <dbl>, Price <dbl>
(0-4) Rename all columns to snake_case.
col_name_mappings <- c(
"number of bedrooms" = "no_of_bedrooms",
"number of bathrooms" = "no_of_bathrooms",
"living area" = "living_area",
"lot area" = "lot_area",
"number of floors" = "no_of_floors",
"waterfront present" = "waterfront",
"number of views" = "no_of_views",
"condition of the house" = "condition_house",
"grade of the house" = "grade_house",
"Area of the house(excluding basement)" = "area_house",
"Area of the basement" = "area_basement",
"living_area_renov" = "living_renov",
"lot_area_renov" = "lot_renov",
"Number of schools nearby" = "no_of_schools_nearby",
"Distance from the airport" = "distance_airport",
"Price" = "price"
)
# Rename columns using the mapping (named-vector lookup)
colnames(clean_data) <- col_name_mappings[colnames(clean_data)]
clean_data
## # A tibble: 14,620 × 16
## no_of_bedrooms no_of_bathrooms living_area lot_area no_of_floors waterfront
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 5 2.5 3650 9050 2 0
## 2 4 2.5 2920 4000 1.5 0
## 3 5 2.75 2910 9480 1.5 0
## 4 4 2.5 3310 42998 2 0
## 5 3 2 2710 4500 1.5 0
## 6 3 2.5 2600 4750 1 0
## 7 5 3.25 3660 11995 2 0
## 8 3 1.75 2240 10578 2 0
## 9 3 2.5 2390 6550 1 0
## 10 4 2.25 2200 11250 1.5 0
## # ℹ 14,610 more rows
## # ℹ 10 more variables: no_of_views <dbl>, condition_house <dbl>,
## # grade_house <dbl>, area_house <dbl>, area_basement <dbl>,
## # living_renov <dbl>, lot_renov <dbl>, no_of_schools_nearby <dbl>,
## # distance_airport <dbl>, price <dbl>
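As an aside, a similar result could be achieved automatically with the janitor package (an assumption: janitor is installed), though it generates longer names than the manual mapping above:
# clean_data <- janitor::clean_names(clean_data)  # e.g. "area_of_the_house_excluding_basement"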
(0-5) Check the distribution of house prices with a histogram.
ggplot(data = clean_data, mapping = aes(x = price)) +
geom_histogram(bins=30, fill = "#F5AD9E") +
labs(title = "Distribution of House Price") +
theme_minimal()
Analyzing the histogram reveals a right-skewed distribution in the price variable. Heavily skewed targets tend to produce non-normal, heteroscedastic errors in linear regression models. To address this, we can apply a log transformation to the target, bringing its distribution closer to normal and improving the model's fit.
(0-6) Apply a log transformation.
data_lm <- clean_data %>%
mutate(log_price = log(price))
data_lm
## # A tibble: 14,620 × 17
## no_of_bedrooms no_of_bathrooms living_area lot_area no_of_floors waterfront
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 5 2.5 3650 9050 2 0
## 2 4 2.5 2920 4000 1.5 0
## 3 5 2.75 2910 9480 1.5 0
## 4 4 2.5 3310 42998 2 0
## 5 3 2 2710 4500 1.5 0
## 6 3 2.5 2600 4750 1 0
## 7 5 3.25 3660 11995 2 0
## 8 3 1.75 2240 10578 2 0
## 9 3 2.5 2390 6550 1 0
## 10 4 2.25 2200 11250 1.5 0
## # ℹ 14,610 more rows
## # ℹ 11 more variables: no_of_views <dbl>, condition_house <dbl>,
## # grade_house <dbl>, area_house <dbl>, area_basement <dbl>,
## # living_renov <dbl>, lot_renov <dbl>, no_of_schools_nearby <dbl>,
## # distance_airport <dbl>, price <dbl>, log_price <dbl>
(0-7) Check the distribution of house prices after the log transformation with a histogram.
ggplot(data = data_lm, mapping = aes(x = log_price)) +
geom_histogram(bins = 30, fill = "#D9F588") +
labs(title = "Distribution of Log Price") +
theme_minimal()
Analyzing the histogram of house prices after the log transformation reveals an approximately normal, bell-shaped distribution. We can therefore use the transformed data to build a linear regression model.
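To quantify the improvement, a simple moment-based skewness check can be run before and after the transformation (a minimal sketch; positive values indicate right skew):
# Moment-based sample skewness
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
skewness(data_lm$price)      # expected: strongly positive
skewness(data_lm$log_price)  # expected: close to zero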
Split Data: Create a function to split the data into training and test sets.
split_func <- function(data, train_size = 0.8) {
  set.seed(42)
  n <- nrow(data)
  id <- sample(1:n, size = n * train_size)  # random row indices for the training set
  train_data <- data[id, ]
  test_data <- data[-id, ]
  list(train = train_data, test = test_data)
}
pre_data <- split_func(data_lm)
trainData <- pre_data$train
testData <- pre_data$test
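As an alternative to the hand-written splitter, caret's createDataPartition() draws indices stratified on the outcome, which can give train and test sets with more similar price distributions; a sketch, not used below:
set.seed(42)
idx <- createDataPartition(data_lm$log_price, p = 0.8, list = FALSE)
# trainData <- data_lm[idx, ]; testData <- data_lm[-idx, ]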
Train Model
(2-1) Train Model: Algorithm Selection -> Linear Regression
# Train Model
set.seed(40)
lmModel <- train(log_price ~ . - price,
data = trainData,
method = "lm")
lmModel
## Linear Regression
##
## 11696 samples
## 16 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 11696, 11696, 11696, 11696, 11696, 11696, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 0.3258655 0.6185114 0.2622014
##
## Tuning parameter 'intercept' was held constant at a value of TRUE
(2-2) Show Linear Regression Equation.
print("Regression Equation: ")
## [1] "Regression Equation: "
lmModel$finalModel
##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Coefficients:
## (Intercept) no_of_bedrooms no_of_bathrooms
## 1.074e+01 -2.616e-02 -1.031e-02
## living_area lot_area no_of_floors
## 2.759e-04 1.800e-07 8.001e-02
## waterfront no_of_views condition_house
## 3.760e-01 5.577e-02 1.025e-01
## grade_house area_house area_basement
## 1.804e-01 -1.250e-04 NA
## living_renov lot_renov no_of_schools_nearby
## 8.597e-05 -8.637e-07 3.048e-03
## distance_airport
## 2.747e-04
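Note the NA coefficient for area_basement: it indicates perfect collinearity among the predictors, most likely because the living area equals the above-ground area plus the basement area. This is an assumption that can be checked directly:
# Verify the suspected linear dependence (hypothesized relationship)
all(clean_data$living_area == clean_data$area_house + clean_data$area_basement)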
(2-3) Analyze variable importance to identify significant predictors.
varImp(lmModel)
## lm variable importance
##
## Overall
## grade_house 100.00000
## living_area 74.04746
## condition_house 53.82683
## area_house 32.91557
## no_of_views 28.94938
## living_renov 27.95058
## no_of_floors 24.83735
## waterfront 23.97325
## no_of_bedrooms 13.53206
## lot_renov 11.63298
## lot_area 2.18204
## no_of_bathrooms 1.86223
## no_of_schools_nearby 0.03208
## distance_airport 0.00000
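caret also provides a plot method for the importance object (lattice is already attached via caret), which gives a quick visual ranking:
# Dot plot of variable importance for the linear model
plot(varImp(lmModel))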
Score Model
# Predict on train and test sets, then back-transform from the log scale
pTrain_LM <- predict(lmModel, newdata = trainData)
unlog_pTrain_LM <- exp(pTrain_LM)
pTest_LM <- predict(lmModel, newdata = testData)
unlogpTest_LM <- exp(pTest_LM)
Evaluate Model using MAE, MSE, and RMSE
# Function to calculate MAE
calcu_mae <- function(actual, pred) {
  error <- actual - pred
  return(mean(abs(error)))
}
# Function to calculate MSE
calcu_mse <- function(actual, pred) {
  error <- actual - pred
  return(mean(error^2))
}
# Function to calculate RMSE
calcu_rmse <- function(actual, pred) {
  error <- actual - pred
  return(sqrt(mean(error^2)))
}
MAETrain <- calcu_mae(trainData$price, unlog_pTrain_LM)
MAETest <- calcu_mae(testData$price, unlogpTest_LM)
MSETrain <- calcu_mse(trainData$price, unlog_pTrain_LM)
MSETest <- calcu_mse(testData$price, unlogpTest_LM)
RMSETrain <- calcu_rmse(trainData$price, unlog_pTrain_LM)
RMSETest <- calcu_rmse(testData$price, unlogpTest_LM)
result <- c("MAE", "MSE", "RMSE")
Train <- c(MAETrain, MSETrain, RMSETrain)
Test <- c(MAETest, MSETest, RMSETest)
result_df <- data.frame(result,Train, Test)
result_df
## result Train Test
## 1 MAE 1.373709e+05 1.431174e+05
## 2 MSE 4.497051e+10 8.886660e+10
## 3 RMSE 2.120625e+05 2.981050e+05
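As a cross-check on the hand-rolled metric functions, caret's postResample() computes RMSE, R-squared, and MAE in one call; a minimal sketch on the back-transformed test predictions:
# Should agree with the MAE/RMSE values above (plus an R-squared)
postResample(pred = unlogpTest_LM, obs = testData$price)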
To evaluate the performance of various algorithms, we employ re-sampling techniques, pre-processing, hyper-parameter tuning, and a diverse set of machine learning models, as follows:
First step: Regularized Regression (glmnet)
## create train control
set.seed(42)
ctrl_cv <- trainControl(method = "cv",
number = 8,
verboseIter = TRUE)
## create my_grid
my_grid <- expand.grid(alpha = 0:1,
lambda = seq(0.0005, 0.05, length = 20))
## train model
glmModel_cv <- train(log_price ~ . - price,
data = trainData,
method = "glmnet",
tuneGrid = my_grid,
trControl = ctrl_cv)
## + Fold1: alpha=0, lambda=0.05
## - Fold1: alpha=0, lambda=0.05
## + Fold1: alpha=1, lambda=0.05
## - Fold1: alpha=1, lambda=0.05
## + Fold2: alpha=0, lambda=0.05
## - Fold2: alpha=0, lambda=0.05
## + Fold2: alpha=1, lambda=0.05
## - Fold2: alpha=1, lambda=0.05
## + Fold3: alpha=0, lambda=0.05
## - Fold3: alpha=0, lambda=0.05
## + Fold3: alpha=1, lambda=0.05
## - Fold3: alpha=1, lambda=0.05
## + Fold4: alpha=0, lambda=0.05
## - Fold4: alpha=0, lambda=0.05
## + Fold4: alpha=1, lambda=0.05
## - Fold4: alpha=1, lambda=0.05
## + Fold5: alpha=0, lambda=0.05
## - Fold5: alpha=0, lambda=0.05
## + Fold5: alpha=1, lambda=0.05
## - Fold5: alpha=1, lambda=0.05
## + Fold6: alpha=0, lambda=0.05
## - Fold6: alpha=0, lambda=0.05
## + Fold6: alpha=1, lambda=0.05
## - Fold6: alpha=1, lambda=0.05
## + Fold7: alpha=0, lambda=0.05
## - Fold7: alpha=0, lambda=0.05
## + Fold7: alpha=1, lambda=0.05
## - Fold7: alpha=1, lambda=0.05
## + Fold8: alpha=0, lambda=0.05
## - Fold8: alpha=0, lambda=0.05
## + Fold8: alpha=1, lambda=0.05
## - Fold8: alpha=1, lambda=0.05
## Aggregating results
## Selecting tuning parameters
## Fitting alpha = 1, lambda = 5e-04 on full training set
print("Regularized Regression with K-Fold Cross Validation")
## [1] "Regularized Regression with K-Fold Cross Validation"
print(glmModel_cv)
## glmnet
##
## 11696 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (8 fold)
## Summary of sample sizes: 10233, 10235, 10234, 10233, 10234, 10235, ...
## Resampling results across tuning parameters:
##
## alpha lambda RMSE Rsquared MAE
## 0 0.000500000 0.3257580 0.6167350 0.2628317
## 0 0.003105263 0.3257580 0.6167350 0.2628317
## 0 0.005710526 0.3257580 0.6167350 0.2628317
## 0 0.008315789 0.3257580 0.6167350 0.2628317
## 0 0.010921053 0.3257580 0.6167350 0.2628317
## 0 0.013526316 0.3257580 0.6167350 0.2628317
## 0 0.016131579 0.3257580 0.6167350 0.2628317
## 0 0.018736842 0.3257580 0.6167350 0.2628317
## 0 0.021342105 0.3257580 0.6167350 0.2628317
## 0 0.023947368 0.3257580 0.6167350 0.2628317
## 0 0.026552632 0.3257580 0.6167350 0.2628317
## 0 0.029157895 0.3257580 0.6167350 0.2628317
## 0 0.031763158 0.3257580 0.6167350 0.2628317
## 0 0.034368421 0.3257580 0.6167350 0.2628317
## 0 0.036973684 0.3257580 0.6167350 0.2628317
## 0 0.039578947 0.3258168 0.6166421 0.2628982
## 0 0.042184211 0.3258894 0.6165261 0.2629796
## 0 0.044789474 0.3259654 0.6164048 0.2630629
## 0 0.047394737 0.3260421 0.6162841 0.2631437
## 0 0.050000000 0.3261214 0.6161592 0.2632243
## 1 0.000500000 0.3252103 0.6176166 0.2619671
## 1 0.003105263 0.3253619 0.6173417 0.2621559
## 1 0.005710526 0.3257282 0.6166667 0.2624643
## 1 0.008315789 0.3263328 0.6155134 0.2629347
## 1 0.010921053 0.3269387 0.6144304 0.2634140
## 1 0.013526316 0.3275616 0.6133830 0.2639171
## 1 0.016131579 0.3282355 0.6122821 0.2644610
## 1 0.018736842 0.3289728 0.6110861 0.2650516
## 1 0.021342105 0.3298169 0.6096793 0.2657079
## 1 0.023947368 0.3307558 0.6080807 0.2664212
## 1 0.026552632 0.3316329 0.6066865 0.2670887
## 1 0.029157895 0.3324365 0.6055453 0.2677060
## 1 0.031763158 0.3333017 0.6042961 0.2683570
## 1 0.034368421 0.3342383 0.6029035 0.2690522
## 1 0.036973684 0.3352451 0.6013644 0.2697977
## 1 0.039578947 0.3363186 0.5996818 0.2705851
## 1 0.042184211 0.3373999 0.5980036 0.2713820
## 1 0.044789474 0.3385012 0.5962947 0.2722065
## 1 0.047394737 0.3396028 0.5946100 0.2730367
## 1 0.050000000 0.3407642 0.5927932 0.2739074
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 5e-04.
## Predict on train and test sets, then back-transform from the log scale
pTrain_glm_cv <- predict(glmModel_cv, newdata = trainData)
unlog_pTrain_glm_cv <- exp(pTrain_glm_cv)
pTest_glm_cv <- predict(glmModel_cv, newdata = testData)
unlog_pTest_glm_cv <- exp(pTest_glm_cv)
MAETrain_glm_cv <- calcu_mae(trainData$price, unlog_pTrain_glm_cv)
MAETest_glm_cv <- calcu_mae(testData$price, unlog_pTest_glm_cv)
MSETrain_glm_cv <- calcu_mse(trainData$price, unlog_pTrain_glm_cv)
MSETest_glm_cv <- calcu_mse(testData$price, unlog_pTest_glm_cv)
RMSETrain_glm_cv <- calcu_rmse(trainData$price, unlog_pTrain_glm_cv)
RMSETest_glm_cv <- calcu_rmse(testData$price, unlog_pTest_glm_cv)
RMSE_of_glmnet <- c("MAE","MSE","RMSE")
Train_glmnet <- c(MAETrain_glm_cv,MSETrain_glm_cv,RMSETrain_glm_cv)
Test_glmnet <- c(MAETest_glm_cv,MSETest_glm_cv,RMSETest_glm_cv)
RMSE_of_glmnet_df <- data.frame(RMSE_of_glmnet,Train_glmnet, Test_glmnet)
RMSE_of_glmnet_df
## RMSE_of_glmnet Train_glmnet Test_glmnet
## 1 MAE 1.373003e+05 1.429573e+05
## 2 MSE 4.476356e+10 8.788811e+10
## 3 RMSE 2.115740e+05 2.964593e+05
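To inspect which coefficients the selected lasso fit (alpha = 1) kept or shrank toward zero, one can query the final glmnet object at the chosen penalty; a minimal sketch:
# Coefficients of the final glmnet model at the selected lambda
coef(glmModel_cv$finalModel, s = glmModel_cv$bestTune$lambda)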
Next step: Decision Tree (rpart)
## create train control
set.seed(42)
ctrl_tree <- trainControl(method = "cv",
number = 8,
verboseIter = TRUE)
## train model
tree_model <- train(log_price ~ . - price,
data = trainData,
method = "rpart",
tuneGrid = expand.grid(cp = c(0.02,0.1,0.25)),
trControl = ctrl_tree)
## + Fold1: cp=0.02
## - Fold1: cp=0.02
## + Fold2: cp=0.02
## - Fold2: cp=0.02
## + Fold3: cp=0.02
## - Fold3: cp=0.02
## + Fold4: cp=0.02
## - Fold4: cp=0.02
## + Fold5: cp=0.02
## - Fold5: cp=0.02
## + Fold6: cp=0.02
## - Fold6: cp=0.02
## + Fold7: cp=0.02
## - Fold7: cp=0.02
## + Fold8: cp=0.02
## - Fold8: cp=0.02
## Aggregating results
## Selecting tuning parameters
## Fitting cp = 0.02 on full training set
print("Decision Tree Model with K-Fold Cross Validation")
## [1] "Decision Tree Model with K-Fold Cross Validation"
print(tree_model)
## CART
##
## 11696 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (8 fold)
## Summary of sample sizes: 10233, 10235, 10234, 10233, 10234, 10235, ...
## Resampling results across tuning parameters:
##
## cp RMSE Rsquared MAE
## 0.02 0.3719093 0.5000840 0.2975066
## 0.10 0.4270923 0.3410929 0.3393977
## 0.25 0.4270923 0.3410929 0.3393977
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was cp = 0.02.
Decision Tree Model Visualization
rpart.plot(tree_model$finalModel)
## Predict on train and test sets, then back-transform from the log scale
pTrain_tree_cv <- predict(tree_model, newdata = trainData)
unlog_pTrain_tree_cv <- exp(pTrain_tree_cv)
pTest_tree_cv <- predict(tree_model, newdata = testData)
unlog_pTest_tree_cv <- exp(pTest_tree_cv)
MAETrain_tree_cv <- calcu_mae(trainData$price, unlog_pTrain_tree_cv)
MAETest_tree_cv <- calcu_mae(testData$price, unlog_pTest_tree_cv)
MSETrain_tree_cv <- calcu_mse(trainData$price, unlog_pTrain_tree_cv)
MSETest_tree_cv <- calcu_mse(testData$price, unlog_pTest_tree_cv)
RMSETrain_tree_cv <- calcu_rmse(trainData$price, unlog_pTrain_tree_cv)
RMSETest_tree_cv <- calcu_rmse(testData$price, unlog_pTest_tree_cv)
RMSE_of_tree <- c("MAE","MSE","RMSE")
Train_tree <- c(MAETrain_tree_cv,MSETrain_tree_cv,RMSETrain_tree_cv)
Test_tree <- c(MAETest_tree_cv,MSETest_tree_cv,RMSETest_tree_cv)
RMSE_of_tree_df <- data.frame(RMSE_of_tree,Train_tree, Test_tree)
RMSE_of_tree_df
## RMSE_of_tree Train_tree Test_tree
## 1 MAE 1.592545e+05 1.654061e+05
## 2 MSE 6.880517e+10 1.004400e+11
## 3 RMSE 2.623074e+05 3.169227e+05
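As a side note, the underlying rpart fit also stores split-based variable importance, available whenever the tree has at least one split; a quick look (a sketch, not part of the original analysis):
# Split-based variable importance from the fitted tree
tree_model$finalModel$variable.importance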
Last step: Random Forest and Neural Network
set.seed(42)
ctrl_rf_nn <- trainControl(method = "cv",
number = 5,
verboseIter = TRUE)
rf_mod <- train(log_price ~ . - price,
data = trainData,
method = "rf",
tuneLength = 5,
trControl = ctrl_rf_nn)
## + Fold1: mtry= 2
## - Fold1: mtry= 2
## + Fold1: mtry= 5
## - Fold1: mtry= 5
## + Fold1: mtry= 8
## - Fold1: mtry= 8
## + Fold1: mtry=11
## - Fold1: mtry=11
## + Fold1: mtry=15
## - Fold1: mtry=15
## + Fold2: mtry= 2
## - Fold2: mtry= 2
## + Fold2: mtry= 5
## - Fold2: mtry= 5
## + Fold2: mtry= 8
## - Fold2: mtry= 8
## + Fold2: mtry=11
## - Fold2: mtry=11
## + Fold2: mtry=15
## - Fold2: mtry=15
## + Fold3: mtry= 2
## - Fold3: mtry= 2
## + Fold3: mtry= 5
## - Fold3: mtry= 5
## + Fold3: mtry= 8
## - Fold3: mtry= 8
## + Fold3: mtry=11
## - Fold3: mtry=11
## + Fold3: mtry=15
## - Fold3: mtry=15
## + Fold4: mtry= 2
## - Fold4: mtry= 2
## + Fold4: mtry= 5
## - Fold4: mtry= 5
## + Fold4: mtry= 8
## - Fold4: mtry= 8
## + Fold4: mtry=11
## - Fold4: mtry=11
## + Fold4: mtry=15
## - Fold4: mtry=15
## + Fold5: mtry= 2
## - Fold5: mtry= 2
## + Fold5: mtry= 5
## - Fold5: mtry= 5
## + Fold5: mtry= 8
## - Fold5: mtry= 8
## + Fold5: mtry=11
## - Fold5: mtry=11
## + Fold5: mtry=15
## - Fold5: mtry=15
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 5 on full training set
nn_mod <- train(log_price ~ . - price,
data = trainData,
method = "nnet",
tuneLength = 5,
trControl = ctrl_rf_nn)
## + Fold1: size=1, decay=0e+00
## - Fold1: size=1, decay=0e+00
## ... (verbose nnet fitting log for the remaining size/decay combinations and folds omitted)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
## Aggregating results
## Selecting tuning parameters
## Fitting size = 1, decay = 0 on full training set
## # weights: 18
## initial value 1832341.846574
## final value 1699711.313709
## converged
print("Random Forest Model with K-Fold Cross Validation")
## [1] "Random Forest Model with K-Fold Cross Validation"
print(rf_mod)
## Random Forest
##
## 11696 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 9356, 9358, 9357, 9357, 9356
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 2 0.2935395 0.6926958 0.2332635
## 5 0.2902269 0.6960681 0.2286713
## 8 0.2909088 0.6941460 0.2285693
## 11 0.2915702 0.6925247 0.2286666
## 15 0.2926614 0.6901003 0.2294052
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 5.
print("Neural Network Model with K-Fold Cross Validation")
## [1] "Neural Network Model with K-Fold Cross Validation"
print(nn_mod)
## Neural Network
##
## 11696 samples
## 16 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 9357, 9357, 9357, 9357, 9356
## Resampling results across tuning parameters:
##
## size decay RMSE Rsquared MAE
## 1 0e+00 12.05505 NaN 12.04357
## 1 1e-04 12.05505 1.493826e-04 12.04357
## 1 1e-03 12.05513 7.360731e-04 12.04366
## 1 1e-02 12.05505 5.420676e-04 12.04357
## 1 1e-01 12.05505 NaN 12.04358
## 3 0e+00 12.05505 NaN 12.04357
## 3 1e-04 12.05505 7.433719e-05 12.04358
## 3 1e-03 12.05505 1.105221e-03 12.04357
## 3 1e-02 12.05505 1.296808e-03 12.04357
## 3 1e-01 12.05505 2.347539e-03 12.04357
## 5 0e+00 12.05505 NaN 12.04357
## 5 1e-04 12.05505 3.843334e-04 12.04357
## 5 1e-03 12.05505 NaN 12.04357
## 5 1e-02 12.05505 7.755151e-04 12.04357
## 5 1e-01 12.05505 2.475860e-03 12.04357
## 7 0e+00 12.05505 NaN 12.04357
## 7 1e-04 12.05505 8.809214e-05 12.04357
## 7 1e-03 12.05505 9.256058e-04 12.04357
## 7 1e-02 12.05505 3.202389e-05 12.04357
## 7 1e-01 12.05505 1.696830e-03 12.04357
## 9 0e+00 12.05505 NaN 12.04357
## 9 1e-04 12.05505 3.132742e-04 12.04357
## 9 1e-03 12.05505 5.880642e-04 12.04357
## 9 1e-02 12.05505 6.748310e-04 12.04357
## 9 1e-01 12.05505 4.487665e-03 12.04357
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 1 and decay = 0.
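The near-constant RMSE (about 12 on the log scale) across every size/decay setting indicates the network never actually learned: nnet's default logistic output unit is bounded to (0, 1), which cannot reach log-price values around 13. A minimal sketch of a more regression-appropriate setup, assuming a linear output unit and scaled inputs (not run here):
# Regression-appropriate nnet: linear output, scaled predictors, quiet fitting
set.seed(42)
nn_mod_lin <- train(log_price ~ . - price,
                    data = trainData,
                    method = "nnet",
                    linout = TRUE,                      # linear output unit
                    trace = FALSE,                      # suppress fitting log
                    preProcess = c("center", "scale"),  # scale inputs
                    tuneLength = 5,
                    trControl = ctrl_rf_nn)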
## Predict on train and test sets, then back-transform from the log scale
pTrain_rf_cv <- predict(rf_mod, newdata = trainData)
unlog_pTrain_rf_cv <- exp(pTrain_rf_cv)
pTest_rf_cv <- predict(rf_mod, newdata = testData)
unlog_pTest_rf_cv <- exp(pTest_rf_cv)
pTrain_nn_cv <- predict(nn_mod, newdata = trainData)
unlog_pTrain_nn_cv <- exp(pTrain_nn_cv)
pTest_nn_cv <- predict(nn_mod, newdata = testData)
unlog_pTest_nn_cv <- exp(pTest_nn_cv)
MAETrain_rf_cv <- calcu_mae(trainData$price, unlog_pTrain_rf_cv)
MAETest_rf_cv <- calcu_mae(testData$price, unlog_pTest_rf_cv)
MSETrain_rf_cv <- calcu_mse(trainData$price, unlog_pTrain_rf_cv)
MSETest_rf_cv <- calcu_mse(testData$price, unlog_pTest_rf_cv)
RMSETrain_rf_cv <- calcu_rmse(trainData$price, unlog_pTrain_rf_cv)
RMSETest_rf_cv <- calcu_rmse(testData$price, unlog_pTest_rf_cv)
MAETrain_nn_cv <- calcu_mae(trainData$price, unlog_pTrain_nn_cv)
MAETest_nn_cv <- calcu_mae(testData$price, unlog_pTest_nn_cv)
MSETrain_nn_cv <- calcu_mse(trainData$price, unlog_pTrain_nn_cv)
MSETest_nn_cv <- calcu_mse(testData$price, unlog_pTest_nn_cv)
RMSETrain_nn_cv <- calcu_rmse(trainData$price, unlog_pTrain_nn_cv)
RMSETest_nn_cv <- calcu_rmse(testData$price, unlog_pTest_nn_cv)
RMSE_of_rf <- c("MAE","MSE","RMSE")
Train_rf <- c(MAETrain_rf_cv,MSETrain_rf_cv,RMSETrain_rf_cv)
Test_rf <- c(MAETest_rf_cv,MSETest_rf_cv,RMSETest_rf_cv)
RMSE_of_nn <- c("MAE","MSE","RMSE")
Train_nn <- c(MAETrain_nn_cv,MSETrain_nn_cv,RMSETrain_nn_cv)
Test_nn <- c(MAETest_nn_cv,MSETest_nn_cv,RMSETest_nn_cv)
RMSE_of_rf_df <- data.frame(RMSE_of_rf,Train_rf, Test_rf)
RMSE_of_nn_df <- data.frame(RMSE_of_nn,Train_nn, Test_nn)
print(RMSE_of_rf_df)
## RMSE_of_rf Train_rf Test_rf
## 1 MAE 5.407660e+04 1.240817e+05
## 2 MSE 9.016143e+09 5.306791e+10
## 3 RMSE 9.495338e+04 2.303647e+05
print(RMSE_of_nn_df)
## RMSE_of_nn Train_nn Test_nn
## 1 MAE 5.372778e+05 5.455362e+05
## 2 MSE 4.175130e+11 4.575273e+11
## 3 RMSE 6.461524e+05 6.764076e+05
Model Comparison with RMSE
comparison <- c("Linear Regression", "Regularized Regression", "Decision Tree", "Random Forest", "Neural Network")
train_rmse <- c(RMSETrain, RMSETrain_glm_cv, RMSETrain_tree_cv, RMSETrain_rf_cv, RMSETrain_nn_cv)
test_rmse <- c(RMSETest, RMSETest_glm_cv, RMSETest_tree_cv, RMSETest_rf_cv, RMSETest_nn_cv)
diff_lm <- abs(RMSETrain-RMSETest)
diff_glmnet <- abs(RMSETrain_glm_cv-RMSETest_glm_cv)
diff_tree <- abs(RMSETrain_tree_cv-RMSETest_tree_cv)
diff_rf <- abs(RMSETrain_rf_cv-RMSETest_rf_cv)
diff_nn <- abs(RMSETrain_nn_cv-RMSETest_nn_cv)
Difference <- c(diff_lm, diff_glmnet, diff_tree, diff_rf, diff_nn)
com_model <- data.frame(comparison, train_rmse, test_rmse, Difference)
print(com_model)
## comparison train_rmse test_rmse Difference
## 1 Linear Regression 212062.51 298105.0 86042.50
## 2 Regularized Regression 211574.01 296459.3 84885.29
## 3 Decision Tree 262307.40 316922.7 54615.28
## 4 Random Forest 94953.38 230364.7 135411.36
## 5 Neural Network 646152.42 676407.6 30255.21
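For a visual comparison, the table can be reshaped to long format and plotted (a sketch using the already-loaded tidyverse):
com_model %>%
  pivot_longer(c(train_rmse, test_rmse), names_to = "set", values_to = "rmse") %>%
  ggplot(aes(x = comparison, y = rmse, fill = set)) +
  geom_col(position = "dodge") +
  labs(title = "Train vs Test RMSE by Model", x = NULL, y = "RMSE") +
  theme_minimal()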
Conclusion
Although the table above shows that the Neural Network has the smallest difference between train RMSE and test RMSE, this is not evidence of good generalization: its absolute errors are by far the largest (test RMSE of roughly 676,000 versus 230,000 for the Random Forest), so the small gap merely reflects uniformly poor predictions. The Random Forest model achieves the lowest test RMSE and is therefore preferred for predicting house prices in India, with the caveat that its train-test gap suggests some overfitting.
We will export this best-performing model for future applications.
## save model as .RDS
saveRDS(rf_mod, "/Users/j.nrup/Documents/Data Project/rf_model.RDS")
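The exported model can later be reloaded and used for prediction without retraining; a brief usage sketch:
# Reload the saved model and predict (same path as above)
# model <- readRDS("/Users/j.nrup/Documents/Data Project/rf_model.RDS")
# exp(predict(model, newdata = testData))  # back-transform log predictions to prices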
This project applies machine learning to house price prediction using linear regression, regularized regression, decision trees, random forests, and neural networks. It prioritizes data quality through cleansing, preparation, and pre-processing, including a log transformation of the target, and uses re-sampling and hyper-parameter tuning in pursuit of optimal model performance.
Thank you for your interest. I hope this project will be beneficial to those who are interested. If there are any errors, I apologize in advance. - Narupong Jarasbunpaisarn