CarDekho is a car portal that helps its users buy or sell cars by providing information on prices, specifications, insurance, and other aspects.
The dataset used here contains information on used cars and their specifications. This information is used to build a model that predicts a car's selling price.
In this case study, the models used are Random Forest and Gradient Boosting.
library(stringr)        # string cleaning (removing units from text columns)
library(dplyr)          # data manipulation
library(caret)          # model training, tuning, and cross-validation
library(ranger)         # fast Random Forest implementation
library(gbm)            # Gradient Boosting
library(lime)           # local, model-agnostic explanations
library(imputeMissings) # median/mode imputation of missing values
library(MLmetrics)      # evaluation metrics (RMSE, MAE, R-squared)
library(cowplot)        # plot_grid() for arranging ggplots
library(skimr)          # compact data summaries
car <- read.csv("Car details v3.csv", stringsAsFactors = TRUE)
head(car)
## name year selling_price km_driven fuel seller_type
## 1 Maruti Swift Dzire VDI 2014 450000 145500 Diesel Individual
## 2 Skoda Rapid 1.5 TDI Ambition 2014 370000 120000 Diesel Individual
## 3 Honda City 2017-2020 EXi 2006 158000 140000 Petrol Individual
## 4 Hyundai i20 Sportz Diesel 2010 225000 127000 Diesel Individual
## 5 Maruti Swift VXI BSIII 2007 130000 120000 Petrol Individual
## 6 Hyundai Xcent 1.2 VTVT E Plus 2017 440000 45000 Petrol Individual
## transmission owner mileage engine max_power
## 1 Manual First Owner 23.4 kmpl 1248 CC 74 bhp
## 2 Manual Second Owner 21.14 kmpl 1498 CC 103.52 bhp
## 3 Manual Third Owner 17.7 kmpl 1497 CC 78 bhp
## 4 Manual First Owner 23.0 kmpl 1396 CC 90 bhp
## 5 Manual First Owner 16.1 kmpl 1298 CC 88.2 bhp
## 6 Manual First Owner 20.14 kmpl 1197 CC 81.86 bhp
## torque seats
## 1 190Nm@ 2000rpm 5
## 2 250Nm@ 1500-2500rpm 5
## 3 12.7@ 2,700(kgm@ rpm) 5
## 4 22.4 kgm at 1750-2750rpm 5
## 5 11.5@ 4,500(kgm@ rpm) 5
## 6 113.75nm@ 4000rpm 5
skim(car)
Name | car |
Number of rows | 8128 |
Number of columns | 13 |
_______________________ | |
Column type frequency: | |
factor | 9 |
numeric | 4 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
name | 0 | 1 | FALSE | 2058 | Mar: 129, Mar: 82, Mar: 71, BMW: 62 |
fuel | 0 | 1 | FALSE | 4 | Die: 4402, Pet: 3631, CNG: 57, LPG: 38 |
seller_type | 0 | 1 | FALSE | 3 | Ind: 6766, Dea: 1126, Tru: 236 |
transmission | 0 | 1 | FALSE | 2 | Man: 7078, Aut: 1050 |
owner | 0 | 1 | FALSE | 5 | Fir: 5289, Sec: 2105, Thi: 555, Fou: 174 |
mileage | 0 | 1 | FALSE | 394 | 18.: 225, emp: 221, 19.: 173, 18.: 164 |
engine | 0 | 1 | FALSE | 122 | 124: 1017, 119: 832, 998: 453, 796: 444 |
max_power | 0 | 1 | FALSE | 323 | 74 : 377, 81.: 220, emp: 215, 88.: 204 |
torque | 0 | 1 | FALSE | 442 | 190: 530, 200: 445, 90N: 405, 113: 223 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
year | 0 | 1.00 | 2013.80 | 4.04 | 1983 | 2011 | 2015 | 2017 | 2020 | ▁▁▁▃▇ |
selling_price | 0 | 1.00 | 638271.81 | 806253.40 | 29999 | 254999 | 450000 | 675000 | 10000000 | ▇▁▁▁▁ |
km_driven | 0 | 1.00 | 69819.51 | 56550.55 | 1 | 35000 | 60000 | 98000 | 2360457 | ▇▁▁▁▁ |
seats | 221 | 0.97 | 5.42 | 0.96 | 2 | 5 | 5 | 5 | 14 | ▁▇▂▁▁ |
Below is a description of the variables in the dataset.
Variable | Description |
---|---|
name | Car name |
year | Year the car was bought |
selling_price | Selling price of the car (Rp) |
km_driven | Distance the car has been driven (km) |
fuel | Fuel type (CNG/diesel/petrol/LPG) |
seller_type | Seller type (individual/dealer/trustmark dealer) |
transmission | Transmission (automatic/manual) |
owner | Previous ownership (first/second/third/fourth and above owner/test drive car) |
mileage | Fuel economy (kmpl for petrol/diesel, km/kg for CNG/LPG) |
engine | Engine capacity (CC) |
max_power | Maximum engine power (bhp) |
torque | Torque (kgm or Nm) |
seats | Seating capacity |
Based on the output above, several issues in the data must be handled before modeling:

- The unit "CC" has to be stripped from `engine` and the variable converted to numeric; the same applies to `max_power`, `torque`, and `mileage`.
- A new variable `age` is derived: the car's age from its production year (`year`) to its sale.
- `owner` is converted into an ordinal (ordered) factor.
- `name` and `year` are dropped.
- Rows with fuel CNG or LPG are removed (95 of 8128 rows; their `mileage` is measured in km/kg rather than kmpl, so it is not comparable).
car2 <- car %>% mutate(engine = as.numeric(str_remove(engine, " CC")),
max_power = as.numeric(str_remove(max_power, " bhp")),
torque = as.numeric(str_extract(torque, "[0-9.]+")),
mileage = as.numeric(str_extract(mileage, "[0-9.]+")),
owner = factor(owner, ordered = TRUE,
levels = c("Test Drive Car",
"First Owner",
"Second Owner",
"Third Owner",
"Fourth & Above Owner")),
age = 2022 - year) %>%
select(-name, -year) %>%
filter(!fuel %in% c('CNG','LPG'))
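Note that `str_extract(torque, "[0-9.]+")` keeps only the first number in each string, whether the unit is Nm or kgm, so kgm readings (roughly 10-25) land on a different scale from Nm readings (roughly 100-400). A minimal sketch of a stricter cleaning step, not applied in this analysis, that converts kgm to Nm (1 kgm ≈ 9.80665 Nm):
# Hypothetical helper: normalize all torque readings to Nm
torque_to_nm <- function(torque_chr) {
  val <- as.numeric(str_extract(torque_chr, "[0-9.]+"))
  # strings mentioning "kgm" are in kilogram-metres; convert them to Nm
  ifelse(str_detect(tolower(torque_chr), "kgm"), val * 9.80665, val)
}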
skim(car2)
Name | car2 |
Number of rows | 8033 |
Number of columns | 12 |
_______________________ | |
Column type frequency: | |
factor | 4 |
numeric | 8 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
fuel | 0 | 1 | FALSE | 2 | Die: 4402, Pet: 3631, CNG: 0, LPG: 0 |
seller_type | 0 | 1 | FALSE | 3 | Ind: 6673, Dea: 1124, Tru: 236 |
transmission | 0 | 1 | FALSE | 2 | Man: 6983, Aut: 1050 |
owner | 0 | 1 | TRUE | 5 | Fir: 5238, Sec: 2073, Thi: 547, Fou: 170 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
selling_price | 0 | 1.00 | 642736.12 | 809863.53 | 29999.0 | 260000.00 | 450000.0 | 680000.00 | 10000000 | ▇▁▁▁▁ |
km_driven | 0 | 1.00 | 69738.82 | 56643.61 | 1000.0 | 35000.00 | 60000.0 | 98000.00 | 2360457 | ▇▁▁▁▁ |
mileage | 214 | 0.97 | 19.39 | 4.00 | 0.0 | 16.78 | 19.3 | 22.32 | 42 | ▁▃▇▁▁ |
engine | 214 | 0.97 | 1463.09 | 504.66 | 624.0 | 1197.00 | 1248.0 | 1582.00 | 3604 | ▇▇▂▂▁ |
max_power | 208 | 0.97 | 91.86 | 35.85 | 0.0 | 69.00 | 82.4 | 102.00 | 400 | ▇▇▁▁▁ |
torque | 214 | 0.97 | 169.32 | 97.32 | 4.8 | 104.00 | 160.0 | 204.00 | 789 | ▇▆▂▁▁ |
seats | 214 | 0.97 | 5.42 | 0.96 | 2.0 | 5.00 | 5.0 | 5.00 | 14 | ▁▇▂▁▁ |
age | 0 | 1.00 | 8.18 | 4.03 | 2.0 | 5.00 | 7.0 | 11.00 | 39 | ▇▃▁▁▁ |
The data is split into two sets: training data and testing data.
set.seed(123)
train_idx <- createDataPartition(car2$selling_price, p = 0.7, list=FALSE)
trainData <- car2[train_idx,]
testData <- car2[-train_idx,]
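`createDataPartition()` samples within quantile groups of the outcome, so both subsets should cover the full price range. A quick sanity check on the split (illustration only):
# About 70% of rows go to training; the price distributions should look similar
nrow(trainData) / nrow(car2)
summary(trainData$selling_price)
summary(testData$selling_price)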
Exploratory data analysis is carried out on the training data.
selling_price
ggplot(trainData, aes_string(x = "selling_price")) +
geom_histogram(color = "black") +
ggtitle("Distribution of selling_price")
plot_numeric_features <- function(x){
ggplot(trainData, aes_string(x, "selling_price")) +
geom_point() +
geom_smooth(method = "loess", se = F) +
scale_x_continuous(labels = scales::comma) +
ylim(0, NA)
}
plot_grid(
plot_numeric_features("km_driven"),
plot_numeric_features("mileage"),
plot_numeric_features("engine"),
plot_numeric_features("max_power"),
plot_numeric_features("torque"),
plot_numeric_features("seats"),
plot_numeric_features("age"))
From the plots above, outliers are visible in `km_driven`, `torque`, `mileage`, and `max_power`.
count_categoric_features <- function(x){
ggplot(trainData, aes_string(x = x)) +
geom_bar() +
coord_flip()
}
plot_grid(
count_categoric_features("fuel"),
count_categoric_features("seller_type"),
count_categoric_features("transmission"),
count_categoric_features("owner"))
plot_categoric_features <- function(x){
ggplot(trainData, aes_string(x, "selling_price")) +
geom_boxplot() +
coord_flip()
}
plot_grid(
plot_categoric_features("fuel"),
plot_categoric_features("seller_type"),
plot_categoric_features("transmission"),
plot_categoric_features("owner"))
The exploration above shows outliers in `km_driven`, `torque`, `mileage`, and `max_power`. One way to handle such outliers is capping: values beyond a chosen threshold are replaced by the threshold itself.
trainData$km_driven[trainData$km_driven > 500000] <- 500000
trainData$torque[trainData$torque > 640] <- 640
trainData$mileage[trainData$mileage > 30] <- 30
trainData$mileage[trainData$mileage < 7] <- 7
trainData$max_power[trainData$max_power > 300] <- 300
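The cutoffs above were read off the plots by eye. As a sketch of a more data-driven alternative, capping could instead use empirical quantiles computed from the training data:
# Hypothetical quantile-based capping (not applied here)
cap_quantiles <- function(x, lower = 0.01, upper = 0.99) {
  bounds <- quantile(x, c(lower, upper), na.rm = TRUE)
  pmin(pmax(x, bounds[1]), bounds[2])
}
# e.g. trainData$km_driven <- cap_quantiles(trainData$km_driven)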
After capping:
plot_grid(
plot_numeric_features("km_driven"),
plot_numeric_features("mileage"),
plot_numeric_features("engine"),
plot_numeric_features("max_power"),
plot_numeric_features("torque"),
plot_numeric_features("seats"),
plot_numeric_features("age"))
colSums(is.na(trainData))
## selling_price km_driven fuel seller_type transmission
## 0 0 0 0 0
## owner mileage engine max_power torque
## 0 153 153 151 153
## seats age
## 153 0
Missing values are handled by imputation with the median (the `imputeMissings` package uses the median for numeric columns and the mode for factors).
imputer <- compute(trainData)
trainData <- impute(trainData, object=imputer)
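The `imputer` object stores one imputation value per column (median or mode), computed from the training data only; this is what lets the same values be reused on the testing data below. To inspect it:
str(imputer)  # stored medians/modes per column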
Outliers and missing values in the testing data are handled the same way as in the training data, reusing the caps and the imputation values computed from the training data:
testData$km_driven[testData$km_driven > 500000] <- 500000
testData$torque[testData$torque > 640] <- 640
testData$mileage[testData$mileage > 30] <- 30
testData$mileage[testData$mileage < 7] <- 7
testData$max_power[testData$max_power > 300] <- 300
testData <- impute(testData, object=imputer)
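A quick check that no missing values remain after imputation (illustration only):
colSums(is.na(testData))  # every column should now be 0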
The prediction models are built with Random Forest and Gradient Boosting. Tuning parameters are selected by cross-validation on the training data.
K-fold cross-validation is a validation technique for choosing the best tuning parameters while also estimating model performance. This case study uses 5-fold cross-validation: the training data is randomly partitioned into five subsets, and each subset in turn is held out for validation while the other four are used to fit the model.
fitControl <- trainControl(
method = "cv",
number = 5,
returnResamp = "all")
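`trainControl` assigns the folds internally; equivalent folds could be built by hand with `createFolds()`, e.g.:
# Five roughly equal folds, stratified on the outcome (illustration only)
set.seed(123)
folds <- createFolds(trainData$selling_price, k = 5)
sapply(folds, length)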
tuneLength
The `tuneLength` option of the `caret::train` function asks caret to generate a given number of candidate tuning parameters (or parameter combinations) deemed sensible for the chosen method and the supplied training data.
rf <- train(selling_price ~ .,
data = trainData,
method = 'ranger',
tuneLength = 10,
importance = "impurity",
trControl = fitControl,
verbose = FALSE)
rf
## Random Forest
##
## 5625 samples
## 11 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 4499, 4500, 4501, 4499, 4501
## Resampling results across tuning parameters:
##
## mtry splitrule RMSE Rsquared MAE
## 2 variance 213772.6 0.9356596 106964.34
## 2 extratrees 301402.5 0.8936767 163106.11
## 3 variance 177517.4 0.9496120 81322.99
## 3 extratrees 213997.7 0.9327598 108454.70
## 4 variance 168063.7 0.9535745 74590.55
## 4 extratrees 187898.2 0.9434862 87635.78
## 6 variance 159362.0 0.9584761 71917.16
## 6 extratrees 172006.7 0.9511353 77144.31
## 7 variance 159791.6 0.9579907 71716.43
## 7 extratrees 169827.0 0.9524805 76259.05
## 9 variance 158158.5 0.9591719 71760.02
## 9 extratrees 166360.1 0.9544151 75135.83
## 10 variance 156764.9 0.9599440 71664.99
## 10 extratrees 164567.0 0.9554117 74670.42
## 12 variance 155688.6 0.9606910 71839.15
## 12 extratrees 161160.5 0.9574012 74235.90
## 13 variance 155365.6 0.9609113 71949.26
## 13 extratrees 160527.3 0.9577282 73899.11
## 15 variance 156109.4 0.9607269 72605.46
## 15 extratrees 158631.3 0.9586620 73612.57
##
## Tuning parameter 'min.node.size' was held constant at a value of 5
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were mtry = 13, splitrule = variance
## and min.node.size = 5.
The cross-validation results are shown in the following plot:
plot(rf, main = "5-Fold Cross Validation Random Forest: tuneLength")
rf_best <- rf$bestTune
rf_best
## mtry splitrule min.node.size
## 17 13 variance 5
Based on the output above, the best tuning parameters are `mtry = 13`, `splitrule = variance`, and `min.node.size = 5`, giving CV RMSE = 155365.6, R-squared = 0.9609113, and MAE = 71949.26.
The model is then re-fit on the full training data using the best tuning parameters obtained above. With no `trControl` supplied, `train()` falls back to its default of 25 bootstrap resamples, which is what the output below reports:
rf <- train(selling_price ~ .,
data = trainData,
method = 'ranger',
tuneGrid = rf_best,
importance = "impurity",
verbose = FALSE)
rf
## Random Forest
##
## 5625 samples
## 11 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 5625, 5625, 5625, 5625, 5625, 5625, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 170626.1 0.9546539 76479.84
##
## Tuning parameter 'mtry' was held constant at a value of 13
## Tuning
## parameter 'splitrule' was held constant at a value of variance
##
## Tuning parameter 'min.node.size' was held constant at a value of 5
rf_result <- rf$results
rf_result
## mtry splitrule min.node.size RMSE Rsquared MAE RMSESD RsquaredSD
## 1 13 variance 5 170626.1 0.9546539 76479.84 26090.26 0.01244749
## MAESD
## 1 2988.774
The resulting model has bootstrap RMSE = 170626.1, R-squared = 0.9546539, and MAE = 76479.84.
To assess how well the model predicts new data, it is evaluated on the testing data:
eval_test_data <- function(model){
pred <- predict(model, newdata = testData)
mae <- MAE(pred, testData$selling_price)       # MLmetrics expects (y_pred, y_true)
rmse <- RMSE(pred, testData$selling_price)
R2 <- R2_Score(pred, testData$selling_price)
return(c(RMSE = rmse, R_Squared = R2, MAE = mae))
}
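For reference, the three metrics reduce to simple formulas. A quick sketch verifying the MLmetrics results against base R, assuming the objects above:
y    <- testData$selling_price
yhat <- predict(rf, newdata = testData)
sqrt(mean((y - yhat)^2))                      # RMSE
mean(abs(y - yhat))                           # MAE
1 - sum((y - yhat)^2) / sum((y - mean(y))^2)  # R-squared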
rf_eval <- eval_test_data(rf)
rf_eval
## RMSE R_Squared MAE
## 133429.7287178 0.9736961 68359.0643768
On the testing data the model achieves RMSE = 133429.73, R-squared = 0.9736961, and MAE = 68359.06.
plot(varImp(rf),
main = "Random Forest Variable Importance" )
Based on the output above, the three most important variables are `max_power`, `age`, and `torque`.
tuneGrid
The `tuneGrid` option of `caret::train` gives the analyst full control over which candidate tuning parameters (or parameter combinations) are evaluated.
tg <- expand.grid(
mtry = seq(2, 14, 2),
splitrule = c("variance","extratrees"),
min.node.size = c(5, 10, 20, 30))
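The grid crosses every value of every parameter, so the number of candidates grows multiplicatively:
nrow(tg)  # 7 mtry values x 2 split rules x 4 node sizes = 56 combinations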
rf_tg <- train(selling_price ~ .,
data = trainData,
method = 'ranger',
tuneGrid = tg,
num.trees = 500,  # ranger's argument name (ntree is randomForest's)
max.depth = 100,  # likewise max.depth, passed through to ranger
importance = "impurity",
trControl = fitControl,
verbose = FALSE)
rf_tg
## Random Forest
##
## 5625 samples
## 11 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 4499, 4500, 4501, 4499, 4501
## Resampling results across tuning parameters:
##
## mtry splitrule min.node.size RMSE Rsquared MAE
## 2 variance 5 213719.5 0.9356199 107232.78
## 2 variance 10 215960.2 0.9344849 108196.91
## 2 variance 20 223080.8 0.9295734 110810.77
## 2 variance 30 229674.2 0.9269485 113497.30
## 2 extratrees 5 299006.3 0.8946027 161926.59
## 2 extratrees 10 305095.4 0.8895127 163633.00
## 2 extratrees 20 310988.7 0.8857602 167159.56
## 2 extratrees 30 315782.3 0.8824282 168331.49
## 4 variance 5 166470.6 0.9545531 74390.79
## 4 variance 10 173090.9 0.9511645 76855.73
## 4 variance 20 178524.5 0.9486429 80093.61
## 4 variance 30 187528.6 0.9438466 84089.21
## 4 extratrees 5 186237.3 0.9447330 87496.62
## 4 extratrees 10 195552.1 0.9393176 91107.88
## 4 extratrees 20 207597.5 0.9328367 98206.24
## 4 extratrees 30 219758.8 0.9260411 105351.92
## 6 variance 5 161528.4 0.9570422 72150.55
## 6 variance 10 164558.6 0.9556767 73499.77
## 6 variance 20 171468.5 0.9523911 76800.45
## 6 variance 30 178349.4 0.9488870 80454.62
## 6 extratrees 5 170992.8 0.9521615 77301.40
## 6 extratrees 10 180934.9 0.9466800 81128.21
## 6 extratrees 20 193408.3 0.9402307 87610.05
## 6 extratrees 30 202218.7 0.9357428 93880.73
## 8 variance 5 157412.5 0.9594540 71467.72
## 8 variance 10 160725.8 0.9578757 73043.96
## 8 variance 20 167850.4 0.9542800 76003.49
## 8 variance 30 175147.4 0.9505631 79480.88
## 8 extratrees 5 167060.5 0.9538281 75441.74
## 8 extratrees 10 173436.1 0.9510390 78678.07
## 8 extratrees 20 186242.0 0.9441982 84856.69
## 8 extratrees 30 194723.3 0.9399071 90374.87
## 10 variance 5 156068.6 0.9604042 71599.04
## 10 variance 10 159993.7 0.9583167 73025.63
## 10 variance 20 164791.5 0.9560985 75740.10
## 10 variance 30 173805.7 0.9512770 79360.41
## 10 extratrees 5 164677.5 0.9553732 74815.43
## 10 extratrees 10 169244.7 0.9534254 77270.59
## 10 extratrees 20 180758.9 0.9473169 82783.45
## 10 extratrees 30 190599.5 0.9420836 88229.90
## 12 variance 5 156221.5 0.9605196 71954.99
## 12 variance 10 158122.5 0.9594349 72864.77
## 12 variance 20 163490.0 0.9569222 75584.63
## 12 variance 30 170663.2 0.9531860 79012.69
## 12 extratrees 5 162710.6 0.9565309 74273.74
## 12 extratrees 10 168136.8 0.9538781 76999.62
## 12 extratrees 20 177280.2 0.9493637 81662.42
## 12 extratrees 30 186358.9 0.9446752 86771.80
## 14 variance 5 155720.8 0.9608428 72146.51
## 14 variance 10 157891.0 0.9596549 73187.40
## 14 variance 20 164925.0 0.9560060 76271.57
## 14 variance 30 170144.9 0.9534414 79155.60
## 14 extratrees 5 159110.0 0.9585606 73729.74
## 14 extratrees 10 165865.7 0.9551071 76170.85
## 14 extratrees 20 174566.3 0.9509678 81069.76
## 14 extratrees 30 182733.4 0.9465300 85494.98
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were mtry = 14, splitrule = variance
## and min.node.size = 5.
plot(rf_tg, main = "5-Fold Cross Validation Random Forest: tuneGrid")
rf_tg_best <- rf_tg$bestTune
rf_tg_best
## mtry splitrule min.node.size
## 49 14 variance 5
Based on the output above, the best tuning parameters are `mtry = 14`, `splitrule = variance`, and `min.node.size = 5`, giving CV RMSE = 155720.8, R-squared = 0.9608428, and MAE = 72146.51.
The model is then re-fit on the full training data using these best tuning parameters:
rf_tg <- train(selling_price ~ .,
data = trainData,
method = 'ranger',
tuneGrid = rf_tg_best,
num.trees = 500,  # ranger's argument names (see note above)
max.depth = 100,
importance = "impurity",
verbose = FALSE)
rf_tg
## Random Forest
##
## 5625 samples
## 11 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 5625, 5625, 5625, 5625, 5625, 5625, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 170450.7 0.9553525 76110
##
## Tuning parameter 'mtry' was held constant at a value of 14
## Tuning
## parameter 'splitrule' was held constant at a value of variance
##
## Tuning parameter 'min.node.size' was held constant at a value of 5
rf_tg_result <- rf_tg$results
rf_tg_result
## mtry splitrule min.node.size RMSE Rsquared MAE RMSESD RsquaredSD
## 1 14 variance 5 170450.7 0.9553525 76110 22332.26 0.009553552
## MAESD
## 1 2886.202
The resulting model has bootstrap RMSE = 170450.7, R-squared = 0.9553525, and MAE = 76110.
To assess performance on new data, the model is evaluated on the testing data:
rf_tg_eval <- eval_test_data(rf_tg)
rf_tg_eval
## RMSE R_Squared MAE
## 132655.3068076 0.9740354 68812.6004312
On the testing data the model achieves RMSE = 132655.31, R-squared = 0.9740354, and MAE = 68812.60.
plot(varImp(rf_tg),
main = "Random Forest Variable Importance")
Based on the output above, the three most important variables are `max_power`, `age`, and `torque`.
As with Random Forest, tuning-parameter selection for Gradient Boosting uses 5-fold cross-validation.
tuneLength
boost <- train(selling_price ~.,
data=trainData,
method="gbm",
tuneLength = 10,
trControl=fitControl,
verbose = FALSE)
boost
## Stochastic Gradient Boosting
##
## 5625 samples
## 11 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 4501, 4500, 4501, 4499, 4499
## Resampling results across tuning parameters:
##
## interaction.depth n.trees RMSE Rsquared MAE
## 1 50 353932.2 0.8088870 180987.01
## 1 100 329482.2 0.8284958 164065.14
## 1 150 316030.7 0.8399940 160046.83
## 1 200 308852.5 0.8463823 157628.64
## 1 250 304041.4 0.8500792 156027.24
## 1 300 299697.0 0.8541960 154775.10
## 1 350 295475.4 0.8581671 153001.27
## 1 400 292552.1 0.8605509 151942.19
## 1 450 289692.1 0.8631559 150771.79
## 1 500 288342.8 0.8643507 149881.47
## 2 50 244515.2 0.9051000 134075.48
## 2 100 220029.2 0.9209970 118403.50
## 2 150 207982.9 0.9290295 112980.86
## 2 200 201158.7 0.9335218 109185.90
## 2 250 195665.9 0.9369375 106344.55
## 2 300 191886.1 0.9392940 104155.89
## 2 350 187966.0 0.9414978 101933.82
## 2 400 184983.1 0.9431929 100309.54
## 2 450 182133.8 0.9448794 98946.93
## 2 500 179972.5 0.9460673 97575.94
## 3 50 221857.7 0.9211802 121911.09
## 3 100 196155.9 0.9373879 106161.46
## 3 150 185123.9 0.9438828 100757.34
## 3 200 177813.9 0.9479415 96722.24
## 3 250 174656.9 0.9497040 94806.93
## 3 300 171239.1 0.9517271 93060.55
## 3 350 168361.7 0.9532706 91451.29
## 3 400 167176.9 0.9538762 90289.30
## 3 450 165442.1 0.9547364 89253.55
## 3 500 163677.4 0.9556731 88256.67
## 4 50 213102.5 0.9272724 114527.97
## 4 100 189580.4 0.9417015 100827.33
## 4 150 177537.1 0.9484783 94911.81
## 4 200 172159.9 0.9513173 92277.06
## 4 250 167387.8 0.9539768 89488.98
## 4 300 164592.3 0.9554236 87927.09
## 4 350 162285.3 0.9566315 86314.24
## 4 400 159978.2 0.9578424 85088.15
## 4 450 159300.1 0.9581587 84016.93
## 4 500 158143.3 0.9586715 83476.01
## 5 50 199121.6 0.9357471 107133.03
## 5 100 178598.8 0.9477551 94897.47
## 5 150 170473.9 0.9523910 90281.84
## 5 200 166163.3 0.9547029 87702.39
## 5 250 162602.6 0.9566704 85864.19
## 5 300 160446.3 0.9576842 84377.75
## 5 350 159892.8 0.9579925 83352.32
## 5 400 158422.5 0.9587457 82171.25
## 5 450 157407.3 0.9592714 81539.79
## 5 500 156727.2 0.9596837 80693.68
## 6 50 196874.3 0.9371823 102788.01
## 6 100 178323.5 0.9478448 91978.72
## 6 150 169946.2 0.9523706 87703.82
## 6 200 165349.8 0.9546547 84878.56
## 6 250 162348.0 0.9562791 83271.47
## 6 300 160029.1 0.9574737 81576.04
## 6 350 158276.2 0.9583403 80463.04
## 6 400 157229.3 0.9587769 79385.13
## 6 450 156290.2 0.9591502 78469.42
## 6 500 156310.3 0.9591123 77868.29
## 7 50 189686.1 0.9421227 99658.00
## 7 100 171121.2 0.9524527 88927.58
## 7 150 164381.8 0.9559448 84883.15
## 7 200 161249.2 0.9574154 82828.19
## 7 250 158054.5 0.9590605 80969.32
## 7 300 156822.0 0.9597188 79918.89
## 7 350 155994.2 0.9601805 78942.46
## 7 400 154927.1 0.9607059 78010.92
## 7 450 153961.5 0.9611715 77105.21
## 7 500 153266.2 0.9614794 76376.74
## 8 50 182928.6 0.9461617 96765.52
## 8 100 165817.3 0.9553158 86961.70
## 8 150 158092.0 0.9591537 83507.16
## 8 200 154750.3 0.9608479 81048.85
## 8 250 152610.4 0.9616633 79663.55
## 8 300 151843.2 0.9619468 78718.21
## 8 350 150986.7 0.9623560 77687.65
## 8 400 150000.6 0.9628909 76675.68
## 8 450 149616.6 0.9630338 75847.06
## 8 500 148799.1 0.9633404 75014.08
## 9 50 183873.4 0.9452539 95344.70
## 9 100 167567.5 0.9540996 86370.55
## 9 150 160989.8 0.9575176 82848.90
## 9 200 156784.6 0.9594068 80725.93
## 9 250 154552.9 0.9605431 78976.28
## 9 300 152436.6 0.9614612 77214.56
## 9 350 151316.0 0.9619338 76100.09
## 9 400 150231.2 0.9624050 75319.97
## 9 450 150011.7 0.9624641 74756.73
## 9 500 149437.6 0.9627168 74113.90
## 10 50 180333.6 0.9471727 92533.35
## 10 100 165773.0 0.9546764 84845.34
## 10 150 159623.1 0.9577795 81365.00
## 10 200 157211.0 0.9589412 79459.29
## 10 250 155292.4 0.9599751 77741.65
## 10 300 153952.1 0.9606606 76709.36
## 10 350 152925.4 0.9611305 75556.22
## 10 400 152720.0 0.9612442 74755.98
## 10 450 152164.8 0.9614987 74063.25
## 10 500 151937.0 0.9615109 73392.27
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 500, interaction.depth =
## 8, shrinkage = 0.1 and n.minobsinnode = 10.
The cross-validation results are shown in the following plot:
plot(boost, main = "5-Fold Cross Validation Gradient Boosting: tuneLength")
boost_best <- boost$bestTune
boost_best
## n.trees interaction.depth shrinkage n.minobsinnode
## 80 500 8 0.1 10
Based on the output above, the best tuning parameters are `n.trees = 500`, `interaction.depth = 8`, `shrinkage = 0.1`, and `n.minobsinnode = 10`, giving CV RMSE = 148799.1, R-squared = 0.9633404, and MAE = 75014.08. The model is then re-fit on the full training data with these parameters (again defaulting to 25 bootstrap resamples):
boost <- train(selling_price ~.,
data=trainData,
method="gbm",
tuneGrid = boost_best,
verbose = FALSE)
boost
## Stochastic Gradient Boosting
##
## 5625 samples
## 11 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 5625, 5625, 5625, 5625, 5625, 5625, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 170019.4 0.9538613 77077.11
##
## Tuning parameter 'n.trees' was held constant at a value of 500
## Tuning
## parameter 'interaction.depth' was held constant at a value of 8
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
boost_result <- boost$results
boost_result
## n.trees interaction.depth shrinkage n.minobsinnode RMSE Rsquared
## 1 500 8 0.1 10 170019.4 0.9538613
## MAE RMSESD RsquaredSD MAESD
## 1 77077.11 26278.19 0.01295492 2382.593
The resulting model has bootstrap RMSE = 170019.4, R-squared = 0.9538613, and MAE = 77077.11. Evaluation on the testing data:
boost_eval <- eval_test_data(boost)
boost_eval
## RMSE R_Squared MAE
## 128386.9747334 0.9760126 73114.2917306
On the testing data the model achieves RMSE = 128386.97, R-squared = 0.9760126, and MAE = 73114.29.
plot(varImp(boost),
main = "Gradient Boosting Variable Importance" )
Based on the output above, the three most important variables are `max_power`, `age`, and `torque`.
tuneGrid
tg <- expand.grid(shrinkage = seq(0.1, 0.3, by = 0.1),
interaction.depth = 5:10,
n.minobsinnode = seq(4, 10, 2),
n.trees = c(50, 100, 300, 500))
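As before, the grid size is the product of the candidate values:
nrow(tg)  # 3 x 6 x 4 x 4 = 288 combinations, each evaluated with 5-fold CV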
boost_tg <- train(selling_price ~.,
data=trainData,
method="gbm",
tuneGrid = tg,
trControl=fitControl,
verbose = FALSE)
boost_tg
## Stochastic Gradient Boosting
##
## 5625 samples
## 11 predictor
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 4500, 4500, 4500, 4501, 4499
## Resampling results across tuning parameters:
##
## shrinkage interaction.depth n.minobsinnode n.trees RMSE Rsquared
## 0.1 5 4 50 196307.3 0.9404769
## 0.1 5 4 100 176661.2 0.9507913
## 0.1 5 4 300 162764.7 0.9576155
## 0.1 5 4 500 158353.2 0.9597125
## 0.1 5 6 50 200329.2 0.9374785
## 0.1 5 6 100 182327.0 0.9471212
## 0.1 5 6 300 167069.4 0.9549888
## 0.1 5 6 500 163245.1 0.9569069
## 0.1 5 8 50 204839.7 0.9348194
## 0.1 5 8 100 183989.2 0.9464388
## 0.1 5 8 300 165079.7 0.9562683
## 0.1 5 8 500 160417.9 0.9586685
## 0.1 5 10 50 205419.5 0.9344476
## 0.1 5 10 100 183380.1 0.9469141
## 0.1 5 10 300 166052.4 0.9558622
## 0.1 5 10 500 160663.0 0.9587073
## 0.1 6 4 50 196368.1 0.9397687
## 0.1 6 4 100 179985.2 0.9482586
## 0.1 6 4 300 165579.0 0.9553590
## 0.1 6 4 500 161730.2 0.9573316
## 0.1 6 6 50 195157.5 0.9403826
## 0.1 6 6 100 180439.0 0.9481737
## 0.1 6 6 300 167021.3 0.9547103
## 0.1 6 6 500 163781.8 0.9562549
## 0.1 6 8 50 191268.5 0.9428233
## 0.1 6 8 100 176398.5 0.9504706
## 0.1 6 8 300 162489.3 0.9575390
## 0.1 6 8 500 159494.7 0.9590117
## 0.1 6 10 50 194693.2 0.9402031
## 0.1 6 10 100 177147.6 0.9497220
## 0.1 6 10 300 162695.2 0.9571514
## 0.1 6 10 500 160411.4 0.9583205
## 0.1 7 4 50 185793.3 0.9460370
## 0.1 7 4 100 171762.2 0.9529830
## 0.1 7 4 300 158762.5 0.9592357
## 0.1 7 4 500 155784.6 0.9604090
## 0.1 7 6 50 188246.4 0.9447370
## 0.1 7 6 100 172445.9 0.9526295
## 0.1 7 6 300 160672.0 0.9583460
## 0.1 7 6 500 158422.9 0.9592712
## 0.1 7 8 50 190514.3 0.9432996
## 0.1 7 8 100 173530.8 0.9521849
## 0.1 7 8 300 161633.2 0.9578873
## 0.1 7 8 500 159933.2 0.9585530
## 0.1 7 10 50 189169.2 0.9441540
## 0.1 7 10 100 174052.6 0.9519094
## 0.1 7 10 300 162931.0 0.9574614
## 0.1 7 10 500 160012.8 0.9589405
## 0.1 8 4 50 181644.5 0.9481892
## 0.1 8 4 100 168161.0 0.9548073
## 0.1 8 4 300 159759.5 0.9587787
## 0.1 8 4 500 157890.0 0.9596859
## 0.1 8 6 50 184298.5 0.9467360
## 0.1 8 6 100 169690.9 0.9541448
## 0.1 8 6 300 159652.7 0.9588204
## 0.1 8 6 500 156139.9 0.9605649
## 0.1 8 8 50 189958.8 0.9430985
## 0.1 8 8 100 175180.3 0.9506478
## 0.1 8 8 300 162479.0 0.9569329
## 0.1 8 8 500 158420.3 0.9589827
## 0.1 8 10 50 184881.7 0.9464935
## 0.1 8 10 100 168057.8 0.9550300
## 0.1 8 10 300 155827.7 0.9611751
## 0.1 8 10 500 153832.1 0.9620934
## 0.1 9 4 50 176005.8 0.9513208
## 0.1 9 4 100 164454.5 0.9567348
## 0.1 9 4 300 156141.6 0.9604649
## 0.1 9 4 500 155178.0 0.9608480
## 0.1 9 6 50 182564.6 0.9476639
## 0.1 9 6 100 168310.7 0.9545361
## 0.1 9 6 300 158143.5 0.9593619
## 0.1 9 6 500 156408.0 0.9601486
## 0.1 9 8 50 181187.3 0.9486467
## 0.1 9 8 100 169266.4 0.9543642
## 0.1 9 8 300 158829.2 0.9595024
## 0.1 9 8 500 156838.6 0.9604024
## 0.1 9 10 50 183340.2 0.9471005
## 0.1 9 10 100 167867.2 0.9548862
## 0.1 9 10 300 156243.3 0.9605098
## 0.1 9 10 500 154433.2 0.9613476
## 0.1 10 4 50 174049.4 0.9522637
## 0.1 10 4 100 163406.3 0.9568864
## 0.1 10 4 300 156161.1 0.9602418
## 0.1 10 4 500 155307.0 0.9605497
## 0.1 10 6 50 176709.5 0.9508276
## 0.1 10 6 100 164470.7 0.9567800
## 0.1 10 6 300 155742.6 0.9607143
## 0.1 10 6 500 155036.0 0.9609315
## 0.1 10 8 50 178779.2 0.9495751
## 0.1 10 8 100 165783.1 0.9560195
## 0.1 10 8 300 156562.3 0.9603429
## 0.1 10 8 500 153924.6 0.9615992
## 0.1 10 10 50 178929.5 0.9496403
## 0.1 10 10 100 167767.2 0.9550219
## 0.1 10 10 300 158134.8 0.9596269
## 0.1 10 10 500 156281.1 0.9604551
## 0.2 5 4 50 188099.0 0.9442729
## 0.2 5 4 100 176558.4 0.9503130
## 0.2 5 4 300 168331.8 0.9543105
## 0.2 5 4 500 167094.4 0.9550025
## 0.2 5 6 50 187495.9 0.9436637
## 0.2 5 6 100 175793.9 0.9499660
## 0.2 5 6 300 167376.5 0.9544136
## 0.2 5 6 500 166083.0 0.9549586
## 0.2 5 8 50 181916.6 0.9475893
## 0.2 5 8 100 168446.4 0.9547219
## 0.2 5 8 300 156978.0 0.9606042
## 0.2 5 8 500 154343.7 0.9617844
## 0.2 5 10 50 187787.7 0.9439717
## 0.2 5 10 100 171888.2 0.9526295
## 0.2 5 10 300 161781.2 0.9579573
## 0.2 5 10 500 160038.7 0.9586537
## 0.2 6 4 50 177762.7 0.9499546
## 0.2 6 4 100 173259.4 0.9521443
## 0.2 6 4 300 165939.9 0.9557959
## 0.2 6 4 500 166110.1 0.9555813
## 0.2 6 6 50 188461.2 0.9435009
## 0.2 6 6 100 178340.6 0.9488366
## 0.2 6 6 300 166441.9 0.9549667
## 0.2 6 6 500 166109.1 0.9548908
## 0.2 6 8 50 177273.4 0.9500532
## 0.2 6 8 100 165137.0 0.9563295
## 0.2 6 8 300 158422.2 0.9595784
## 0.2 6 8 500 157580.6 0.9598239
## 0.2 6 10 50 176971.4 0.9498840
## 0.2 6 10 100 166435.3 0.9554482
## 0.2 6 10 300 157058.6 0.9600547
## 0.2 6 10 500 155006.1 0.9610268
## 0.2 7 4 50 175135.6 0.9511733
## 0.2 7 4 100 169523.8 0.9540443
## 0.2 7 4 300 162810.1 0.9572828
## 0.2 7 4 500 161116.1 0.9580107
## 0.2 7 6 50 179700.8 0.9484685
## 0.2 7 6 100 170462.8 0.9533379
## 0.2 7 6 300 165604.4 0.9551890
## 0.2 7 6 500 164585.1 0.9556326
## 0.2 7 8 50 174176.4 0.9512727
## 0.2 7 8 100 166320.8 0.9553275
## 0.2 7 8 300 159416.9 0.9586951
## 0.2 7 8 500 156786.3 0.9600180
## 0.2 7 10 50 182991.9 0.9469611
## 0.2 7 10 100 171489.9 0.9529910
## 0.2 7 10 300 162506.1 0.9574660
## 0.2 7 10 500 160671.6 0.9583819
## 0.2 8 4 50 173098.9 0.9524981
## 0.2 8 4 100 165808.3 0.9562156
## 0.2 8 4 300 160607.9 0.9586605
## 0.2 8 4 500 160453.5 0.9587018
## 0.2 8 6 50 169951.4 0.9542789
## 0.2 8 6 100 164810.2 0.9564917
## 0.2 8 6 300 160714.9 0.9582608
## 0.2 8 6 500 160618.9 0.9582917
## 0.2 8 8 50 169781.1 0.9538167
## 0.2 8 8 100 161511.3 0.9580041
## 0.2 8 8 300 156034.1 0.9606680
## 0.2 8 8 500 154875.8 0.9611121
## 0.2 8 10 50 171485.9 0.9533687
## 0.2 8 10 100 163446.2 0.9571502
## 0.2 8 10 300 157364.3 0.9600219
## 0.2 8 10 500 156470.7 0.9604478
## 0.2 9 4 50 170457.7 0.9527794
## 0.2 9 4 100 163918.7 0.9560411
## 0.2 9 4 300 162252.0 0.9567823
## 0.2 9 4 500 161407.9 0.9572816
## 0.2 9 6 50 170978.8 0.9534450
## 0.2 9 6 100 162679.9 0.9575363
## 0.2 9 6 300 157040.4 0.9599326
## 0.2 9 6 500 156643.3 0.9600685
## 0.2 9 8 50 170136.5 0.9541742
## 0.2 9 8 100 161846.3 0.9584849
## 0.2 9 8 300 156031.2 0.9614348
## 0.2 9 8 500 156174.4 0.9613745
## 0.2 9 10 50 175829.4 0.9507858
## 0.2 9 10 100 169790.3 0.9536737
## 0.2 9 10 300 163936.8 0.9564670
## 0.2 9 10 500 163736.3 0.9565114
## 0.2 10 4 50 170388.9 0.9533860
## 0.2 10 4 100 163280.2 0.9569056
## 0.2 10 4 300 161315.3 0.9575450
## 0.2 10 4 500 161191.6 0.9575613
## 0.2 10 6 50 166759.3 0.9559737
## 0.2 10 6 100 162300.9 0.9578643
## 0.2 10 6 300 158801.6 0.9592248
## 0.2 10 6 500 158436.9 0.9593905
## 0.2 10 8 50 169194.0 0.9543868
## 0.2 10 8 100 164262.8 0.9568156
## 0.2 10 8 300 161433.5 0.9582042
## 0.2 10 8 500 160691.7 0.9585000
## 0.2 10 10 50 166220.3 0.9560718
## 0.2 10 10 100 160667.6 0.9587650
## 0.2 10 10 300 155695.4 0.9611198
## 0.2 10 10 500 154986.2 0.9613809
## 0.3 5 4 50 184346.9 0.9460275
## 0.3 5 4 100 174814.6 0.9510703
## 0.3 5 4 300 170675.9 0.9530605
## 0.3 5 4 500 169468.5 0.9536073
## 0.3 5 6 50 184188.8 0.9456246
## 0.3 5 6 100 177759.4 0.9490570
## 0.3 5 6 300 168740.0 0.9538290
## 0.3 5 6 500 167276.9 0.9543407
## 0.3 5 8 50 186159.7 0.9447834
## 0.3 5 8 100 177443.9 0.9498493
## 0.3 5 8 300 171013.3 0.9532626
## 0.3 5 8 500 169794.4 0.9538071
## 0.3 5 10 50 178128.3 0.9493592
## 0.3 5 10 100 170731.6 0.9533191
## 0.3 5 10 300 162826.7 0.9573212
## 0.3 5 10 500 160244.1 0.9587018
## 0.3 6 4 50 176067.5 0.9507431
## 0.3 6 4 100 165264.0 0.9564027
## 0.3 6 4 300 161399.8 0.9580272
## 0.3 6 4 500 159719.0 0.9588244
## 0.3 6 6 50 181578.3 0.9474665
## 0.3 6 6 100 175740.5 0.9503897
## 0.3 6 6 300 168636.7 0.9537461
## 0.3 6 6 500 169403.2 0.9532151
## 0.3 6 8 50 178603.6 0.9488548
## 0.3 6 8 100 170268.1 0.9533565
## 0.3 6 8 300 165788.0 0.9560229
## 0.3 6 8 500 163697.2 0.9571213
## 0.3 6 10 50 179390.2 0.9483946
## 0.3 6 10 100 173451.3 0.9517958
## 0.3 6 10 300 167391.6 0.9545357
## 0.3 6 10 500 165535.6 0.9554454
## 0.3 7 4 50 176376.6 0.9496554
## 0.3 7 4 100 171904.5 0.9519423
## 0.3 7 4 300 167132.2 0.9542288
## 0.3 7 4 500 167813.3 0.9540202
## 0.3 7 6 50 178445.4 0.9492020
## 0.3 7 6 100 171258.7 0.9529095
## 0.3 7 6 300 167668.3 0.9548348
## 0.3 7 6 500 168144.2 0.9545396
## 0.3 7 8 50 167118.1 0.9550597
## 0.3 7 8 100 165787.3 0.9553862
## 0.3 7 8 300 161840.1 0.9574336
## 0.3 7 8 500 162561.8 0.9569184
## 0.3 7 10 50 176261.0 0.9501417
## 0.3 7 10 100 170047.2 0.9533623
## 0.3 7 10 300 165573.3 0.9556956
## 0.3 7 10 500 165016.4 0.9559992
## 0.3 8 4 50 170652.2 0.9533753
## 0.3 8 4 100 167675.2 0.9547935
## 0.3 8 4 300 166768.6 0.9549664
## 0.3 8 4 500 167258.9 0.9545956
## 0.3 8 6 50 167939.4 0.9549261
## 0.3 8 6 100 162222.8 0.9576935
## 0.3 8 6 300 160300.3 0.9583898
## 0.3 8 6 500 161253.6 0.9579960
## 0.3 8 8 50 174350.7 0.9513345
## 0.3 8 8 100 167575.2 0.9548729
## 0.3 8 8 300 163856.5 0.9567040
## 0.3 8 8 500 164038.2 0.9566456
## 0.3 8 10 50 173883.8 0.9520197
## 0.3 8 10 100 167792.8 0.9550952
## 0.3 8 10 300 163863.1 0.9569850
## 0.3 8 10 500 162641.1 0.9575124
## 0.3 9 4 50 165702.5 0.9563560
## 0.3 9 4 100 164232.1 0.9568577
## 0.3 9 4 300 163497.0 0.9569667
## 0.3 9 4 500 163895.3 0.9568420
## 0.3 9 6 50 177818.8 0.9492300
## 0.3 9 6 100 170533.3 0.9531712
## 0.3 9 6 300 168151.8 0.9544569
## 0.3 9 6 500 167288.3 0.9549869
## 0.3 9 8 50 167602.4 0.9555603
## 0.3 9 8 100 159024.8 0.9598804
## 0.3 9 8 300 156055.9 0.9610038
## 0.3 9 8 500 156214.3 0.9608937
## 0.3 9 10 50 171855.1 0.9528562
## 0.3 9 10 100 165331.7 0.9562324
## 0.3 9 10 300 163149.3 0.9573117
## 0.3 9 10 500 163012.1 0.9573940
## 0.3 10 4 50 173883.9 0.9510384
## 0.3 10 4 100 171419.1 0.9521556
## 0.3 10 4 300 170171.6 0.9525600
## 0.3 10 4 500 170326.6 0.9524267
## 0.3 10 6 50 172096.3 0.9528586
## 0.3 10 6 100 169347.4 0.9539823
## 0.3 10 6 300 166497.3 0.9551988
## 0.3 10 6 500 166760.2 0.9551648
## 0.3 10 8 50 169761.8 0.9539478
## 0.3 10 8 100 166351.5 0.9555750
## 0.3 10 8 300 165716.4 0.9559250
## 0.3 10 8 500 165166.4 0.9563427
## 0.3 10 10 50 168429.4 0.9542629
## 0.3 10 10 100 165522.8 0.9557918
## 0.3 10 10 300 163252.4 0.9572072
## 0.3 10 10 500 163225.8 0.9570931
## MAE
## 108547.51
## 95857.58
## 83692.06
## 78705.96
## 108758.64
## 96337.82
## 85077.68
## 80421.65
## 108928.67
## 96655.99
## 84863.88
## 80553.55
## 109513.26
## 96907.33
## 85705.52
## 81644.60
## 105783.26
## 93697.47
## 81652.01
## 77432.22
## 104071.57
## 93017.65
## 81754.92
## 77515.52
## 103069.73
## 92076.91
## 82515.24
## 79041.36
## 102988.06
## 92378.28
## 82664.06
## 79165.12
## 101176.34
## 90217.23
## 79440.98
## 75091.31
## 99616.30
## 89114.75
## 79550.90
## 76189.47
## 101210.74
## 90320.09
## 80952.74
## 77391.52
## 99715.60
## 90258.38
## 81940.36
## 78576.47
## 97787.00
## 87529.82
## 77954.00
## 74342.26
## 97772.64
## 88166.33
## 79199.99
## 75452.75
## 97545.00
## 87836.64
## 78805.50
## 75115.17
## 97552.82
## 87865.43
## 79803.39
## 76683.23
## 94418.40
## 85716.17
## 76451.92
## 73796.50
## 96032.21
## 86691.33
## 77822.94
## 74553.12
## 94805.36
## 86521.62
## 78279.85
## 75491.28
## 95261.05
## 86851.24
## 78211.54
## 74975.55
## 93600.98
## 84782.92
## 76197.85
## 73290.20
## 93491.92
## 85466.96
## 76668.40
## 74024.17
## 93562.78
## 85741.43
## 77549.35
## 74604.68
## 92300.34
## 85236.39
## 77571.65
## 74930.04
## 99369.87
## 90003.76
## 79862.92
## 76264.76
## 98994.40
## 90267.03
## 80088.11
## 76181.95
## 97382.74
## 90258.35
## 80225.01
## 76598.37
## 97683.15
## 90083.78
## 81558.15
## 78146.12
## 94121.50
## 87003.37
## 78120.19
## 75932.27
## 95851.32
## 88013.39
## 78031.20
## 75188.42
## 93622.03
## 86785.33
## 79016.96
## 75863.52
## 92989.97
## 86882.95
## 79043.81
## 75693.24
## 91511.66
## 84995.36
## 76667.00
## 74417.08
## 92477.62
## 86504.21
## 77741.31
## 75016.75
## 91932.47
## 86173.22
## 77920.57
## 74826.65
## 92906.51
## 86210.41
## 78989.44
## 75945.57
## 90590.67
## 84051.09
## 76202.73
## 75042.85
## 88751.96
## 83744.32
## 75961.50
## 74228.74
## 88997.21
## 83592.71
## 75878.43
## 73777.63
## 90294.58
## 84568.42
## 76962.52
## 74438.01
## 87867.38
## 81711.65
## 75508.35
## 74258.10
## 88387.79
## 82708.90
## 75345.38
## 73840.68
## 88393.32
## 83493.57
## 76357.56
## 74767.80
## 90393.85
## 84259.31
## 76968.53
## 74584.94
## 87178.92
## 81284.50
## 74756.18
## 73674.41
## 87213.88
## 82305.41
## 75002.55
## 73820.07
## 86276.38
## 81719.14
## 75070.02
## 73833.37
## 86575.09
## 81752.39
## 75197.53
## 73977.51
## 95199.61
## 88550.99
## 79398.94
## 77601.31
## 96689.87
## 89927.70
## 80370.90
## 77953.84
## 95882.10
## 89186.36
## 80871.32
## 77369.85
## 95407.70
## 89446.90
## 80310.99
## 77045.25
## 93040.58
## 85208.87
## 78368.32
## 76544.96
## 93419.08
## 87871.70
## 78119.97
## 77068.28
## 92658.94
## 86683.88
## 79650.81
## 77017.74
## 92916.27
## 87552.95
## 80319.80
## 77612.73
## 90045.16
## 84401.79
## 76245.75
## 76151.01
## 90615.80
## 84504.63
## 76603.61
## 75934.83
## 89733.90
## 85020.46
## 77884.48
## 76863.98
## 90857.42
## 86107.73
## 78274.06
## 76560.07
## 87694.86
## 82217.51
## 76321.47
## 75836.59
## 86962.16
## 81488.26
## 75919.06
## 75753.82
## 88351.59
## 83028.92
## 76613.55
## 75638.85
## 89257.68
## 85120.34
## 77637.14
## 76261.92
## 86599.35
## 81045.57
## 76536.21
## 76445.80
## 87467.54
## 82175.83
## 76865.02
## 76158.62
## 87129.89
## 80861.12
## 75350.28
## 74961.45
## 88263.95
## 82596.85
## 77407.11
## 76123.23
## 87682.44
## 82187.75
## 76913.50
## 76407.36
## 86957.60
## 82012.60
## 76892.46
## 76929.77
## 87378.82
## 82624.70
## 77811.30
## 77276.32
## 86637.72
## 82114.83
## 77081.52
## 76330.14
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 500, interaction.depth =
## 8, shrinkage = 0.1 and n.minobsinnode = 10.
plot(boost_tg, main = "5-Fold Cross Validation Gradient Boosting: tuneGrid")
boost_tg_best <- boost_tg$bestTune
boost_tg_best
## n.trees interaction.depth shrinkage n.minobsinnode
## 64 500 8 0.1 10
Based on the output above, the best tuning parameters are `n.trees = 500`, `interaction.depth = 8`, `shrinkage = 0.1`, and `n.minobsinnode = 10` (row 64 of the grid), giving CV RMSE = 153832.1, R-squared = 0.9620934, and MAE = 76683.23. The model is then re-fit on the full training data with these parameters:
boost_tg <- train(selling_price ~.,
data=trainData,
method="gbm",
tuneGrid = boost_tg_best,
verbose = FALSE)
boost_tg
## Stochastic Gradient Boosting
##
## 5625 samples
## 11 predictor
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 5625, 5625, 5625, 5625, 5625, 5625, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 169333.9 0.9549749 77600.37
##
## Tuning parameter 'n.trees' was held constant at a value of 500
## Tuning
## parameter 'interaction.depth' was held constant at a value of 8
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
boost_tg_result <- boost_tg$results
boost_tg_result
## n.trees interaction.depth shrinkage n.minobsinnode RMSE Rsquared
## 1 500 8 0.1 10 169333.9 0.9549749
## MAE RMSESD RsquaredSD MAESD
## 1 77600.37 31811.25 0.01549969 3115.737
The resulting model has bootstrap RMSE = 169333.9, R-squared = 0.9549749, and MAE = 77600.37. Evaluation on the testing data:
boost_tg_eval <- eval_test_data(boost_tg)
boost_tg_eval
## RMSE R_Squared MAE
## 127495.2728822 0.9763908 72281.3991466
On the testing data the model achieves RMSE = 127495.27, R-squared = 0.9763908, and MAE = 72281.40.
plot(varImp(boost_tg),
main = "Gradient Boosting Variable Importance" )
Based on the output above, the three most important variables are `max_power`, `age`, and `torque`.
Four models have now been fitted. To choose the best one, their test-set metrics are compared:
eval_all <- matrix(c(rf_eval, rf_tg_eval, boost_eval, boost_tg_eval), nrow = 4, byrow = T)
colnames(eval_all) <- names(rf_eval)
row.names(eval_all) <- c("Random Forest",
"Random Forest tuneGrid",
"Gradient Boosting",
"Gradient Boosting tuneGrid")
eval_all
## RMSE R_Squared MAE
## Random Forest 133429.7 0.9736961 68359.06
## Random Forest tuneGrid 132655.3 0.9740354 68812.60
## Gradient Boosting 128387.0 0.9760126 73114.29
## Gradient Boosting tuneGrid 127495.3 0.9763908 72281.40
Based on the output above, the two Gradient Boosting models perform better on the testing data (smaller RMSE and larger R-squared), although both Random Forest models have slightly lower MAE. Among the boosting models, the tuneGrid version is marginally better, so Gradient Boosting with tuneGrid is selected as the best model, with RMSE = 127495.3 and R-squared = 0.9763908.
(Optional) Model performance can also be compared across resamples with `resamples()`:
modelcompare <- resamples(list(random_forest=rf,
gradient_boosting=boost,
random_forest_tuneGrid=rf_tg,
gradient_boosting_tuneGrid=boost_tg))
summary(modelcompare)
##
## Call:
## summary.resamples(object = modelcompare)
##
## Models: random_forest, gradient_boosting, random_forest_tuneGrid, gradient_boosting_tuneGrid
## Number of resamples: 25
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu.
## random_forest 70805.76 74631.68 76524.96 76479.84 78418.66
## gradient_boosting 71923.38 75959.51 76620.00 77077.11 78038.03
## random_forest_tuneGrid 71166.12 74726.01 75879.77 76110.00 77942.42
## gradient_boosting_tuneGrid 73127.07 74937.24 78274.57 77600.37 79139.11
## Max. NA's
## random_forest 82058.86 0
## gradient_boosting 82069.33 0
## random_forest_tuneGrid 85173.11 0
## gradient_boosting_tuneGrid 85689.09 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu.
## random_forest 128820.9 148291.0 172351.8 170626.1 194107.7
## gradient_boosting 118795.5 149125.6 174554.9 170019.4 188899.4
## random_forest_tuneGrid 125802.8 156065.8 169791.8 170450.7 185236.2
## gradient_boosting_tuneGrid 124809.7 142504.7 154911.8 169333.9 199548.0
## Max. NA's
## random_forest 221037.0 0
## gradient_boosting 213839.7 0
## random_forest_tuneGrid 227287.2 0
## gradient_boosting_tuneGrid 214464.3 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu.
## random_forest 0.9315854 0.9458145 0.9542018 0.9546539 0.9675315
## gradient_boosting 0.9295258 0.9434964 0.9546076 0.9538613 0.9647091
## random_forest_tuneGrid 0.9310714 0.9494371 0.9575624 0.9553525 0.9617174
## gradient_boosting_tuneGrid 0.9238403 0.9407797 0.9624515 0.9549749 0.9689030
## Max. NA's
## random_forest 0.9731550 0
## gradient_boosting 0.9766467 0
## random_forest_tuneGrid 0.9745691 0
## gradient_boosting_tuneGrid 0.9768688 0
dotplot(modelcompare, main = "Model Comparison")
Based on the resampling comparison, the four models perform about equally well. Finally, LIME (Local Interpretable Model-agnostic Explanations) is used to see which features drive individual predictions: an explainer is built from the training predictors and the boosting model, then applied to selected test cases.
pred <- predict(boost, newdata = testData)  # predictions from the model being explained
explainer <- lime(x = subset(trainData, select = -c(selling_price)),
model = boost)
set.seed(123)
explanation <- explain(x = head(subset(testData, select = -c(selling_price))[pred>750000,],2),
explainer, n_features = 10)
plot_features(explanation)
For example, with a budget above 750K, a buyer can get a used car under 5 years old, with high power (> 101 bhp), high torque (> 110), and low mileage (< 35,000 km).
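The object returned by `lime::explain()` is a plain data frame, so the feature contributions behind the plot can also be read directly (a quick illustration):
# Per-case feature weights used in plot_features()
head(explanation[, c("case", "feature", "feature_value", "feature_weight")])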
set.seed(123)
explanation <- explain(x = head(subset(testData, select = -c(selling_price))[pred<150000,],2),
explainer, n_features = 10)
plot_features(explanation)
Conversely, with a budget below 150K, a buyer gets a used car more than 11 years old, with low power, a manual transmission, and more than 60,000 km on the odometer.
Random Forest and Gradient Boosting deliver comparable performance in this case study.
The most important variables are `max_power`, `age`, `torque`, and `km_driven`:

- `max_power`, the maximum engine power (bhp): the higher, the more expensive the car
- `age`, the vehicle's age: the older, the cheaper
- `torque`, the maximum torque: the higher, the more expensive
- `km_driven`, the distance already driven: the farther, the cheaper

NIM: G1501211024. Email: anisanurizki@apps.ipb.ac.id
NIM: G1501211061. Email: nur.andi@apps.ipb.ac.id
NIM: G14180064. Email: nabila_ghoni@apps.ipb.ac.id
NIM: G94180016. Email: irfan_ivl25@apps.ipb.ac.id