Case Study

CarDekho is a car portal that helps its users buy or sell cars by providing information on prices, specifications, insurance, and other aspects.

The dataset used here contains information on used cars and their specifications. This information is used to build a model that predicts a car's selling price.

In this case study, the models used are Random Forest and Gradient Boosting.

Loading Packages

library(stringr)
library(dplyr)
library(caret)
library(ranger)
library(gbm)
library(lime)
library(imputeMissings)
library(MLmetrics)
library(cowplot)
library(skimr)

Data

car <- read.csv("Car details v3.csv", stringsAsFactors = TRUE)
head(car)
##                            name year selling_price km_driven   fuel seller_type
## 1        Maruti Swift Dzire VDI 2014        450000    145500 Diesel  Individual
## 2  Skoda Rapid 1.5 TDI Ambition 2014        370000    120000 Diesel  Individual
## 3      Honda City 2017-2020 EXi 2006        158000    140000 Petrol  Individual
## 4     Hyundai i20 Sportz Diesel 2010        225000    127000 Diesel  Individual
## 5        Maruti Swift VXI BSIII 2007        130000    120000 Petrol  Individual
## 6 Hyundai Xcent 1.2 VTVT E Plus 2017        440000     45000 Petrol  Individual
##   transmission        owner    mileage  engine  max_power
## 1       Manual  First Owner  23.4 kmpl 1248 CC     74 bhp
## 2       Manual Second Owner 21.14 kmpl 1498 CC 103.52 bhp
## 3       Manual  Third Owner  17.7 kmpl 1497 CC     78 bhp
## 4       Manual  First Owner  23.0 kmpl 1396 CC     90 bhp
## 5       Manual  First Owner  16.1 kmpl 1298 CC   88.2 bhp
## 6       Manual  First Owner 20.14 kmpl 1197 CC  81.86 bhp
##                     torque seats
## 1           190Nm@ 2000rpm     5
## 2      250Nm@ 1500-2500rpm     5
## 3    12.7@ 2,700(kgm@ rpm)     5
## 4 22.4 kgm at 1750-2750rpm     5
## 5    11.5@ 4,500(kgm@ rpm)     5
## 6        113.75nm@ 4000rpm     5
skim(car)
Data summary
                           Values
Name                       car
Number of rows             8128
Number of columns          13
_______________________
Column type frequency:
  factor                   9
  numeric                  4
________________________
Group variables            None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
name 0 1 FALSE 2058 Mar: 129, Mar: 82, Mar: 71, BMW: 62
fuel 0 1 FALSE 4 Die: 4402, Pet: 3631, CNG: 57, LPG: 38
seller_type 0 1 FALSE 3 Ind: 6766, Dea: 1126, Tru: 236
transmission 0 1 FALSE 2 Man: 7078, Aut: 1050
owner 0 1 FALSE 5 Fir: 5289, Sec: 2105, Thi: 555, Fou: 174
mileage 0 1 FALSE 394 18.: 225, emp: 221, 19.: 173, 18.: 164
engine 0 1 FALSE 122 124: 1017, 119: 832, 998: 453, 796: 444
max_power 0 1 FALSE 323 74 : 377, 81.: 220, emp: 215, 88.: 204
torque 0 1 FALSE 442 190: 530, 200: 445, 90N: 405, 113: 223

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1.00 2013.80 4.04 1983 2011 2015 2017 2020 ▁▁▁▃▇
selling_price 0 1.00 638271.81 806253.40 29999 254999 450000 675000 10000000 ▇▁▁▁▁
km_driven 0 1.00 69819.51 56550.55 1 35000 60000 98000 2360457 ▇▁▁▁▁
seats 221 0.97 5.42 0.96 2 5 5 5 14 ▁▇▂▁▁

The following are descriptions of the variables in the dataset.

Variable Description
name Name of the car
year Year the car was purchased
selling_price Selling price of the car (INR)
km_driven Distance the car has been driven (km)
fuel Fuel type (CNG/diesel/petrol/LPG)
seller_type Seller type (individual/dealer/trustmark dealer)
transmission Transmission (automatic/manual)
owner Previous ownership (first/second/third/fourth and above owner/test drive car)
mileage Fuel efficiency (kmpl, km/kg)
engine Engine capacity (CC)
max_power Maximum engine power (bhp)
torque Torque (kgm, Nm)
seats Seating capacity

Based on the output above, several issues in the data must be handled before modeling:

  • Data cleansing and conversion of the numeric variables so that they are stored as numeric. For example, the "CC" unit must be stripped from engine before converting it to numeric; the same applies to max_power, torque, and mileage.
  • Handling of missing values
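The skim output above also reports an "emp" level for mileage and max_power, i.e. empty strings masquerading as values. A quick count of such placeholder entries (a minimal check, assuming missing specifications are stored as ""):

# Count empty-string entries in the specification columns
sapply(car[, c("mileage", "engine", "max_power", "torque")],
       function(x) sum(trimws(as.character(x)) == ""))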

Data Preprocessing

  • Remove the units from engine, max_power, torque, and mileage and convert these variables to numeric
  • Create an age variable, the car's age from its purchase year (here computed as 2022 - year)
  • Convert owner into an ordinal/ordered factor
  • Drop the name and year variables
  • Remove the cars fueled by CNG or LPG (95 rows in total)
car2 <- car %>% mutate(engine = as.numeric(str_remove(engine, " CC")),
               max_power = as.numeric(str_remove(max_power, " bhp")),
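               # note: torque mixes Nm and kgm units; only the leading number is extracted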
               torque = as.numeric(str_extract(torque, "[0-9.]+")),
               mileage = as.numeric(str_extract(mileage, "[0-9.]+")),
               owner = factor(owner, ordered = TRUE, 
                              levels = c("Test Drive Car", 
                                         "First Owner", 
                                         "Second Owner", 
                                         "Third Owner", 
                                         "Fourth & Above Owner")),
               age = 2022 - year) %>% 
  select(-name, -year) %>%
  filter(!fuel %in% c('CNG','LPG'))
skim(car2)
Data summary
                           Values
Name                       car2
Number of rows             8033
Number of columns          12
_______________________
Column type frequency:
  factor                   4
  numeric                  8
________________________
Group variables            None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
fuel 0 1 FALSE 2 Die: 4402, Pet: 3631, CNG: 0, LPG: 0
seller_type 0 1 FALSE 3 Ind: 6673, Dea: 1124, Tru: 236
transmission 0 1 FALSE 2 Man: 6983, Aut: 1050
owner 0 1 TRUE 5 Fir: 5238, Sec: 2073, Thi: 547, Fou: 170

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
selling_price 0 1.00 642736.12 809863.53 29999.0 260000.00 450000.0 680000.00 10000000 ▇▁▁▁▁
km_driven 0 1.00 69738.82 56643.61 1000.0 35000.00 60000.0 98000.00 2360457 ▇▁▁▁▁
mileage 214 0.97 19.39 4.00 0.0 16.78 19.3 22.32 42 ▁▃▇▁▁
engine 214 0.97 1463.09 504.66 624.0 1197.00 1248.0 1582.00 3604 ▇▇▂▂▁
max_power 208 0.97 91.86 35.85 0.0 69.00 82.4 102.00 400 ▇▇▁▁▁
torque 214 0.97 169.32 97.32 4.8 104.00 160.0 204.00 789 ▇▆▂▁▁
seats 214 0.97 5.42 0.96 2.0 5.00 5.0 5.00 14 ▁▇▂▁▁
age 0 1.00 8.18 4.03 2.0 5.00 7.0 11.00 39 ▇▃▁▁▁

Splitting Data

The data are split into two sets: a training set and a testing set.

set.seed(123)
train_idx <- createDataPartition(car2$selling_price, p = 0.7, list=FALSE)
trainData <- car2[train_idx,]
testData <- car2[-train_idx,]
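A quick sanity check that the split is roughly 70/30 and that the stratified partition kept the price distributions comparable (an illustrative check, not part of the original workflow):

# Fraction of rows assigned to training (should be close to 0.7)
nrow(trainData) / nrow(car2)
# selling_price distributions should look similar in both sets
summary(trainData$selling_price)
summary(testData$selling_price)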

Data Exploration

Data exploration is carried out on the training data.

Distribution of the response variable: selling_price

ggplot(trainData, aes_string(x = "selling_price")) + 
  geom_histogram(color = "black") +
  ggtitle("Sebaran selling_price")

Relationship between the response and the numeric explanatory variables

plot_numeric_features <- function(x){
  ggplot(trainData, aes_string(x, "selling_price")) +
    geom_point() +
    geom_smooth(method = "loess", se = F) +
    scale_x_continuous(labels = scales::comma) +
    ylim(0, NA)
}
plot_grid(
  plot_numeric_features("km_driven"),
  plot_numeric_features("mileage"),
  plot_numeric_features("engine"),
  plot_numeric_features("max_power"),
  plot_numeric_features("torque"),
  plot_numeric_features("seats"),
  plot_numeric_features("age"))

The plots above show outliers in km_driven, torque, mileage, and max_power.

Distribution of the data across the categorical explanatory variables

count_categoric_features <- function(x){
  ggplot(trainData, aes_string(x = x)) +
    geom_bar() + 
    coord_flip()
}
plot_grid(
  count_categoric_features("fuel"),
  count_categoric_features("seller_type"),
  count_categoric_features("transmission"),
  count_categoric_features("owner"))

Relationship between the response and the categorical explanatory variables

plot_categoric_features <- function(x){
  ggplot(trainData, aes_string(x, "selling_price")) +
    geom_boxplot() +
    coord_flip()
}
plot_grid(
  plot_categoric_features("fuel"),
  plot_categoric_features("seller_type"),
  plot_categoric_features("transmission"),
  plot_categoric_features("owner"))

Outliers and Missing Values

Outlier Handling

The exploration above revealed outliers in km_driven, torque, mileage, and max_power. One way to handle outliers is capping: values beyond a chosen threshold are replaced by the threshold itself.

trainData$km_driven[trainData$km_driven > 500000] <- 500000
trainData$torque[trainData$torque > 640] <- 640
trainData$mileage[trainData$mileage > 30] <- 30
trainData$mileage[trainData$mileage < 7] <- 7
trainData$max_power[trainData$max_power > 300] <- 300
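The thresholds above were chosen by eye from the plots. A more systematic alternative (a sketch, not the approach used here) is to cap each variable at empirical percentiles, for example the 1st and 99th:

# Cap a numeric vector at its 1st and 99th percentiles
cap_percentile <- function(x, lower = 0.01, upper = 0.99) {
  bounds <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  pmin(pmax(x, bounds[1]), bounds[2])
}
# hypothetical usage: trainData$km_driven <- cap_percentile(trainData$km_driven)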

After capping:

plot_grid(
  plot_numeric_features("km_driven"),
  plot_numeric_features("mileage"),
  plot_numeric_features("engine"),
  plot_numeric_features("max_power"),
  plot_numeric_features("torque"),
  plot_numeric_features("seats"),
  plot_numeric_features("age"))

Missing Value Handling

colSums(is.na(trainData))
## selling_price     km_driven          fuel   seller_type  transmission 
##             0             0             0             0             0 
##         owner       mileage        engine     max_power        torque 
##             0           153           153           151           153 
##         seats           age 
##           153             0

Missing values are handled by imputation with the median.

imputer <- compute(trainData)
trainData <- impute(trainData, object=imputer)
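By default imputeMissings::compute stores the median of each numeric column (and the mode of each factor column) from the training data, and impute fills the NAs with those stored values. Roughly, for a single numeric column it amounts to (illustration only):

# Base-R equivalent of median imputation for one column
med <- median(trainData$mileage, na.rm = TRUE)  # learned from the training data
trainData$mileage[is.na(trainData$mileage)] <- med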

Handling the Testing Data

Outliers and missing values in the testing data are handled in the same way as in the training data, reusing the capping thresholds and the imputer fitted on the training data.

testData$km_driven[testData$km_driven > 500000] <- 500000
testData$torque[testData$torque > 640] <- 640
testData$mileage[testData$mileage > 30] <- 30
testData$mileage[testData$mileage < 7] <- 7
testData$max_power[testData$max_power > 300] <- 300

testData <- impute(testData, object=imputer)

Modeling

The prediction models use Random Forest and Gradient Boosting. Tuning parameters are selected by cross-validation on the training data.

K-fold cross-validation is a validation technique for choosing the best tuning parameters while also estimating model performance. This case study uses 5-fold cross-validation: the training data are randomly partitioned into five subsets, and each subset in turn serves as the validation set while the other four subsets are used for fitting.
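trainControl below handles the fold assignment internally; purely for illustration, the partitioning can be made explicit with caret::createFolds (a sketch of the mechanics, not needed in practice):

# Explicit 5-fold assignment, illustrating what trainControl does internally
folds <- createFolds(trainData$selling_price, k = 5)
sapply(folds, length)  # each fold holds roughly one fifth of the training rows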

fitControl <- trainControl(
  method = "cv",
  number = 5,
  returnResamp = "all")

Modeling 1: Random Forest

Parameter Tuning with tuneLength

The tuneLength option of caret::train generates a set of candidate tuning parameters (or combinations of tuning parameters) judged appropriate for the chosen method and the given training data; its value controls how many candidates per parameter are tried.
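Which parameters are tuned for a given method can be listed with caret::modelLookup; for ranger these are mtry, splitrule, and min.node.size:

# Tuning parameters that caret exposes for the ranger method
modelLookup("ranger")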

Cross Validation

rf <- train(selling_price ~ ., 
            data = trainData,
            method = 'ranger',
            tuneLength = 10, 
            importance = "impurity",
            trControl = fitControl,
            verbose = FALSE)
rf
## Random Forest 
## 
## 5625 samples
##   11 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 4499, 4500, 4501, 4499, 4501 
## Resampling results across tuning parameters:
## 
##   mtry  splitrule   RMSE      Rsquared   MAE      
##    2    variance    213772.6  0.9356596  106964.34
##    2    extratrees  301402.5  0.8936767  163106.11
##    3    variance    177517.4  0.9496120   81322.99
##    3    extratrees  213997.7  0.9327598  108454.70
##    4    variance    168063.7  0.9535745   74590.55
##    4    extratrees  187898.2  0.9434862   87635.78
##    6    variance    159362.0  0.9584761   71917.16
##    6    extratrees  172006.7  0.9511353   77144.31
##    7    variance    159791.6  0.9579907   71716.43
##    7    extratrees  169827.0  0.9524805   76259.05
##    9    variance    158158.5  0.9591719   71760.02
##    9    extratrees  166360.1  0.9544151   75135.83
##   10    variance    156764.9  0.9599440   71664.99
##   10    extratrees  164567.0  0.9554117   74670.42
##   12    variance    155688.6  0.9606910   71839.15
##   12    extratrees  161160.5  0.9574012   74235.90
##   13    variance    155365.6  0.9609113   71949.26
##   13    extratrees  160527.3  0.9577282   73899.11
##   15    variance    156109.4  0.9607269   72605.46
##   15    extratrees  158631.3  0.9586620   73612.57
## 
## Tuning parameter 'min.node.size' was held constant at a value of 5
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were mtry = 13, splitrule = variance
##  and min.node.size = 5.

The cross-validation results are shown in the following plot:

plot(rf, main = "5-Fold Cross Validation Random Forest: tuneLength")

rf_best <- rf$bestTune
rf_best
##    mtry splitrule min.node.size
## 17   13  variance             5

Based on the output above, the best tuning parameters are mtry = 13, splitrule = variance, and min.node.size = 5, giving CV RMSE = 155365.6, R-squared = 0.9609113, and MAE = 71949.26.

Re-Fitting the Model with the Best Tuning Parameters

The model is re-fitted to the entire training data using the best tuning parameters obtained in the previous step:

rf <- train(selling_price ~ ., 
            data = trainData,
            method = 'ranger',
            tuneGrid  = rf_best, 
            importance = "impurity",
            verbose = FALSE)
rf
## Random Forest 
## 
## 5625 samples
##   11 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 5625, 5625, 5625, 5625, 5625, 5625, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   170626.1  0.9546539  76479.84
## 
## Tuning parameter 'mtry' was held constant at a value of 13
## Tuning
##  parameter 'splitrule' was held constant at a value of variance
## 
## Tuning parameter 'min.node.size' was held constant at a value of 5
rf_result <- rf$results
rf_result
##   mtry splitrule min.node.size     RMSE  Rsquared      MAE   RMSESD RsquaredSD
## 1   13  variance             5 170626.1 0.9546539 76479.84 26090.26 0.01244749
##      MAESD
## 1 2988.774

The resulting model has RMSE = 170626.1, R-squared = 0.9546539, and MAE = 76479.84.

Evaluation on the Test Data

To assess how well the model predicts new data, it is evaluated on the testing data:

eval_test_data <- function(model){
  pred <- predict(model, newdata = testData)
  mae <- MAE(pred, testData$selling_price)
  rmse <- RMSE(pred, testData$selling_price)
  # MLmetrics::R2_Score expects y_pred first, then y_true
  R2 <- R2_Score(pred, testData$selling_price)
  return(c(RMSE = rmse, R_Squared = R2, MAE = mae))
}
rf_eval <- eval_test_data(rf)
rf_eval
##           RMSE      R_Squared            MAE 
## 133429.7287178      0.9736961  68359.0643768

The model achieves RMSE = 133429.7287178, R-squared = 0.9736961, and MAE = 68359.0643768 on the test data.

Variable Importance

plot(varImp(rf), 
     main = "Random Forest Variable Importance" )

Based on the output above, the three most important variables are max_power, age, and torque.
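The ranking can also be read off programmatically from the varImp object (a small sketch):

# Top three variables by impurity importance
imp <- varImp(rf)$importance
head(imp[order(-imp$Overall), , drop = FALSE], 3)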

Parameter Tuning with tuneGrid

The tuneGrid option of caret::train gives the analyst full control over which candidate tuning parameters (or combinations of tuning parameters) are evaluated.

Cross Validation

tg <- expand.grid(
  mtry = seq(2, 14, 2),
  splitrule = c("variance","extratrees"),
  min.node.size = c(5, 10, 20, 30))

rf_tg <- train(selling_price ~ ., 
            data = trainData,
            method = 'ranger',
            tuneGrid = tg,
            num.trees = 500,  # ranger's argument names (ntree/max_deep are not valid here)
            max.depth = 100,
            importance = "impurity",
            trControl = fitControl,
            verbose = FALSE)
rf_tg
## Random Forest 
## 
## 5625 samples
##   11 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 4499, 4500, 4501, 4499, 4501 
## Resampling results across tuning parameters:
## 
##   mtry  splitrule   min.node.size  RMSE      Rsquared   MAE      
##    2    variance     5             213719.5  0.9356199  107232.78
##    2    variance    10             215960.2  0.9344849  108196.91
##    2    variance    20             223080.8  0.9295734  110810.77
##    2    variance    30             229674.2  0.9269485  113497.30
##    2    extratrees   5             299006.3  0.8946027  161926.59
##    2    extratrees  10             305095.4  0.8895127  163633.00
##    2    extratrees  20             310988.7  0.8857602  167159.56
##    2    extratrees  30             315782.3  0.8824282  168331.49
##    4    variance     5             166470.6  0.9545531   74390.79
##    4    variance    10             173090.9  0.9511645   76855.73
##    4    variance    20             178524.5  0.9486429   80093.61
##    4    variance    30             187528.6  0.9438466   84089.21
##    4    extratrees   5             186237.3  0.9447330   87496.62
##    4    extratrees  10             195552.1  0.9393176   91107.88
##    4    extratrees  20             207597.5  0.9328367   98206.24
##    4    extratrees  30             219758.8  0.9260411  105351.92
##    6    variance     5             161528.4  0.9570422   72150.55
##    6    variance    10             164558.6  0.9556767   73499.77
##    6    variance    20             171468.5  0.9523911   76800.45
##    6    variance    30             178349.4  0.9488870   80454.62
##    6    extratrees   5             170992.8  0.9521615   77301.40
##    6    extratrees  10             180934.9  0.9466800   81128.21
##    6    extratrees  20             193408.3  0.9402307   87610.05
##    6    extratrees  30             202218.7  0.9357428   93880.73
##    8    variance     5             157412.5  0.9594540   71467.72
##    8    variance    10             160725.8  0.9578757   73043.96
##    8    variance    20             167850.4  0.9542800   76003.49
##    8    variance    30             175147.4  0.9505631   79480.88
##    8    extratrees   5             167060.5  0.9538281   75441.74
##    8    extratrees  10             173436.1  0.9510390   78678.07
##    8    extratrees  20             186242.0  0.9441982   84856.69
##    8    extratrees  30             194723.3  0.9399071   90374.87
##   10    variance     5             156068.6  0.9604042   71599.04
##   10    variance    10             159993.7  0.9583167   73025.63
##   10    variance    20             164791.5  0.9560985   75740.10
##   10    variance    30             173805.7  0.9512770   79360.41
##   10    extratrees   5             164677.5  0.9553732   74815.43
##   10    extratrees  10             169244.7  0.9534254   77270.59
##   10    extratrees  20             180758.9  0.9473169   82783.45
##   10    extratrees  30             190599.5  0.9420836   88229.90
##   12    variance     5             156221.5  0.9605196   71954.99
##   12    variance    10             158122.5  0.9594349   72864.77
##   12    variance    20             163490.0  0.9569222   75584.63
##   12    variance    30             170663.2  0.9531860   79012.69
##   12    extratrees   5             162710.6  0.9565309   74273.74
##   12    extratrees  10             168136.8  0.9538781   76999.62
##   12    extratrees  20             177280.2  0.9493637   81662.42
##   12    extratrees  30             186358.9  0.9446752   86771.80
##   14    variance     5             155720.8  0.9608428   72146.51
##   14    variance    10             157891.0  0.9596549   73187.40
##   14    variance    20             164925.0  0.9560060   76271.57
##   14    variance    30             170144.9  0.9534414   79155.60
##   14    extratrees   5             159110.0  0.9585606   73729.74
##   14    extratrees  10             165865.7  0.9551071   76170.85
##   14    extratrees  20             174566.3  0.9509678   81069.76
##   14    extratrees  30             182733.4  0.9465300   85494.98
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were mtry = 14, splitrule = variance
##  and min.node.size = 5.
plot(rf_tg, main = "5-Fold Cross Validation Random Forest: tuneGrid")

rf_tg_best <- rf_tg$bestTune
rf_tg_best
##    mtry splitrule min.node.size
## 49   14  variance             5

Based on the output above, the best tuning parameters are mtry = 14, splitrule = variance, and min.node.size = 5, giving CV RMSE = 155720.8, R-squared = 0.9608428, and MAE = 72146.51.

Re-Fitting the Model with the Best Tuning Parameters

The model is re-fitted to the entire training data using the best tuning parameters obtained in the previous step:

rf_tg <- train(selling_price ~ ., 
            data = trainData,
            method = 'ranger',
            tuneGrid  = rf_tg_best, 
            num.trees = 500,  # ranger's argument names
            max.depth = 100,
            importance = "impurity",
            verbose = FALSE)
rf_tg
## Random Forest 
## 
## 5625 samples
##   11 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 5625, 5625, 5625, 5625, 5625, 5625, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE  
##   170450.7  0.9553525  76110
## 
## Tuning parameter 'mtry' was held constant at a value of 14
## Tuning
##  parameter 'splitrule' was held constant at a value of variance
## 
## Tuning parameter 'min.node.size' was held constant at a value of 5
rf_tg_result <- rf_tg$results
rf_tg_result
##   mtry splitrule min.node.size     RMSE  Rsquared   MAE   RMSESD  RsquaredSD
## 1   14  variance             5 170450.7 0.9553525 76110 22332.26 0.009553552
##      MAESD
## 1 2886.202

The resulting model has RMSE = 170450.7, R-squared = 0.9553525, and MAE = 76110.

Evaluation on the Test Data

To assess how well the model predicts new data, it is evaluated on the testing data:

rf_tg_eval <- eval_test_data(rf_tg)
rf_tg_eval
##           RMSE      R_Squared            MAE 
## 132655.3068076      0.9740354  68812.6004312

The model achieves RMSE = 132655.3068076, R-squared = 0.9740354, and MAE = 68812.6004312 on the test data.

Variable Importance

plot(varImp(rf_tg), 
     main = "Random Forest Variable Importance")

Based on the output above, the three most important variables are max_power, age, and torque.

Modeling 2: Gradient Boosting

As with Random Forest, tuning-parameter selection for Gradient Boosting uses 5-fold cross-validation.
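For gbm, caret tunes n.trees, interaction.depth, shrinkage, and n.minobsinnode, which can be confirmed with:

# Tuning parameters that caret exposes for the gbm method
modelLookup("gbm")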

Parameter Tuning with tuneLength

Cross Validation

boost <- train(selling_price ~., 
               data=trainData, 
               method="gbm",
               tuneLength = 10,  
               trControl=fitControl,
               verbose = FALSE)
boost
## Stochastic Gradient Boosting 
## 
## 5625 samples
##   11 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 4501, 4500, 4501, 4499, 4499 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.trees  RMSE      Rsquared   MAE      
##    1                  50      353932.2  0.8088870  180987.01
##    1                 100      329482.2  0.8284958  164065.14
##    1                 150      316030.7  0.8399940  160046.83
##    1                 200      308852.5  0.8463823  157628.64
##    1                 250      304041.4  0.8500792  156027.24
##    1                 300      299697.0  0.8541960  154775.10
##    1                 350      295475.4  0.8581671  153001.27
##    1                 400      292552.1  0.8605509  151942.19
##    1                 450      289692.1  0.8631559  150771.79
##    1                 500      288342.8  0.8643507  149881.47
##    2                  50      244515.2  0.9051000  134075.48
##    2                 100      220029.2  0.9209970  118403.50
##    2                 150      207982.9  0.9290295  112980.86
##    2                 200      201158.7  0.9335218  109185.90
##    2                 250      195665.9  0.9369375  106344.55
##    2                 300      191886.1  0.9392940  104155.89
##    2                 350      187966.0  0.9414978  101933.82
##    2                 400      184983.1  0.9431929  100309.54
##    2                 450      182133.8  0.9448794   98946.93
##    2                 500      179972.5  0.9460673   97575.94
##    3                  50      221857.7  0.9211802  121911.09
##    3                 100      196155.9  0.9373879  106161.46
##    3                 150      185123.9  0.9438828  100757.34
##    3                 200      177813.9  0.9479415   96722.24
##    3                 250      174656.9  0.9497040   94806.93
##    3                 300      171239.1  0.9517271   93060.55
##    3                 350      168361.7  0.9532706   91451.29
##    3                 400      167176.9  0.9538762   90289.30
##    3                 450      165442.1  0.9547364   89253.55
##    3                 500      163677.4  0.9556731   88256.67
##    4                  50      213102.5  0.9272724  114527.97
##    4                 100      189580.4  0.9417015  100827.33
##    4                 150      177537.1  0.9484783   94911.81
##    4                 200      172159.9  0.9513173   92277.06
##    4                 250      167387.8  0.9539768   89488.98
##    4                 300      164592.3  0.9554236   87927.09
##    4                 350      162285.3  0.9566315   86314.24
##    4                 400      159978.2  0.9578424   85088.15
##    4                 450      159300.1  0.9581587   84016.93
##    4                 500      158143.3  0.9586715   83476.01
##    5                  50      199121.6  0.9357471  107133.03
##    5                 100      178598.8  0.9477551   94897.47
##    5                 150      170473.9  0.9523910   90281.84
##    5                 200      166163.3  0.9547029   87702.39
##    5                 250      162602.6  0.9566704   85864.19
##    5                 300      160446.3  0.9576842   84377.75
##    5                 350      159892.8  0.9579925   83352.32
##    5                 400      158422.5  0.9587457   82171.25
##    5                 450      157407.3  0.9592714   81539.79
##    5                 500      156727.2  0.9596837   80693.68
##    6                  50      196874.3  0.9371823  102788.01
##    6                 100      178323.5  0.9478448   91978.72
##    6                 150      169946.2  0.9523706   87703.82
##    6                 200      165349.8  0.9546547   84878.56
##    6                 250      162348.0  0.9562791   83271.47
##    6                 300      160029.1  0.9574737   81576.04
##    6                 350      158276.2  0.9583403   80463.04
##    6                 400      157229.3  0.9587769   79385.13
##    6                 450      156290.2  0.9591502   78469.42
##    6                 500      156310.3  0.9591123   77868.29
##    7                  50      189686.1  0.9421227   99658.00
##    7                 100      171121.2  0.9524527   88927.58
##    7                 150      164381.8  0.9559448   84883.15
##    7                 200      161249.2  0.9574154   82828.19
##    7                 250      158054.5  0.9590605   80969.32
##    7                 300      156822.0  0.9597188   79918.89
##    7                 350      155994.2  0.9601805   78942.46
##    7                 400      154927.1  0.9607059   78010.92
##    7                 450      153961.5  0.9611715   77105.21
##    7                 500      153266.2  0.9614794   76376.74
##    8                  50      182928.6  0.9461617   96765.52
##    8                 100      165817.3  0.9553158   86961.70
##    8                 150      158092.0  0.9591537   83507.16
##    8                 200      154750.3  0.9608479   81048.85
##    8                 250      152610.4  0.9616633   79663.55
##    8                 300      151843.2  0.9619468   78718.21
##    8                 350      150986.7  0.9623560   77687.65
##    8                 400      150000.6  0.9628909   76675.68
##    8                 450      149616.6  0.9630338   75847.06
##    8                 500      148799.1  0.9633404   75014.08
##    9                  50      183873.4  0.9452539   95344.70
##    9                 100      167567.5  0.9540996   86370.55
##    9                 150      160989.8  0.9575176   82848.90
##    9                 200      156784.6  0.9594068   80725.93
##    9                 250      154552.9  0.9605431   78976.28
##    9                 300      152436.6  0.9614612   77214.56
##    9                 350      151316.0  0.9619338   76100.09
##    9                 400      150231.2  0.9624050   75319.97
##    9                 450      150011.7  0.9624641   74756.73
##    9                 500      149437.6  0.9627168   74113.90
##   10                  50      180333.6  0.9471727   92533.35
##   10                 100      165773.0  0.9546764   84845.34
##   10                 150      159623.1  0.9577795   81365.00
##   10                 200      157211.0  0.9589412   79459.29
##   10                 250      155292.4  0.9599751   77741.65
##   10                 300      153952.1  0.9606606   76709.36
##   10                 350      152925.4  0.9611305   75556.22
##   10                 400      152720.0  0.9612442   74755.98
##   10                 450      152164.8  0.9614987   74063.25
##   10                 500      151937.0  0.9615109   73392.27
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## 
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 500, interaction.depth =
##  8, shrinkage = 0.1 and n.minobsinnode = 10.

The cross-validation results are shown in the following plot:

plot(boost, main = "5-Fold Cross Validation Gradient Boosting: tuneLength")

boost_best <- boost$bestTune
boost_best
##    n.trees interaction.depth shrinkage n.minobsinnode
## 80     500                 8       0.1             10

Based on the output above, the best tuning parameters are n.trees = 500, interaction.depth = 8, shrinkage = 0.1, and n.minobsinnode = 10, giving CV RMSE = 148799.1, R-squared = 0.9633404, and MAE = 75014.08.

Re-Fitting the Model with the Best Tuning Parameters

boost <- train(selling_price ~., 
               data=trainData, 
               method="gbm",
               tuneGrid  = boost_best,
               verbose = FALSE)
boost
## Stochastic Gradient Boosting 
## 
## 5625 samples
##   11 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 5625, 5625, 5625, 5625, 5625, 5625, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   170019.4  0.9538613  77077.11
## 
## Tuning parameter 'n.trees' was held constant at a value of 500
## Tuning
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## 
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
boost_result <- boost$results
boost_result
##   n.trees interaction.depth shrinkage n.minobsinnode     RMSE  Rsquared
## 1     500                 8       0.1             10 170019.4 0.9538613
##        MAE   RMSESD RsquaredSD    MAESD
## 1 77077.11 26278.19 0.01295492 2382.593

The resulting model has RMSE = 170019.4, R-squared = 0.9538613, and MAE = 77077.11.

Evaluation on the Test Data

boost_eval <- eval_test_data(boost)
boost_eval
##           RMSE      R_Squared            MAE 
## 128386.9747334      0.9760126  73114.2917306

The model achieves RMSE = 128386.9747334, R-squared = 0.9760126, and MAE = 73114.2917306 on the test data.

Variable Importance

plot(varImp(boost),
     main = "Gradient Boosting Variable Importance" )

Based on the output above, the three most important variables are max_power, age, and torque.

Parameter Tuning with tuneGrid

Cross Validation

tg <- expand.grid(shrinkage = seq(0.1, 0.3, by = 0.1), 
                  interaction.depth = 5:10,
                  n.minobsinnode = seq(4, 10, 2),
                  n.trees = c(50, 100, 300, 500))
                  
boost_tg <- train(selling_price ~., 
               data=trainData, 
               method="gbm",
               tuneGrid = tg,  
               trControl=fitControl,
               verbose = FALSE)
boost_tg
## Stochastic Gradient Boosting 
## 
## 5625 samples
##   11 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 4500, 4500, 4500, 4501, 4499 
## Resampling results across tuning parameters:
## 
##   shrinkage  interaction.depth  n.minobsinnode  n.trees  RMSE      Rsquared   MAE      
##   0.1         5                  4               50      196307.3  0.9404769  108547.51
##   0.1         5                  4              100      176661.2  0.9507913   95857.58
##   0.1         5                  4              300      162764.7  0.9576155   83692.06
##   0.1         5                  4              500      158353.2  0.9597125   78705.96
##   0.1         5                  6               50      200329.2  0.9374785  108758.64
##   0.1         5                  6              100      182327.0  0.9471212   96337.82
##   0.1         5                  6              300      167069.4  0.9549888   85077.68
##   0.1         5                  6              500      163245.1  0.9569069   80421.65
##   0.1         5                  8               50      204839.7  0.9348194  108928.67
##   0.1         5                  8              100      183989.2  0.9464388   96655.99
##   0.1         5                  8              300      165079.7  0.9562683   84863.88
##   0.1         5                  8              500      160417.9  0.9586685   80553.55
##   0.1         5                 10               50      205419.5  0.9344476  109513.26
##   0.1         5                 10              100      183380.1  0.9469141   96907.33
##   0.1         5                 10              300      166052.4  0.9558622   85705.52
##   0.1         5                 10              500      160663.0  0.9587073   81644.60
##   0.1         6                  4               50      196368.1  0.9397687  105783.26
##   0.1         6                  4              100      179985.2  0.9482586   93697.47
##   0.1         6                  4              300      165579.0  0.9553590   81652.01
##   0.1         6                  4              500      161730.2  0.9573316   77432.22
##   0.1         6                  6               50      195157.5  0.9403826  104071.57
##   0.1         6                  6              100      180439.0  0.9481737   93017.65
##   0.1         6                  6              300      167021.3  0.9547103   81754.92
##   0.1         6                  6              500      163781.8  0.9562549   77515.52
##   0.1         6                  8               50      191268.5  0.9428233  103069.73
##   0.1         6                  8              100      176398.5  0.9504706   92076.91
##   0.1         6                  8              300      162489.3  0.9575390   82515.24
##   0.1         6                  8              500      159494.7  0.9590117   79041.36
##   0.1         6                 10               50      194693.2  0.9402031  102988.06
##   0.1         6                 10              100      177147.6  0.9497220   92378.28
##   0.1         6                 10              300      162695.2  0.9571514   82664.06
##   0.1         6                 10              500      160411.4  0.9583205   79165.12
##   0.1         7                  4               50      185793.3  0.9460370  101176.34
##   0.1         7                  4              100      171762.2  0.9529830   90217.23
##   0.1         7                  4              300      158762.5  0.9592357   79440.98
##   0.1         7                  4              500      155784.6  0.9604090   75091.31
##   0.1         7                  6               50      188246.4  0.9447370   99616.30
##   0.1         7                  6              100      172445.9  0.9526295   89114.75
##   0.1         7                  6              300      160672.0  0.9583460   79550.90
##   0.1         7                  6              500      158422.9  0.9592712   76189.47
##   0.1         7                  8               50      190514.3  0.9432996  101210.74
##   0.1         7                  8              100      173530.8  0.9521849   90320.09
##   0.1         7                  8              300      161633.2  0.9578873   80952.74
##   0.1         7                  8              500      159933.2  0.9585530   77391.52
##   0.1         7                 10               50      189169.2  0.9441540   99715.60
##   0.1         7                 10              100      174052.6  0.9519094   90258.38
##   0.1         7                 10              300      162931.0  0.9574614   81940.36
##   0.1         7                 10              500      160012.8  0.9589405   78576.47
##   0.1         8                  4               50      181644.5  0.9481892   97787.00
##   0.1         8                  4              100      168161.0  0.9548073   87529.82
##   0.1         8                  4              300      159759.5  0.9587787   77954.00
##   0.1         8                  4              500      157890.0  0.9596859   74342.26
##   0.1         8                  6               50      184298.5  0.9467360   97772.64
##   0.1         8                  6              100      169690.9  0.9541448   88166.33
##   0.1         8                  6              300      159652.7  0.9588204   79199.99
##   0.1         8                  6              500      156139.9  0.9605649   75452.75
##   0.1         8                  8               50      189958.8  0.9430985   97545.00
##   0.1         8                  8              100      175180.3  0.9506478   87836.64
##   0.1         8                  8              300      162479.0  0.9569329   78805.50
##   0.1         8                  8              500      158420.3  0.9589827   75115.17
##   0.1         8                 10               50      184881.7  0.9464935   97552.82
##   0.1         8                 10              100      168057.8  0.9550300   87865.43
##   0.1         8                 10              300      155827.7  0.9611751   79803.39
##   0.1         8                 10              500      153832.1  0.9620934   76683.23
##   0.1         9                  4               50      176005.8  0.9513208   94418.40
##   0.1         9                  4              100      164454.5  0.9567348   85716.17
##   0.1         9                  4              300      156141.6  0.9604649   76451.92
##   0.1         9                  4              500      155178.0  0.9608480   73796.50
##   0.1         9                  6               50      182564.6  0.9476639   96032.21
##   0.1         9                  6              100      168310.7  0.9545361   86691.33
##   0.1         9                  6              300      158143.5  0.9593619   77822.94
##   0.1         9                  6              500      156408.0  0.9601486   74553.12
##   0.1         9                  8               50      181187.3  0.9486467   94805.36
##   0.1         9                  8              100      169266.4  0.9543642   86521.62
##   0.1         9                  8              300      158829.2  0.9595024   78279.85
##   0.1         9                  8              500      156838.6  0.9604024   75491.28
##   0.1         9                 10               50      183340.2  0.9471005   95261.05
##   0.1         9                 10              100      167867.2  0.9548862   86851.24
##   0.1         9                 10              300      156243.3  0.9605098   78211.54
##   0.1         9                 10              500      154433.2  0.9613476   74975.55
##   0.1        10                  4               50      174049.4  0.9522637   93600.98
##   0.1        10                  4              100      163406.3  0.9568864   84782.92
##   0.1        10                  4              300      156161.1  0.9602418   76197.85
##   0.1        10                  4              500      155307.0  0.9605497   73290.20
##   0.1        10                  6               50      176709.5  0.9508276   93491.92
##   0.1        10                  6              100      164470.7  0.9567800   85466.96
##   0.1        10                  6              300      155742.6  0.9607143   76668.40
##   0.1        10                  6              500      155036.0  0.9609315   74024.17
##   0.1        10                  8               50      178779.2  0.9495751   93562.78
##   0.1        10                  8              100      165783.1  0.9560195   85741.43
##   0.1        10                  8              300      156562.3  0.9603429   77549.35
##   0.1        10                  8              500      153924.6  0.9615992   74604.68
##   0.1        10                 10               50      178929.5  0.9496403   92300.34
##   0.1        10                 10              100      167767.2  0.9550219   85236.39
##   0.1        10                 10              300      158134.8  0.9596269   77571.65
##   0.1        10                 10              500      156281.1  0.9604551   74930.04
##   0.2         5                  4               50      188099.0  0.9442729   99369.87
##   0.2         5                  4              100      176558.4  0.9503130   90003.76
##   0.2         5                  4              300      168331.8  0.9543105   79862.92
##   0.2         5                  4              500      167094.4  0.9550025   76264.76
##   0.2         5                  6               50      187495.9  0.9436637   98994.40
##   0.2         5                  6              100      175793.9  0.9499660   90267.03
##   0.2         5                  6              300      167376.5  0.9544136   80088.11
##   0.2         5                  6              500      166083.0  0.9549586   76181.95
##   0.2         5                  8               50      181916.6  0.9475893   97382.74
##   0.2         5                  8              100      168446.4  0.9547219   90258.35
##   0.2         5                  8              300      156978.0  0.9606042   80225.01
##   0.2         5                  8              500      154343.7  0.9617844   76598.37
##   0.2         5                 10               50      187787.7  0.9439717   97683.15
##   0.2         5                 10              100      171888.2  0.9526295   90083.78
##   0.2         5                 10              300      161781.2  0.9579573   81558.15
##   0.2         5                 10              500      160038.7  0.9586537   78146.12
##   0.2         6                  4               50      177762.7  0.9499546   94121.50
##   0.2         6                  4              100      173259.4  0.9521443   87003.37
##   0.2         6                  4              300      165939.9  0.9557959   78120.19
##   0.2         6                  4              500      166110.1  0.9555813   75932.27
##   0.2         6                  6               50      188461.2  0.9435009   95851.32
##   0.2         6                  6              100      178340.6  0.9488366   88013.39
##   0.2         6                  6              300      166441.9  0.9549667   78031.20
##   0.2         6                  6              500      166109.1  0.9548908   75188.42
##   0.2         6                  8               50      177273.4  0.9500532   93622.03
##   0.2         6                  8              100      165137.0  0.9563295   86785.33
##   0.2         6                  8              300      158422.2  0.9595784   79016.96
##   0.2         6                  8              500      157580.6  0.9598239   75863.52
##   0.2         6                 10               50      176971.4  0.9498840   92989.97
##   0.2         6                 10              100      166435.3  0.9554482   86882.95
##   0.2         6                 10              300      157058.6  0.9600547   79043.81
##   0.2         6                 10              500      155006.1  0.9610268   75693.24
##   0.2         7                  4               50      175135.6  0.9511733   91511.66
##   0.2         7                  4              100      169523.8  0.9540443   84995.36
##   0.2         7                  4              300      162810.1  0.9572828   76667.00
##   0.2         7                  4              500      161116.1  0.9580107   74417.08
##   0.2         7                  6               50      179700.8  0.9484685   92477.62
##   0.2         7                  6              100      170462.8  0.9533379   86504.21
##   0.2         7                  6              300      165604.4  0.9551890   77741.31
##   0.2         7                  6              500      164585.1  0.9556326   75016.75
##   0.2         7                  8               50      174176.4  0.9512727   91932.47
##   0.2         7                  8              100      166320.8  0.9553275   86173.22
##   0.2         7                  8              300      159416.9  0.9586951   77920.57
##   0.2         7                  8              500      156786.3  0.9600180   74826.65
##   0.2         7                 10               50      182991.9  0.9469611   92906.51
##   0.2         7                 10              100      171489.9  0.9529910   86210.41
##   0.2         7                 10              300      162506.1  0.9574660   78989.44
##   0.2         7                 10              500      160671.6  0.9583819   75945.57
##   0.2         8                  4               50      173098.9  0.9524981   90590.67
##   0.2         8                  4              100      165808.3  0.9562156   84051.09
##   0.2         8                  4              300      160607.9  0.9586605   76202.73
##   0.2         8                  4              500      160453.5  0.9587018   75042.85
##   0.2         8                  6               50      169951.4  0.9542789   88751.96
##   0.2         8                  6              100      164810.2  0.9564917   83744.32
##   0.2         8                  6              300      160714.9  0.9582608   75961.50
##   0.2         8                  6              500      160618.9  0.9582917   74228.74
##   0.2         8                  8               50      169781.1  0.9538167   88997.21
##   0.2         8                  8              100      161511.3  0.9580041   83592.71
##   0.2         8                  8              300      156034.1  0.9606680   75878.43
##   0.2         8                  8              500      154875.8  0.9611121   73777.63
##   0.2         8                 10               50      171485.9  0.9533687   90294.58
##   0.2         8                 10              100      163446.2  0.9571502   84568.42
##   0.2         8                 10              300      157364.3  0.9600219   76962.52
##   0.2         8                 10              500      156470.7  0.9604478   74438.01
##   0.2         9                  4               50      170457.7  0.9527794   87867.38
##   0.2         9                  4              100      163918.7  0.9560411   81711.65
##   0.2         9                  4              300      162252.0  0.9567823   75508.35
##   0.2         9                  4              500      161407.9  0.9572816   74258.10
##   0.2         9                  6               50      170978.8  0.9534450   88387.79
##   0.2         9                  6              100      162679.9  0.9575363   82708.90
##   0.2         9                  6              300      157040.4  0.9599326   75345.38
##   0.2         9                  6              500      156643.3  0.9600685   73840.68
##   0.2         9                  8               50      170136.5  0.9541742   88393.32
##   0.2         9                  8              100      161846.3  0.9584849   83493.57
##   0.2         9                  8              300      156031.2  0.9614348   76357.56
##   0.2         9                  8              500      156174.4  0.9613745   74767.80
##   0.2         9                 10               50      175829.4  0.9507858   90393.85
##   0.2         9                 10              100      169790.3  0.9536737   84259.31
##   0.2         9                 10              300      163936.8  0.9564670   76968.53
##   0.2         9                 10              500      163736.3  0.9565114   74584.94
##   0.2        10                  4               50      170388.9  0.9533860   87178.92
##   0.2        10                  4              100      163280.2  0.9569056   81284.50
##   0.2        10                  4              300      161315.3  0.9575450   74756.18
##   0.2        10                  4              500      161191.6  0.9575613   73674.41
##   0.2        10                  6               50      166759.3  0.9559737   87213.88
##   0.2        10                  6              100      162300.9  0.9578643   82305.41
##   0.2        10                  6              300      158801.6  0.9592248   75002.55
##   0.2        10                  6              500      158436.9  0.9593905   73820.07
##   0.2        10                  8               50      169194.0  0.9543868   86276.38
##   0.2        10                  8              100      164262.8  0.9568156   81719.14
##   0.2        10                  8              300      161433.5  0.9582042   75070.02
##   0.2        10                  8              500      160691.7  0.9585000   73833.37
##   0.2        10                 10               50      166220.3  0.9560718   86575.09
##   0.2        10                 10              100      160667.6  0.9587650   81752.39
##   0.2        10                 10              300      155695.4  0.9611198   75197.53
##   0.2        10                 10              500      154986.2  0.9613809   73977.51
##   0.3         5                  4               50      184346.9  0.9460275   95199.61
##   0.3         5                  4              100      174814.6  0.9510703   88550.99
##   0.3         5                  4              300      170675.9  0.9530605   79398.94
##   0.3         5                  4              500      169468.5  0.9536073   77601.31
##   0.3         5                  6               50      184188.8  0.9456246   96689.87
##   0.3         5                  6              100      177759.4  0.9490570   89927.70
##   0.3         5                  6              300      168740.0  0.9538290   80370.90
##   0.3         5                  6              500      167276.9  0.9543407   77953.84
##   0.3         5                  8               50      186159.7  0.9447834   95882.10
##   0.3         5                  8              100      177443.9  0.9498493   89186.36
##   0.3         5                  8              300      171013.3  0.9532626   80871.32
##   0.3         5                  8              500      169794.4  0.9538071   77369.85
##   0.3         5                 10               50      178128.3  0.9493592   95407.70
##   0.3         5                 10              100      170731.6  0.9533191   89446.90
##   0.3         5                 10              300      162826.7  0.9573212   80310.99
##   0.3         5                 10              500      160244.1  0.9587018   77045.25
##   0.3         6                  4               50      176067.5  0.9507431   93040.58
##   0.3         6                  4              100      165264.0  0.9564027   85208.87
##   0.3         6                  4              300      161399.8  0.9580272   78368.32
##   0.3         6                  4              500      159719.0  0.9588244   76544.96
##   0.3         6                  6               50      181578.3  0.9474665   93419.08
##   0.3         6                  6              100      175740.5  0.9503897   87871.70
##   0.3         6                  6              300      168636.7  0.9537461   78119.97
##   0.3         6                  6              500      169403.2  0.9532151   77068.28
##   0.3         6                  8               50      178603.6  0.9488548   92658.94
##   0.3         6                  8              100      170268.1  0.9533565   86683.88
##   0.3         6                  8              300      165788.0  0.9560229   79650.81
##   0.3         6                  8              500      163697.2  0.9571213   77017.74
##   0.3         6                 10               50      179390.2  0.9483946   92916.27
##   0.3         6                 10              100      173451.3  0.9517958   87552.95
##   0.3         6                 10              300      167391.6  0.9545357   80319.80
##   0.3         6                 10              500      165535.6  0.9554454   77612.73
##   0.3         7                  4               50      176376.6  0.9496554   90045.16
##   0.3         7                  4              100      171904.5  0.9519423   84401.79
##   0.3         7                  4              300      167132.2  0.9542288   76245.75
##   0.3         7                  4              500      167813.3  0.9540202   76151.01
##   0.3         7                  6               50      178445.4  0.9492020   90615.80
##   0.3         7                  6              100      171258.7  0.9529095   84504.63
##   0.3         7                  6              300      167668.3  0.9548348   76603.61
##   0.3         7                  6              500      168144.2  0.9545396   75934.83
##   0.3         7                  8               50      167118.1  0.9550597   89733.90
##   0.3         7                  8              100      165787.3  0.9553862   85020.46
##   0.3         7                  8              300      161840.1  0.9574336   77884.48
##   0.3         7                  8              500      162561.8  0.9569184   76863.98
##   0.3         7                 10               50      176261.0  0.9501417   90857.42
##   0.3         7                 10              100      170047.2  0.9533623   86107.73
##   0.3         7                 10              300      165573.3  0.9556956   78274.06
##   0.3         7                 10              500      165016.4  0.9559992   76560.07
##   0.3         8                  4               50      170652.2  0.9533753   87694.86
##   0.3         8                  4              100      167675.2  0.9547935   82217.51
##   0.3         8                  4              300      166768.6  0.9549664   76321.47
##   0.3         8                  4              500      167258.9  0.9545956   75836.59
##   0.3         8                  6               50      167939.4  0.9549261   86962.16
##   0.3         8                  6              100      162222.8  0.9576935   81488.26
##   0.3         8                  6              300      160300.3  0.9583898   75919.06
##   0.3         8                  6              500      161253.6  0.9579960   75753.82
##   0.3         8                  8               50      174350.7  0.9513345   88351.59
##   0.3         8                  8              100      167575.2  0.9548729   83028.92
##   0.3         8                  8              300      163856.5  0.9567040   76613.55
##   0.3         8                  8              500      164038.2  0.9566456   75638.85
##   0.3         8                 10               50      173883.8  0.9520197   89257.68
##   0.3         8                 10              100      167792.8  0.9550952   85120.34
##   0.3         8                 10              300      163863.1  0.9569850   77637.14
##   0.3         8                 10              500      162641.1  0.9575124   76261.92
##   0.3         9                  4               50      165702.5  0.9563560   86599.35
##   0.3         9                  4              100      164232.1  0.9568577   81045.57
##   0.3         9                  4              300      163497.0  0.9569667   76536.21
##   0.3         9                  4              500      163895.3  0.9568420   76445.80
##   0.3         9                  6               50      177818.8  0.9492300   87467.54
##   0.3         9                  6              100      170533.3  0.9531712   82175.83
##   0.3         9                  6              300      168151.8  0.9544569   76865.02
##   0.3         9                  6              500      167288.3  0.9549869   76158.62
##   0.3         9                  8               50      167602.4  0.9555603   87129.89
##   0.3         9                  8              100      159024.8  0.9598804   80861.12
##   0.3         9                  8              300      156055.9  0.9610038   75350.28
##   0.3         9                  8              500      156214.3  0.9608937   74961.45
##   0.3         9                 10               50      171855.1  0.9528562   88263.95
##   0.3         9                 10              100      165331.7  0.9562324   82596.85
##   0.3         9                 10              300      163149.3  0.9573117   77407.11
##   0.3         9                 10              500      163012.1  0.9573940   76123.23
##   0.3        10                  4               50      173883.9  0.9510384   87682.44
##   0.3        10                  4              100      171419.1  0.9521556   82187.75
##   0.3        10                  4              300      170171.6  0.9525600   76913.50
##   0.3        10                  4              500      170326.6  0.9524267   76407.36
##   0.3        10                  6               50      172096.3  0.9528586   86957.60
##   0.3        10                  6              100      169347.4  0.9539823   82012.60
##   0.3        10                  6              300      166497.3  0.9551988   76892.46
##   0.3        10                  6              500      166760.2  0.9551648   76929.77
##   0.3        10                  8               50      169761.8  0.9539478   87378.82
##   0.3        10                  8              100      166351.5  0.9555750   82624.70
##   0.3        10                  8              300      165716.4  0.9559250   77811.30
##   0.3        10                  8              500      165166.4  0.9563427   77276.32
##   0.3        10                 10               50      168429.4  0.9542629   86637.72
##   0.3        10                 10              100      165522.8  0.9557918   82114.83
##   0.3        10                 10              300      163252.4  0.9572072   77081.52
##   0.3        10                 10              500      163225.8  0.9570931   76330.14
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 500, interaction.depth =
##  8, shrinkage = 0.1 and n.minobsinnode = 10.
plot(boost_tg, main = "5-Fold Cross Validation Gradient Boosting: tuneGrid")

boost_tg_best <- boost_tg$bestTune
boost_tg_best
##    n.trees interaction.depth shrinkage n.minobsinnode
## 64     500                 8       0.1             10

Based on the output above, the best tuning parameters are n.trees = 500, interaction.depth = 8, shrinkage = 0.1, and n.minobsinnode = 10, which give CV RMSE = 148799.1, R-squared = 0.9633404, and MAE = 75014.08.
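
For reference, the CV metrics of the winning combination can also be pulled out programmatically. A minimal sketch, using only the boost_tg object above (caret's standard column names are assumed):

# Sketch: join bestTune against the full results table to extract the
# cross-validated metrics for the winning parameter combination.
merge(boost_tg$bestTune, boost_tg$results)[, c("RMSE", "Rsquared", "MAE")]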

Re-Fitting the Model with the Best Tuning Parameters

boost_tg <- train(selling_price ~ .,
                  data = trainData,
                  method = "gbm",
                  tuneGrid = boost_tg_best,
                  verbose = FALSE)
boost_tg
## Stochastic Gradient Boosting 
## 
## 5625 samples
##   11 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 5625, 5625, 5625, 5625, 5625, 5625, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   169333.9  0.9549749  77600.37
## 
## Tuning parameter 'n.trees' was held constant at a value of 500
## Tuning parameter 'interaction.depth' was held constant at a value of 8
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
boost_tg_result <- boost_tg$results
boost_tg_result
##   n.trees interaction.depth shrinkage n.minobsinnode     RMSE  Rsquared
## 1     500                 8       0.1             10 169333.9 0.9549749
##        MAE   RMSESD RsquaredSD    MAESD
## 1 77600.37 31811.25 0.01549969 3115.737

The re-fitted model has RMSE = 169333.9, R-squared = 0.9549749, and MAE = 77600.37. Note that these figures come from caret's default resampling (25 bootstrap replicates) rather than the 5-fold CV used for tuning, which is why they differ from the CV metrics above.
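
If the extra bootstrap estimates are not needed, an alternative is to skip resampling entirely during the refit. A sketch under that assumption (boost_final is a hypothetical name; trainControl(method = "none") fits the model once on the full training set and requires a single-row tuneGrid):

# Sketch: fit once with the fixed best parameters, no resampling.
# boost_final is a hypothetical name, not used elsewhere in this document.
boost_final <- train(selling_price ~ .,
                     data = trainData,
                     method = "gbm",
                     tuneGrid = boost_tg_best,
                     trControl = trainControl(method = "none"),
                     verbose = FALSE)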

Evaluation on the Test Data

boost_tg_eval <- eval_test_data(boost_tg)
boost_tg_eval
##           RMSE      R_Squared            MAE 
## 127495.2728822      0.9763908  72281.3991466

On the test data, the model achieves RMSE = 127495.27, R-squared = 0.9763908, and MAE = 72281.40.
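
eval_test_data() is a helper defined earlier in this document. As a rough, assumed equivalent (eval_sketch and its body are a sketch, not the original definition), it might look like the following, using the MLmetrics package loaded at the start:

# Hypothetical re-sketch of eval_test_data(): predict on testData and
# report the three test-set metrics via MLmetrics.
eval_sketch <- function(model) {
  pred <- predict(model, newdata = testData)
  c(RMSE      = MLmetrics::RMSE(pred, testData$selling_price),
    R_Squared = MLmetrics::R2_Score(pred, testData$selling_price),
    MAE       = MLmetrics::MAE(pred, testData$selling_price))
}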

Variable Importance

plot(varImp(boost_tg),
     main = "Gradient Boosting Variable Importance" )

Based on the plot above, the three most important variables are max_power, year, and torque.
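
To read the ranking numerically instead of from the plot, a small sketch using caret's varImp() output (the $importance slot holds a data frame with an Overall score):

# Sketch: top three predictors by caret's scaled importance score.
imp <- varImp(boost_tg)$importance
head(imp[order(-imp$Overall), , drop = FALSE], 3)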

Model Comparison

Four models have been fitted. To select the best one, their test-set metrics are compared as follows:

eval_all <- matrix(c(rf_eval, rf_tg_eval, boost_eval, boost_tg_eval), nrow = 4, byrow = TRUE)
colnames(eval_all) <- names(rf_eval)
row.names(eval_all) <- c("Random Forest", 
                         "Random Forest tuneGrid", 
                         "Gradient Boosting", 
                         "Gradient Boosting tuneGrid")
eval_all
##                                RMSE R_Squared      MAE
## Random Forest              133429.7 0.9736961 68359.06
## Random Forest tuneGrid     132655.3 0.9740354 68812.60
## Gradient Boosting          128387.0 0.9760126 73114.29
## Gradient Boosting tuneGrid 127495.3 0.9763908 72281.40

Based on the output above, both gradient boosting models perform better on the test data (smallest RMSE and largest R-squared), and the tuneGrid variant edges out the default. The Gradient Boosting tuneGrid model is therefore selected as the best model, with RMSE = 127495.3 and R-squared = 0.9763908.
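
The same choice can be made programmatically from the eval_all matrix built above, e.g. by smallest test RMSE:

# Sketch: pick the row of eval_all with the smallest test-set RMSE.
rownames(eval_all)[which.min(eval_all[, "RMSE"])]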

(Optional) Model performance can also be compared using resampling, as follows:

modelcompare <- resamples(list(random_forest=rf, 
                               gradient_boosting=boost,
                               random_forest_tuneGrid=rf_tg, 
                               gradient_boosting_tuneGrid=boost_tg))
summary(modelcompare)
## 
## Call:
## summary.resamples(object = modelcompare)
## 
## Models: random_forest, gradient_boosting, random_forest_tuneGrid, gradient_boosting_tuneGrid 
## Number of resamples: 25 
## 
## MAE 
##                                Min.  1st Qu.   Median     Mean  3rd Qu.
## random_forest              70805.76 74631.68 76524.96 76479.84 78418.66
## gradient_boosting          71923.38 75959.51 76620.00 77077.11 78038.03
## random_forest_tuneGrid     71166.12 74726.01 75879.77 76110.00 77942.42
## gradient_boosting_tuneGrid 73127.07 74937.24 78274.57 77600.37 79139.11
##                                Max. NA's
## random_forest              82058.86    0
## gradient_boosting          82069.33    0
## random_forest_tuneGrid     85173.11    0
## gradient_boosting_tuneGrid 85689.09    0
## 
## RMSE 
##                                Min.  1st Qu.   Median     Mean  3rd Qu.
## random_forest              128820.9 148291.0 172351.8 170626.1 194107.7
## gradient_boosting          118795.5 149125.6 174554.9 170019.4 188899.4
## random_forest_tuneGrid     125802.8 156065.8 169791.8 170450.7 185236.2
## gradient_boosting_tuneGrid 124809.7 142504.7 154911.8 169333.9 199548.0
##                                Max. NA's
## random_forest              221037.0    0
## gradient_boosting          213839.7    0
## random_forest_tuneGrid     227287.2    0
## gradient_boosting_tuneGrid 214464.3    0
## 
## Rsquared 
##                                 Min.   1st Qu.    Median      Mean   3rd Qu.
## random_forest              0.9315854 0.9458145 0.9542018 0.9546539 0.9675315
## gradient_boosting          0.9295258 0.9434964 0.9546076 0.9538613 0.9647091
## random_forest_tuneGrid     0.9310714 0.9494371 0.9575624 0.9553525 0.9617174
## gradient_boosting_tuneGrid 0.9238403 0.9407797 0.9624515 0.9549749 0.9689030
##                                 Max. NA's
## random_forest              0.9731550    0
## gradient_boosting          0.9766467    0
## random_forest_tuneGrid     0.9745691    0
## gradient_boosting_tuneGrid 0.9768688    0
dotplot(modelcompare, main = "Model Comparison")

Based on the resampling comparison, all four models perform roughly the same.
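
As a slightly more formal check of that impression, caret can run paired comparisons of the resampled metrics via diff() on the resamples object (a sketch; large p-values would support "roughly the same performance"):

# Sketch: paired comparisons of resampled RMSE/Rsquared/MAE across models.
summary(diff(modelcompare))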

Interpretable Machine Learning

# Select test cases with the same model that will be explained: the
# gradient boosting model (the original code predicted with rf, which
# would filter rows using a different model than the one explained).
pred <- predict(boost, newdata = testData)
explainer <- lime(x = subset(trainData, select = -c(selling_price)),
                  model = boost)
set.seed(123)
explanation <- explain(x = head(subset(testData, select = -c(selling_price))[pred > 750000, ], 2),
                       explainer, n_features = 10)

plot_features(explanation)

For example, with a budget above 750K, a buyer can get a used car that is under 5 years old, with high power (> 101 bhp), high torque (> 110), and low mileage (< 35,000 km).

set.seed(123)
explanation <- explain(x = head(subset(testData, select = -c(selling_price))[pred<150000,],2), 
                       explainer, n_features = 10)

plot_features(explanation)

Conversely, with a budget below 150K, a buyer gets a used car that is over 11 years old, with low power, a manual transmission, and more than 60,000 km driven.
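
Since explain() returns an ordinary data frame, the weights behind these plots can also be inspected directly. A sketch (column names follow lime's standard output):

# Sketch: per-feature LIME weights for the explained cases.
head(as.data.frame(explanation)[, c("feature", "feature_value", "feature_weight")])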

Conclusion

  • Random forest and gradient boosting deliver roughly equal performance on this case study.

  • The most important variables are max_power, age, torque, and km_driven:

    • max_power, the maximum engine power (bhp): the higher, the more expensive the car
    • age, the vehicle's age: the older, the cheaper
    • torque, the maximum torque: the higher, the more expensive
    • km_driven, the distance already driven: the farther, the cheaper

  1. NIM: G1501211024. Email:

  2. NIM: G1501211061. Email:

  3. NIM: G14180064. Email:

  4. NIM: G94180016. Email: