Studi Kasus: Prediksi Tingkat Kesegaran Bayam

Sebagai ilustrasi, modul ini menggunakan data yang digunakan oleh Koyama et al. (2021). Data diperoleh dari foto daun bayam, setelah dilakukan preprocessing, warna pada foto dapat digunakan sebagai input data pada pemodelan prediktif.

Pada ilustrasi ini, peubah respon disimpan pada kolom dengan judul label (1=bayam segar, 0=tidak segar). Sedangkan sisa kolom pada data set merupakan peubah prediktor.

Berikut adalah proses impor data menggunakan fungsi read.csv(). Data terlebih dulu dibagi menjadi dua, yaitu data training dan data testing. Pertama-tama, kita akan mengimpor data training.

spinach.train<-read.csv("https://github.com/raoy/STA514/raw/main/train_feature%20v.2.csv")
dim(spinach.train)
[1] 836 231
head(spinach.train)
spinach.test<-read.csv("https://github.com/raoy/STA514/raw/main/test_feature%20v.2.csv")
dim(spinach.test)
[1] 209 231
head(spinach.test)

Persiapan package

Terdapat beberapa package yang perlu diinstall terlebih dulu agar dapat menjalankan fungsi-fungsi yang ada pada modul ini. Hal ini dapat dilakukan dengan menggunakan fungsi install.packages().

install.packages("caret")
install.packages("ggplot2")
install.packages("rpart.plot")
install.packages("fastAdaboost")

Setelah package diinstall, kita perlu memanggil package tersebut, dapat menggunakan fungsi library() atau require().

library(caret)
Loading required package: ggplot2
Loading required package: lattice
Registered S3 method overwritten by 'data.table':
  method           from
  print.data.table     
library(ggplot2)
library(rpart.plot)
Loading required package: rpart

Sebelum memulai analisis, sebaiknya dipastikan bahwa format data sudah sesuai. Pada ilustrasi ini, kita akan membuat tipe objek peubah respon menjadi faktor, dengan label “segar” dan “tidak segar”.

spinach.train$label<-as.factor(spinach.train$label)
levels(spinach.train$label)<-c("No", "Fresh")

Hal serupa juga dilakukan untuk data testing.

spinach.test$label<-as.factor(spinach.test$label)
levels(spinach.test$label)<-c("No", "Fresh")

Pada data training, terlihat bahwa komposisi data bayam segar dan tidak segar cukup seimbang.

prop.table(table(spinach.train$label))

       No     Fresh 
0.5215311 0.4784689 

Pada ilustrasi ini, dilakukan standarisasi data sebagai bagian dari praproses data.

standardize <- preProcess(spinach.train, method = c("center", "scale"))
s.train <- predict(standardize, newdata = spinach.train)
s.test <- predict(standardize, newdata = spinach.test)

Berikut ini adalah pengaturan agar kita nantinya dapat mengevaluasi model menggunakan ROC cuve.

ctrl<-trainControl(method="cv", summaryFunction=twoClassSummary, classProbs=T,
                     savePredictions = T)

Classification Tree

cart <- train(label ~ ., data = s.train, 
                method ='rpart',
                trControl=ctrl)
Warning in train.default(x, y, weights = w, ...) :
  The metric "Accuracy" was not in the result set. ROC will be used instead.
cart
CART 

836 samples
230 predictors
  2 classes: 'No', 'Fresh' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 752, 753, 752, 752, 753, 752, ... 
Resampling results across tuning parameters:

  cp       ROC        Sens       Spec  
  0.04000  0.7395976  0.6856237  0.7525
  0.04125  0.7324954  0.6992600  0.7275
  0.43500  0.5936311  0.8622622  0.3250

ROC was used to select the optimal model using the largest value.
The final value used for the model was cp = 0.04.

Peubah prediktor pada ilustrasi ini merupakan hasil konversi warna pada foto menjadi suatu nilai sehingga kita tidak berusaha untuk menginterpretasikannya. Namun demikian, pada kasus lain yang lebih relevan, Anda dapat menampilkan diagram pohon dari metode CART dengan fungsi rpart.plot().

rpart.plot(cart$finalModel)

Selain itu, dapat pula diperoleh tingkat kepentingan peubah. Sekali lagi, pada kasus ini, kita tidak berusaha memaknai kepentingan peubah, namun Anda mungkin memerlukannya pada kasus lain yang relevan.

plot(varImp(cart, scale = FALSE), top=10)

pred<-predict(cart, s.test)
confusionMatrix(pred, s.test$label, positive = "Fresh")
Confusion Matrix and Statistics

          Reference
Prediction No Fresh
     No    76    17
     Fresh 33    83
                                          
               Accuracy : 0.7608          
                 95% CI : (0.6971, 0.8169)
    No Information Rate : 0.5215          
    P-Value [Acc > NIR] : 9.293e-13       
                                          
                  Kappa : 0.5238          
                                          
 Mcnemar's Test P-Value : 0.03389         
                                          
            Sensitivity : 0.8300          
            Specificity : 0.6972          
         Pos Pred Value : 0.7155          
         Neg Pred Value : 0.8172          
             Prevalence : 0.4785          
         Detection Rate : 0.3971          
   Detection Prevalence : 0.5550          
      Balanced Accuracy : 0.7636          
                                          
       'Positive' Class : Fresh           
                                          
MLeval::evalm(cart, plots="r", silent=TRUE)
$roc

$proc

$prg

$cc

$probs
$probs$`Group 1`


$optres
$optres$`Group 1`


$stdres
$stdres$`Group 1`
NANA

Random Forest

set.seed(1)
rf <- train(label ~ ., data=s.train, 
            method='ranger', 
            trControl=ctrl,
            importance = 'impurity')
Warning in train.default(x, y, weights = w, ...) :
  The metric "Accuracy" was not in the result set. ROC will be used instead.
rf
Random Forest 

836 samples
230 predictors
  2 classes: 'No', 'Fresh' 

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 752, 753, 752, 753, 753, 752, ... 
Resampling results across tuning parameters:

  mtry  splitrule   ROC        Sens       Spec  
    2   gini        0.8720540  0.7797040  0.8000
    2   extratrees  0.8698018  0.7797569  0.8150
  116   gini        0.8898850  0.7818710  0.8225
  116   extratrees  0.8900086  0.7913319  0.8025
  230   gini        0.8836430  0.7680761  0.8175
  230   extratrees  0.8946816  0.7934461  0.8225

Tuning parameter 'min.node.size' was held constant at a value of 1
ROC was used to select the optimal model using the largest value.
The final values used for the model were mtry = 230, splitrule = extratrees
 and min.node.size = 1.
pred<-predict(rf, s.test)
confusionMatrix(pred, s.test$label, positive = "Fresh")
Confusion Matrix and Statistics

          Reference
Prediction No Fresh
     No    90    20
     Fresh 19    80
                                          
               Accuracy : 0.8134          
                 95% CI : (0.7539, 0.8638)
    No Information Rate : 0.5215          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.6259          
                                          
 Mcnemar's Test P-Value : 1               
                                          
            Sensitivity : 0.8000          
            Specificity : 0.8257          
         Pos Pred Value : 0.8081          
         Neg Pred Value : 0.8182          
             Prevalence : 0.4785          
         Detection Rate : 0.3828          
   Detection Prevalence : 0.4737          
      Balanced Accuracy : 0.8128          
                                          
       'Positive' Class : Fresh           
                                          
MLeval::evalm(rf, plots="r", silent=TRUE)
$roc

$proc

$prg

$cc

$probs
$probs$`Group 1`


$optres
$optres$`Group 1`


$stdres
$stdres$`Group 1`
NANA

Boosting

set.seed(1)
adaboost <- train(label ~ ., data=s.train, 
                  method='adaboost',
                  trControl=ctrl)
adaboost
pred<-predict(adaboost, s.test)
confusionMatrix(pred, s.test$label, positive = "Fresh")

Support Vector Machine (SVM)

set.seed(1)
svm <- train(label ~ ., data=s.train, 
            method='svmRadial', 
            metric = "F",
            trControl=ctrl)
svm
pred<-predict(svm, s.test)
confusionMatrix(pred, s.test$label, positive = "Fresh")

REFERENCES


  1. ↩︎

LS0tDQp0aXRsZTogIlBlbW9kZWxhbiBLbGFzaWZpa2FzaSBNZW5nZ3VuYWthbiBSIFNvZnR3YXJlIg0KZGF0ZTogRGVjZW1iZXIgNSwgMjAyMQ0KYXV0aG9yOiBSYWhtYSBBbmlzYV5bcmFobWFhbmlzYUBhcHBzLmlwYi5hYy5pZF0NCm91dHB1dDogDQogIGh0bWxfbm90ZWJvb2s6DQogIHRvYzogVFJVRQ0KLS0tDQoNCg0KIyBTdHVkaSBLYXN1czogUHJlZGlrc2kgVGluZ2thdCBLZXNlZ2FyYW4gQmF5YW0NCg0KU2ViYWdhaSBpbHVzdHJhc2ksIG1vZHVsIGluaSBtZW5nZ3VuYWthbiBkYXRhIHlhbmcgZGlndW5ha2FuIG9sZWggS295YW1hIGV0IGFsLiAoMjAyMSkuIERhdGEgZGlwZXJvbGVoIGRhcmkgZm90byBkYXVuIGJheWFtLCBzZXRlbGFoIGRpbGFrdWthbiBwcmVwcm9jZXNzaW5nLCB3YXJuYSBwYWRhIGZvdG8gZGFwYXQgZGlndW5ha2FuIHNlYmFnYWkgaW5wdXQgZGF0YSBwYWRhIHBlbW9kZWxhbiBwcmVkaWt0aWYuIA0KDQpQYWRhIGlsdXN0cmFzaSBpbmksIHBldWJhaCByZXNwb24gZGlzaW1wYW4gcGFkYSBrb2xvbSBkZW5nYW4ganVkdWwgYGxhYmVsYCAoMT1iYXlhbSBzZWdhciwgMD10aWRhayBzZWdhcikuIFNlZGFuZ2thbiBzaXNhIGtvbG9tIHBhZGEgZGF0YSBzZXQgbWVydXBha2FuIHBldWJhaCBwcmVkaWt0b3IuDQoNCkJlcmlrdXQgYWRhbGFoIHByb3NlcyBpbXBvciBkYXRhIG1lbmdndW5ha2FuIGZ1bmdzaSBgcmVhZC5jc3YoKWAuIERhdGEgdGVybGViaWggZHVsdSBkaWJhZ2kgbWVuamFkaSBkdWEsIHlhaXR1IGRhdGEgdHJhaW5pbmcgZGFuIGRhdGEgdGVzdGluZy4gUGVydGFtYS10YW1hLCBraXRhIGFrYW4gbWVuZ2ltcG9yIGRhdGEgdHJhaW5pbmcuDQoNCmBgYHtyfQ0Kc3BpbmFjaC50cmFpbjwtcmVhZC5jc3YoImh0dHBzOi8vZ2l0aHViLmNvbS9yYW95L1NUQTUxNC9yYXcvbWFpbi90cmFpbl9mZWF0dXJlJTIwdi4yLmNzdiIpDQpkaW0oc3BpbmFjaC50cmFpbikNCmhlYWQoc3BpbmFjaC50cmFpbikNCmBgYA0KDQoNCmBgYHtyfQ0Kc3BpbmFjaC50ZXN0PC1yZWFkLmNzdigiaHR0cHM6Ly9naXRodWIuY29tL3Jhb3kvU1RBNTE0L3Jhdy9tYWluL3Rlc3RfZmVhdHVyZSUyMHYuMi5jc3YiKQ0KZGltKHNwaW5hY2gudGVzdCkNCmhlYWQoc3BpbmFjaC50ZXN0KQ0KYGBgDQoNCg0KIyBQZXJzaWFwYW4gcGFja2FnZQ0KDQpUZXJkYXBhdCBiZWJlcmFwYSBwYWNrYWdlIHlhbmcgcGVybHUgZGlpbnN0YWxsIHRlcmxlYmloIGR1bHUgYWdhciBkYXBhdCBtZW5qYWxhbmthbiBmdW5nc2ktZnVuZ3NpIHlhbmcgYWRhIHBhZGEgbW9kdWwgaW5pLiBIYWwgaW5pIGRhcGF0IGRpbGFrdWthbiBkZW5nYW4gbWVuZ2d1bmFrYW4gZnVuZ3NpIGBpbnN0YWxsLnBhY2thZ2VzKClgLg0KDQpgYGB7ciBldmFsPUZBTFNFfQ0KaW5zdGFsbC5wYWNrYWdlcygiY2FyZXQiKQ0KaW5zdGFsbC5wYWNrYWdlcygiZ2dwbG90MiIpDQppbnN0YWxsLnBhY2thZ2VzKCJycGFydC5wbG90IikNCmluc3RhbGwucGFja2FnZXMoImZhc3RBZGFib29zdCIpDQpgYGANCg0KU2V0ZWxhaCBwYWNrYWdlIGRpaW5zdGFsbCwga2l0YSBwZXJsdSBtZW1hbmdnaWwgcGFja2FnZSB0ZXJzZWJ1dCwgZGFwYXQgbWVuZ2d1bmFrYW4gZnVuZ3NpIGBsaWJyYXJ5KClgIGF0YXUgYHJlcXVpcmUoKWAuDQoNCmBgYHtyIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9DQpsaWJyYXJ5KGNhcmV0KQ0KbGlicmFyeShnZ3Bsb3QyKQ0KbGlicmFyeShycGFydC5wbG90KQ0KYGBgDQoNClNlYmVsdW0gbWVtdWxhaSBhbmFsaXNpcywgc2ViYWlrbnlhIGRpcGFzdGlrYW4gYmFod2EgZm9ybWF0IGRhdGEgc3VkYWggc2VzdWFpLiBQYWRhIGlsdXN0cmFzaSBpbmksIGtpdGEgYWthbiBtZW1idWF0IHRpcGUgb2JqZWsgcGV1YmFoIHJlc3BvbiBtZW5qYWRpIGZha3RvciwgZGVuZ2FuIGxhYmVsICJzZWdhciIgZGFuICJ0aWRhayBzZWdhciIuDQoNCmBgYHtyfQ0Kc3BpbmFjaC50cmFpbiRsYWJlbDwtYXMuZmFjdG9yKHNwaW5hY2gudHJhaW4kbGFiZWwpDQpsZXZlbHMoc3BpbmFjaC50cmFpbiRsYWJlbCk8LWMoIk5vIiwgIkZyZXNoIikNCmBgYA0KDQpIYWwgc2VydXBhIGp1Z2EgZGlsYWt1a2FuIHVudHVrIGRhdGEgdGVzdGluZy4NCg0KYGBge3J9DQpzcGluYWNoLnRlc3QkbGFiZWw8LWFzLmZhY3RvcihzcGluYWNoLnRlc3QkbGFiZWwpDQpsZXZlbHMoc3BpbmFjaC50ZXN0JGxhYmVsKTwtYygiTm8iLCAiRnJlc2giKQ0KDQpgYGANCg0KUGFkYSBkYXRhIHRyYWluaW5nLCB0ZXJsaWhhdCBiYWh3YSBrb21wb3Npc2kgZGF0YSBiYXlhbSBzZWdhciBkYW4gdGlkYWsgc2VnYXIgY3VrdXAgc2VpbWJhbmcuDQoNCmBgYHtyfQ0KcHJvcC50YWJsZSh0YWJsZShzcGluYWNoLnRyYWluJGxhYmVsKSkNCmBgYA0KDQpQYWRhIGlsdXN0cmFzaSBpbmksIGRpbGFrdWthbiBzdGFuZGFyaXNhc2kgZGF0YSBzZWJhZ2FpIGJhZ2lhbiBkYXJpIHByYXByb3NlcyBkYXRhLg0KDQpgYGB7cn0NCnN0YW5kYXJkaXplIDwtIHByZVByb2Nlc3Moc3BpbmFjaC50cmFpbiwgbWV0aG9kID0gYygiY2VudGVyIiwgInNjYWxlIikpDQpzLnRyYWluIDwtIHByZWRpY3Qoc3RhbmRhcmRpemUsIG5ld2RhdGEgPSBzcGluYWNoLnRyYWluKQ0Kcy50ZXN0IDwtIHByZWRpY3Qoc3RhbmRhcmRpemUsIG5ld2RhdGEgPSBzcGluYWNoLnRlc3QpDQpgYGANCg0KQmVyaWt1dCBpbmkgYWRhbGFoIHBlbmdhdHVyYW4gYWdhciBraXRhIG5hbnRpbnlhIGRhcGF0IG1lbmdldmFsdWFzaSBtb2RlbCBtZW5nZ3VuYWthbiBST0MgY3V2ZS4NCg0KYGBge3J9DQpjdHJsPC10cmFpbkNvbnRyb2wobWV0aG9kPSJjdiIsIHN1bW1hcnlGdW5jdGlvbj10d29DbGFzc1N1bW1hcnksIGNsYXNzUHJvYnM9VCwNCiAgICAgICAgICAgICAgICAgICAgIHNhdmVQcmVkaWN0aW9ucyA9IFQpDQpgYGANCg0KDQojIENsYXNzaWZpY2F0aW9uIFRyZWUNCg0KYGBge3J9DQpjYXJ0IDwtIHRyYWluKGxhYmVsIH4gLiwgZGF0YSA9IHMudHJhaW4sIA0KICAgICAgICAgICAgICAgIG1ldGhvZCA9J3JwYXJ0JywNCiAgICAgICAgICAgICAgICB0ckNvbnRyb2w9Y3RybCkNCmNhcnQNCmBgYA0KDQpQZXViYWggcHJlZGlrdG9yIHBhZGEgaWx1c3RyYXNpIGluaSBtZXJ1cGFrYW4gaGFzaWwga29udmVyc2kgd2FybmEgcGFkYSBmb3RvIG1lbmphZGkgc3VhdHUgbmlsYWkgc2VoaW5nZ2Ega2l0YSB0aWRhayBiZXJ1c2FoYSB1bnR1ayBtZW5naW50ZXJwcmV0YXNpa2FubnlhLiBOYW11biBkZW1pa2lhbiwgcGFkYSBrYXN1cyBsYWluIHlhbmcgbGViaWggcmVsZXZhbiwgQW5kYSBkYXBhdCBtZW5hbXBpbGthbiBkaWFncmFtIHBvaG9uIGRhcmkgbWV0b2RlIENBUlQgZGVuZ2FuIGZ1bmdzaSBgcnBhcnQucGxvdCgpYC4NCg0KYGBge3J9DQpycGFydC5wbG90KGNhcnQkZmluYWxNb2RlbCkNCmBgYA0KDQpTZWxhaW4gaXR1LCBkYXBhdCBwdWxhIGRpcGVyb2xlaCB0aW5na2F0IGtlcGVudGluZ2FuIHBldWJhaC4gU2VrYWxpIGxhZ2ksIHBhZGEga2FzdXMgaW5pLCBraXRhIHRpZGFrIGJlcnVzYWhhIG1lbWFrbmFpIGtlcGVudGluZ2FuIHBldWJhaCwgbmFtdW4gQW5kYSBtdW5na2luIG1lbWVybHVrYW5ueWEgcGFkYSBrYXN1cyBsYWluIHlhbmcgcmVsZXZhbi4NCg0KYGBge3J9DQpwbG90KHZhckltcChjYXJ0LCBzY2FsZSA9IEZBTFNFKSwgdG9wPTEwKQ0KYGBgDQoNCmBgYHtyfQ0KcHJlZDwtcHJlZGljdChjYXJ0LCBzLnRlc3QpDQpjb25mdXNpb25NYXRyaXgocHJlZCwgcy50ZXN0JGxhYmVsLCBwb3NpdGl2ZSA9ICJGcmVzaCIpDQpgYGANCg0KYGBge3J9DQpNTGV2YWw6OmV2YWxtKGNhcnQsIHBsb3RzPSJyIiwgc2lsZW50PVRSVUUpDQpgYGANCg0KDQojIFJhbmRvbSBGb3Jlc3QNCg0KYGBge3J9DQpzZXQuc2VlZCgxKQ0KcmYgPC0gdHJhaW4obGFiZWwgfiAuLCBkYXRhPXMudHJhaW4sIA0KICAgICAgICAgICAgbWV0aG9kPSdyYW5nZXInLCANCiAgICAgICAgICAgIHRyQ29udHJvbD1jdHJsLA0KICAgICAgICAgICAgaW1wb3J0YW5jZSA9ICdpbXB1cml0eScpDQpyZg0KYGBgDQoNCg0KYGBge3J9DQpwcmVkPC1wcmVkaWN0KHJmLCBzLnRlc3QpDQpjb25mdXNpb25NYXRyaXgocHJlZCwgcy50ZXN0JGxhYmVsLCBwb3NpdGl2ZSA9ICJGcmVzaCIpDQpgYGANCg0KYGBge3J9DQpNTGV2YWw6OmV2YWxtKHJmLCBwbG90cz0iciIsIHNpbGVudD1UUlVFKQ0KYGBgDQoNCg0KDQojIEJvb3N0aW5nDQoNCmBgYHtyIGV2YWw9RkFMU0V9DQpzZXQuc2VlZCgxKQ0KYWRhYm9vc3QgPC0gdHJhaW4obGFiZWwgfiAuLCBkYXRhPXMudHJhaW4sIA0KICAgICAgICAgICAgICAgICAgbWV0aG9kPSdhZGFib29zdCcsDQogICAgICAgICAgICAgICAgICB0ckNvbnRyb2w9Y3RybCkNCmFkYWJvb3N0DQpgYGANCg0KYGBge3IgZXZhbD1GQUxTRX0NCnByZWQ8LXByZWRpY3QoYWRhYm9vc3QsIHMudGVzdCkNCmNvbmZ1c2lvbk1hdHJpeChwcmVkLCBzLnRlc3QkbGFiZWwsIHBvc2l0aXZlID0gIkZyZXNoIikNCmBgYA0KDQoNCg0KIyBTdXBwb3J0IFZlY3RvciBNYWNoaW5lIChTVk0pDQoNCmBgYHtyIGV2YWw9RkFMU0V9DQpzZXQuc2VlZCgxKQ0Kc3ZtIDwtIHRyYWluKGxhYmVsIH4gLiwgZGF0YT1zLnRyYWluLCANCiAgICAgICAgICAgIG1ldGhvZD0nc3ZtUmFkaWFsJywgDQogICAgICAgICAgICBtZXRyaWMgPSAiRiIsDQogICAgICAgICAgICB0ckNvbnRyb2w9Y3RybCkNCnN2bQ0KYGBgDQoNCg0KYGBge3IgZXZhbD1GQUxTRX0NCnByZWQ8LXByZWRpY3Qoc3ZtLCBzLnRlc3QpDQpjb25mdXNpb25NYXRyaXgocHJlZCwgcy50ZXN0JGxhYmVsLCBwb3NpdGl2ZSA9ICJGcmVzaCIpDQpgYGANCg0KDQotLS0NCiMgUkVGRVJFTkNFUw0KDQorIEtveWFtYSwgSy4sIFRhbmFrYSwgTS4sIENobywgQi4gSC4sIFlvc2hpa2F3YSwgWS4sICYgS29zZWtpLCBTLiAoMjAyMSkuIFByZWRpY3Rpbmcgc2Vuc29yeSBldmFsdWF0aW9uIG9mIHNwaW5hY2ggZnJlc2huZXNzIHVzaW5nIG1hY2hpbmUgbGVhcm5pbmcgbW9kZWwgYW5kIGRpZ2l0YWwgaW1hZ2VzLiBQbG9zIG9uZSwgMTYoMyksIGUwMjQ4NzY5Lg0KDQorIFNldGlhYnVkaSwgTi4gQS4sIEZha2hydWRpbiwgTi4sICYgR2hvbmksIE4uICgyMDIxLCBOb3ZlbWJlciAyNikuIEtsYXNpZmlrYXNpOiBTdHVkaSBLYXN1cyBQcmVkaWtzaSByZWNoYXJnZS4gUlB1YnMuIGh0dHBzOi8vcnB1YnMuY29tL251cmFuZGkvc3RhNTgxLXAxLWs4DQoNCg0KDQoNCg0KDQo=