This is a second classification exercise, using the same data as the previous classification exercise (an airline survey dataset from Kaggle.com). It consists of four parts: Part A, prediction using Naive Bayes; Part B, Decision Tree; Part C, Random Forest; and Part D, analysis.

Goal: to predict customer satisfaction with airline services.

PART A: NAIVE BAYES

1. Data preparation

Read data

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.2.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

airline <- read.csv("Airline_survey.csv")
head(airline)

dim(airline)

## [1] 129880     23

str(airline)

## 'data.frame':    129880 obs. of  23 variables:
##  $ satisfaction                     : chr  "satisfied" "satisfied" "satisfied" "satisfied" ...
##  $ Gender                           : chr  "Female" "Male" "Female" "Female" ...
##  $ Customer.Type                    : chr  "Loyal Customer" "Loyal Customer" "Loyal Customer" "Loyal Customer" ...
##  $ Age                              : int  65 47 15 60 70 30 66 10 56 22 ...
##  $ Type.of.Travel                   : chr  "Personal Travel" "Personal Travel" "Personal Travel" "Personal Travel" ...
##  $ Class                            : chr  "Eco" "Business" "Eco" "Eco" ...
##  $ Flight.Distance                  : int  265 2464 2138 623 354 1894 227 1812 73 1556 ...
##  $ Seat.comfort                     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Departure.Arrival.time.convenient: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Food.and.drink                   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Gate.location                    : int  2 3 3 3 3 3 3 3 3 3 ...
##  $ Inflight.wifi.service            : int  2 0 2 3 4 2 2 2 5 2 ...
##  $ Inflight.entertainment           : int  4 2 0 4 3 0 5 0 3 0 ...
##  $ Online.support                   : int  2 2 2 3 4 2 5 2 5 2 ...
##  $ Ease.of.Online.booking           : int  3 3 2 1 2 2 5 2 4 2 ...
##  $ On.board.service                 : int  3 4 3 1 2 5 5 3 4 2 ...
##  $ Leg.room.service                 : int  0 4 3 0 0 4 0 3 0 4 ...
##  $ Baggage.handling                 : int  3 4 4 1 2 5 5 4 1 5 ...
##  $ Checkin.service                  : int  5 2 4 4 4 5 5 5 5 3 ...
##  $ Cleanliness                      : int  3 3 4 1 2 4 5 4 4 4 ...
##  $ Online.boarding                  : int  2 2 2 3 5 2 3 2 4 2 ...
##  $ Departure.Delay.in.Minutes       : int  0 310 0 0 0 0 17 0 0 30 ...
##  $ Arrival.Delay.in.Minutes         : int  0 305 0 0 0 0 15 0 0 26 ...

Next, the data needs wrangling to convert column types to factor, because these columns are categorical: they hold respondents' survey answers. All variables are converted to factor except Age, Flight.Distance, Departure.Delay.in.Minutes and Arrival.Delay.in.Minutes.

2. Data wrangling

airline <- airline %>% 
  mutate(satisfaction = as.factor(satisfaction),
         Gender = as.factor(Gender),
         Customer.Type = as.factor(Customer.Type),
         Type.of.Travel = as.factor(Type.of.Travel),
         Class = as.factor(Class),
         Seat.comfort = as.factor(Seat.comfort),
         Departure.Arrival.time.convenient = as.factor(Departure.Arrival.time.convenient),
         Food.and.drink  = as.factor(Food.and.drink),
         Gate.location = as.factor(Gate.location),
         Inflight.wifi.service = as.factor(Inflight.wifi.service),
         Inflight.entertainment  = as.factor(Inflight.entertainment),
         Online.support = as.factor(Online.support),
         Ease.of.Online.booking = as.factor(Ease.of.Online.booking),
         On.board.service = as.factor(On.board.service),
         Leg.room.service = as.factor(Leg.room.service),
         Baggage.handling = as.factor(Baggage.handling),
         Checkin.service  = as.factor(Checkin.service),
         Cleanliness = as.factor(Cleanliness),
         Online.boarding = as.factor(Online.boarding))

glimpse(airline)

## Rows: 129,880
## Columns: 23
## $ satisfaction                      <fct> satisfied, satisfied, satisfied, sat…
## $ Gender                            <fct> Female, Male, Female, Female, Female…
## $ Customer.Type                     <fct> Loyal Customer, Loyal Customer, Loya…
## $ Age                               <int> 65, 47, 15, 60, 70, 30, 66, 10, 56, …
## $ Type.of.Travel                    <fct> Personal Travel, Personal Travel, Pe…
## $ Class                             <fct> Eco, Business, Eco, Eco, Eco, Eco, E…
## $ Flight.Distance                   <int> 265, 2464, 2138, 623, 354, 1894, 227…
## $ Seat.comfort                      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Departure.Arrival.time.convenient <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Food.and.drink                    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Gate.location                     <fct> 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, …
## $ Inflight.wifi.service             <fct> 2, 0, 2, 3, 4, 2, 2, 2, 5, 2, 3, 2, …
## $ Inflight.entertainment            <fct> 4, 2, 0, 4, 3, 0, 5, 0, 3, 0, 3, 0, …
## $ Online.support                    <fct> 2, 2, 2, 3, 4, 2, 5, 2, 5, 2, 3, 2, …
## $ Ease.of.Online.booking            <fct> 3, 3, 2, 1, 2, 2, 5, 2, 4, 2, 3, 2, …
## $ On.board.service                  <fct> 3, 4, 3, 1, 2, 5, 5, 3, 4, 2, 3, 3, …
## $ Leg.room.service                  <fct> 0, 4, 3, 0, 0, 4, 0, 3, 0, 4, 0, 2, …
## $ Baggage.handling                  <fct> 3, 4, 4, 1, 2, 5, 5, 4, 1, 5, 1, 5, …
## $ Checkin.service                   <fct> 5, 2, 4, 4, 4, 5, 5, 5, 5, 3, 2, 2, …
## $ Cleanliness                       <fct> 3, 3, 4, 1, 2, 4, 5, 4, 4, 4, 3, 5, …
## $ Online.boarding                   <fct> 2, 2, 2, 3, 5, 2, 3, 2, 4, 2, 5, 2, …
## $ Departure.Delay.in.Minutes        <int> 0, 310, 0, 0, 0, 0, 17, 0, 0, 30, 47…
## $ Arrival.Delay.in.Minutes          <int> 0, 305, 0, 0, 0, 0, 15, 0, 0, 26, 48…
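The mutate() call above lists every survey column by hand. Below is a minimal sketch of the same rule in base R: every column except the four numeric ones becomes a factor. It runs on a hypothetical three-row toy data frame, not the real survey. With dplyr 1.0 or later, `mutate(across(-c(Age, Flight.Distance, Departure.Delay.in.Minutes, Arrival.Delay.in.Minutes), as.factor))` should express the same idea.

```r
# Hypothetical toy stand-in for the survey data (same column names, made-up values)
survey <- data.frame(
  satisfaction    = c("satisfied", "dissatisfied", "satisfied"),
  Age             = c(65L, 47L, 15L),
  Seat.comfort    = c(0L, 3L, 5L),
  Flight.Distance = c(265L, 2464L, 2138L),
  stringsAsFactors = FALSE
)

# Columns that stay numeric; every other column becomes a factor
numeric_cols <- c("Age", "Flight.Distance",
                  "Departure.Delay.in.Minutes", "Arrival.Delay.in.Minutes")
to_factor <- setdiff(names(survey), numeric_cols)
survey[to_factor] <- lapply(survey[to_factor], as.factor)

str(survey)
```

Listing the numeric columns once avoids having to edit a long mutate() call whenever a survey column is added.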

3. Exploratory data analysis

Check for missing values

anyNA(airline)

## [1] TRUE

is.na(airline)%>% colSums()

##                      satisfaction                            Gender 
##                                 0                                 0 
##                     Customer.Type                               Age 
##                                 0                                 0 
##                    Type.of.Travel                             Class 
##                                 0                                 0 
##                   Flight.Distance                      Seat.comfort 
##                                 0                                 0 
## Departure.Arrival.time.convenient                    Food.and.drink 
##                                 0                                 0 
##                     Gate.location             Inflight.wifi.service 
##                                 0                                 0 
##            Inflight.entertainment                    Online.support 
##                                 0                                 0 
##            Ease.of.Online.booking                  On.board.service 
##                                 0                                 0 
##                  Leg.room.service                  Baggage.handling 
##                                 0                                 0 
##                   Checkin.service                       Cleanliness 
##                                 0                                 0 
##                   Online.boarding        Departure.Delay.in.Minutes 
##                                 0                                 0 
##          Arrival.Delay.in.Minutes 
##                               393

4. Data pre-processing

Fill the missing values with the column mean

airline$Arrival.Delay.in.Minutes[is.na(airline$Arrival.Delay.in.Minutes)]=
  mean(airline$Arrival.Delay.in.Minutes, na.rm = TRUE)
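The one-liner above replaces each NA with the mean of the observed values. Below is a minimal sketch of the same mean imputation as a small helper (`impute_mean` is a hypothetical name), applied to a made-up delay vector. It also shows a side effect: assigning the mean, which is a double, into an integer vector converts the whole vector to double. So Arrival.Delay.in.Minutes is no longer `int` after this step.

```r
# Hypothetical delay values (minutes) with two missing entries
delay <- c(0L, 305L, NA, 15L, NA, 26L)

impute_mean <- function(x) {
  # replace NAs with the mean of the non-missing values
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

delay_filled <- impute_mean(delay)
print(delay_filled)  # 0.0 305.0 86.5 15.0 86.5 26.0
typeof(delay_filled) # "double": the integer vector was coerced
```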

Check again that no missing values remain

anyNA(airline)

## [1] FALSE

is.na(airline)%>% colSums()

##                      satisfaction                            Gender 
##                                 0                                 0 
##                     Customer.Type                               Age 
##                                 0                                 0 
##                    Type.of.Travel                             Class 
##                                 0                                 0 
##                   Flight.Distance                      Seat.comfort 
##                                 0                                 0 
## Departure.Arrival.time.convenient                    Food.and.drink 
##                                 0                                 0 
##                     Gate.location             Inflight.wifi.service 
##                                 0                                 0 
##            Inflight.entertainment                    Online.support 
##                                 0                                 0 
##            Ease.of.Online.booking                  On.board.service 
##                                 0                                 0 
##                  Leg.room.service                  Baggage.handling 
##                                 0                                 0 
##                   Checkin.service                       Cleanliness 
##                                 0                                 0 
##                   Online.boarding        Departure.Delay.in.Minutes 
##                                 0                                 0 
##          Arrival.Delay.in.Minutes 
##                                 0

Check class imbalance

table(airline$satisfaction)

## 
## dissatisfied    satisfied 
##        58793        71087

prop.table(table(airline$satisfaction))

## 
## dissatisfied    satisfied 
##    0.4526717    0.5473283

There is no class imbalance in the target column: about 45% of respondents are dissatisfied and 55% are satisfied.

5. Cross Validation

Split the data into a training set and a test set (an 80/20 holdout split)

RNGkind(sample.kind = "Rounding")

## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used

set.seed(100)
# sample 80% of the row indices for the training set
index <- sample(nrow(airline), nrow(airline)*0.8)
airline_train <- airline[index,]
airline_test <- airline[-index,]  # the remaining 20% of rows form the test set

nrow(airline)*0.8

## [1] 103904

nrow(airline_train)

## [1] 103904

nrow(airline)*0.2

## [1] 25976

nrow(airline_test)

## [1] 25976
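The split works because `-index` selects exactly the rows not drawn into `index`. The two sets are therefore disjoint and together cover every row, which the row counts above confirm (103904 + 25976 = 129880). Below is a minimal sketch that wraps this logic in a hypothetical helper, `holdout_split`, applied to a ten-row toy data frame.

```r
# Hypothetical helper: random holdout split of a data frame
holdout_split <- function(df, train_frac = 0.8, seed = 100) {
  set.seed(seed)                                   # reproducible sampling
  idx <- sample(nrow(df), floor(nrow(df) * train_frac))
  list(train = df[idx, , drop = FALSE],            # sampled rows
       test  = df[-idx, , drop = FALSE])           # all remaining rows
}

toy <- data.frame(id = 1:10, y = rep(c("a", "b"), 5))
parts <- holdout_split(toy)
nrow(parts$train)  # 8
nrow(parts$test)   # 2
```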

Check class imbalance of the target column in the training data

table(airline_train$satisfaction)

## 
## dissatisfied    satisfied 
##        47100        56804

prop.table(table(airline_train$satisfaction))

## 
## dissatisfied    satisfied 
##     0.453303     0.546697

The training data shows no class imbalance either (about 45% vs. 55%).

6. Prediction with Naive Bayes

library(e1071)

## Warning: package 'e1071' was built under R version 4.2.2

model_airline_naive <- naiveBayes(formula = satisfaction~., data=airline_train, laplace = 1)
#model_airline_naive
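Naive Bayes scores each class as its prior probability times the product of per-predictor conditional probabilities. `laplace = 1` adds one pseudo-count so that a level never seen with a class does not zero out the whole product. Below is a minimal sketch of that computation for categorical predictors on hypothetical toy data. It assumes the standard smoothing formula (count + laplace) / (class total + laplace × number of levels), which we believe matches what e1071::naiveBayes does for factor predictors. Numeric predictors such as Age are modeled by e1071 with a per-class Gaussian, which this sketch leaves out.

```r
# Hypothetical toy data: two categorical predictors and a binary target
toy <- data.frame(
  satisfaction = factor(c("satisfied", "satisfied", "satisfied",
                          "dissatisfied", "dissatisfied")),
  Seat.comfort = factor(c("5", "4", "5", "1", "4"), levels = c("1", "4", "5")),
  Class        = factor(c("Business", "Business", "Eco", "Eco", "Eco"))
)

# Laplace-smoothed P(predictor = level | class)
cond_prob <- function(x, y, laplace = 1) {
  tab <- table(y, x)
  (tab + laplace) / (rowSums(tab) + laplace * nlevels(x))
}

# Posterior P(class | predictors) for one new observation (categorical only)
nb_posterior <- function(data, target, newrow, laplace = 1) {
  y <- data[[target]]
  prior <- table(y) / length(y)
  log_score <- setNames(log(as.numeric(prior)), names(prior))
  for (col in setdiff(names(data), target)) {
    p <- cond_prob(data[[col]], y, laplace)
    log_score <- log_score + log(p[, as.character(newrow[[col]])])
  }
  post <- exp(log_score - max(log_score))  # subtract max for numerical stability
  post / sum(post)
}

post <- nb_posterior(toy, "satisfaction",
                     list(Seat.comfort = "5", Class = "Business"))
round(post, 4)  # dissatisfied 0.1, satisfied 0.9
```

Without smoothing, P(Class = "Business" | dissatisfied) would be 0/2 = 0 here. With `laplace = 1` it becomes 1/4, so one unseen combination cannot decide the prediction on its own.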

7. Model evaluation

# predict class labels for the test data
airline_test$pred_label <- predict(object = model_airline_naive,
                                   newdata = airline_test,
                                   type = "class") # "class" returns the predicted label; "raw" returns class probabilities
#airline_test

Evaluate the model with a confusion matrix:

library(caret)

## Warning: package 'caret' was built under R version 4.2.2

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 4.2.2

## Loading required package: lattice

# confusion matrix
confusionMatrix(data= airline_test$pred_label,
                reference= airline_test$satisfaction,
                positive = "satisfied")

## Confusion Matrix and Statistics
## 
##               Reference
## Prediction     dissatisfied satisfied
##   dissatisfied         9615      1697
##   satisfied            2078     12586
##                                           
##                Accuracy : 0.8547          
##                  95% CI : (0.8503, 0.8589)
##     No Information Rate : 0.5499          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7056          
##                                           
##  Mcnemar's Test P-Value : 6.218e-10       
##                                           
##             Sensitivity : 0.8812          
##             Specificity : 0.8223          
##          Pos Pred Value : 0.8583          
##          Neg Pred Value : 0.8500          
##              Prevalence : 0.5499          
##          Detection Rate : 0.4845          
##    Detection Prevalence : 0.5645          
##       Balanced Accuracy : 0.8517          
##                                           
##        'Positive' Class : satisfied       
##

Sensitivity = recall

Pos pred value = precision
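These metrics can be computed directly from the four cells of the confusion matrix. The sketch below copies the Naive Bayes counts from the caret output above, treats "satisfied" as the positive class, and reproduces the reported values.

```r
# Counts copied from the Naive Bayes confusion matrix above
# (positive class = "satisfied")
tp <- 12586  # predicted satisfied,    actually satisfied
fp <- 2078   # predicted satisfied,    actually dissatisfied
fn <- 1697   # predicted dissatisfied, actually satisfied
tn <- 9615   # predicted dissatisfied, actually dissatisfied

metrics <- c(
  accuracy    = (tp + tn) / (tp + fp + fn + tn),
  sensitivity = tp / (tp + fn),   # recall
  specificity = tn / (tn + fp),   # true negative rate
  precision   = tp / (tp + fp),   # Pos Pred Value
  npv         = tn / (tn + fn)    # Neg Pred Value
)
round(metrics, 4)  # 0.8547 0.8812 0.8223 0.8583 0.8500
```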

Which metric do we prioritize?

Precision (Pos Pred Value), because we want to minimize false positives, i.e., dissatisfied customers whom the model predicts as satisfied.

Is an accuracy of 0.8547 good enough? The Naive Bayes model performs reasonably well, with an accuracy of 0.8547 and a precision of 0.8583, assuming we treat 85% accuracy as the bar for a good model.

8. ROC and AUC

The ROC curve plots the True Positive Rate (Sensitivity or Recall) against the False Positive Rate (1 - Specificity) at every classification threshold. A good model ideally has a high True Positive Rate and a low False Positive Rate. Note: Specificity is the True Negative Rate.
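Each point on the ROC curve comes from one threshold: every observation whose predicted probability is at or above the threshold is labeled positive, and TPR and FPR are computed from those labels. Below is a minimal sketch of that sweep on eight hypothetical scores and 0/1 labels, not the airline predictions. ROCR's `performance(..., "tpr", "fpr")` produces the same kind of points.

```r
# Hypothetical toy scores (predicted P(satisfied)) and actual labels (1 = satisfied)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    1,   0,   0)

roc_points <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  # at each threshold, predict positive when score >= threshold
  tpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 1) / sum(labels == 1))
  fpr <- sapply(thresholds, function(t) sum(scores >= t & labels == 0) / sum(labels == 0))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

roc <- roc_points(scores, labels)
print(roc)
```

Lowering the threshold can only add predicted positives, so both rates never decrease down the table. The curve therefore runs from near (0, 0) to (1, 1).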

Let's plot the ROC curve of the model_airline_naive model:

# get the predicted class probabilities for the test data
airline_test$prob <- predict(model_airline_naive, airline_test, type="raw")
#airline_test$prob[,"satisfied"]

# encode the actual labels as 1 (satisfied) and 0 (dissatisfied)
airline_test$actual <- ifelse(airline_test$satisfaction=="satisfied", 1, 0)
#airline_test$actual

library(ROCR)

## Warning: package 'ROCR' was built under R version 4.2.2

# prediction object
roc_pred_airline <- prediction(predictions=airline_test$prob[,"satisfied"], 
                       labels=airline_test$actual) # actual labels, encoded as 0/1

# ROC curve
plot(performance(prediction.obj = roc_pred_airline, 
            measure = "tpr", # y-axis
            x.measure = "fpr")) # x-axis
abline(0,1, lty=2) # diagonal = random-guess baseline; run together with the plot() call above

AUC value

auc_pred <- performance(prediction.obj = roc_pred_airline,
                        measure="auc") 

auc_pred@y.values

## [[1]]
## [1] 0.9309337

The AUC is 0.9309, close to 1, which means the model separates the positive and negative classes well across thresholds.
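The AUC also has a direct probabilistic reading. It is the probability that a randomly chosen positive (satisfied) case gets a higher predicted score than a randomly chosen negative one, with ties counted as one half. Below is a minimal sketch that computes it with the rank-sum (Mann-Whitney) identity on hypothetical toy scores. It is a swapped-in formula rather than ROCR's trapezoid-based code, but in principle it yields the same quantity.

```r
# Hypothetical toy scores (predicted P(satisfied)) and actual labels (1 = satisfied)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(1,   1,   0,   1,   0,    1,   0,   0)

# AUC = P(score of a random positive > score of a random negative)
auc_rank <- function(scores, labels) {
  r <- rank(scores)          # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_rank(scores, labels)  # 0.8125
```

An AUC of 0.5 corresponds to the dashed diagonal (random guessing), and 1 to perfect separation.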

PART B: Decision Tree

1. Data preparation

The data for this LBB was taken from Kaggle.com, in a dataset titled Airline survey. Goal of this LBB: to predict customer satisfaction with airline services.

Read data

library(dplyr)
airline <- read.csv("Airline_survey.csv")
head(airline)

dim(airline)

## [1] 129880     23

str(airline)

## 'data.frame':    129880 obs. of  23 variables:
##  $ satisfaction                     : chr  "satisfied" "satisfied" "satisfied" "satisfied" ...
##  $ Gender                           : chr  "Female" "Male" "Female" "Female" ...
##  $ Customer.Type                    : chr  "Loyal Customer" "Loyal Customer" "Loyal Customer" "Loyal Customer" ...
##  $ Age                              : int  65 47 15 60 70 30 66 10 56 22 ...
##  $ Type.of.Travel                   : chr  "Personal Travel" "Personal Travel" "Personal Travel" "Personal Travel" ...
##  $ Class                            : chr  "Eco" "Business" "Eco" "Eco" ...
##  $ Flight.Distance                  : int  265 2464 2138 623 354 1894 227 1812 73 1556 ...
##  $ Seat.comfort                     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Departure.Arrival.time.convenient: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Food.and.drink                   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Gate.location                    : int  2 3 3 3 3 3 3 3 3 3 ...
##  $ Inflight.wifi.service            : int  2 0 2 3 4 2 2 2 5 2 ...
##  $ Inflight.entertainment           : int  4 2 0 4 3 0 5 0 3 0 ...
##  $ Online.support                   : int  2 2 2 3 4 2 5 2 5 2 ...
##  $ Ease.of.Online.booking           : int  3 3 2 1 2 2 5 2 4 2 ...
##  $ On.board.service                 : int  3 4 3 1 2 5 5 3 4 2 ...
##  $ Leg.room.service                 : int  0 4 3 0 0 4 0 3 0 4 ...
##  $ Baggage.handling                 : int  3 4 4 1 2 5 5 4 1 5 ...
##  $ Checkin.service                  : int  5 2 4 4 4 5 5 5 5 3 ...
##  $ Cleanliness                      : int  3 3 4 1 2 4 5 4 4 4 ...
##  $ Online.boarding                  : int  2 2 2 3 5 2 3 2 4 2 ...
##  $ Departure.Delay.in.Minutes       : int  0 310 0 0 0 0 17 0 0 30 ...
##  $ Arrival.Delay.in.Minutes         : int  0 305 0 0 0 0 15 0 0 26 ...

2. Data wrangling

airline <- airline %>% 
  mutate(satisfaction = as.factor(satisfaction),
         Gender = as.factor(Gender),
         Customer.Type = as.factor(Customer.Type),
         Type.of.Travel = as.factor(Type.of.Travel),
         Class = as.factor(Class),
         Seat.comfort = as.factor(Seat.comfort),
         Departure.Arrival.time.convenient = as.factor(Departure.Arrival.time.convenient),
         Food.and.drink  = as.factor(Food.and.drink),
         Gate.location = as.factor(Gate.location),
         Inflight.wifi.service = as.factor(Inflight.wifi.service),
         Inflight.entertainment  = as.factor(Inflight.entertainment),
         Online.support = as.factor(Online.support),
         Ease.of.Online.booking = as.factor(Ease.of.Online.booking),
         On.board.service = as.factor(On.board.service),
         Leg.room.service = as.factor(Leg.room.service),
         Baggage.handling = as.factor(Baggage.handling),
         Checkin.service  = as.factor(Checkin.service),
         Cleanliness = as.factor(Cleanliness),
         Online.boarding = as.factor(Online.boarding))

glimpse(airline)

## Rows: 129,880
## Columns: 23
## $ satisfaction                      <fct> satisfied, satisfied, satisfied, sat…
## $ Gender                            <fct> Female, Male, Female, Female, Female…
## $ Customer.Type                     <fct> Loyal Customer, Loyal Customer, Loya…
## $ Age                               <int> 65, 47, 15, 60, 70, 30, 66, 10, 56, …
## $ Type.of.Travel                    <fct> Personal Travel, Personal Travel, Pe…
## $ Class                             <fct> Eco, Business, Eco, Eco, Eco, Eco, E…
## $ Flight.Distance                   <int> 265, 2464, 2138, 623, 354, 1894, 227…
## $ Seat.comfort                      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Departure.Arrival.time.convenient <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Food.and.drink                    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Gate.location                     <fct> 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, …
## $ Inflight.wifi.service             <fct> 2, 0, 2, 3, 4, 2, 2, 2, 5, 2, 3, 2, …
## $ Inflight.entertainment            <fct> 4, 2, 0, 4, 3, 0, 5, 0, 3, 0, 3, 0, …
## $ Online.support                    <fct> 2, 2, 2, 3, 4, 2, 5, 2, 5, 2, 3, 2, …
## $ Ease.of.Online.booking            <fct> 3, 3, 2, 1, 2, 2, 5, 2, 4, 2, 3, 2, …
## $ On.board.service                  <fct> 3, 4, 3, 1, 2, 5, 5, 3, 4, 2, 3, 3, …
## $ Leg.room.service                  <fct> 0, 4, 3, 0, 0, 4, 0, 3, 0, 4, 0, 2, …
## $ Baggage.handling                  <fct> 3, 4, 4, 1, 2, 5, 5, 4, 1, 5, 1, 5, …
## $ Checkin.service                   <fct> 5, 2, 4, 4, 4, 5, 5, 5, 5, 3, 2, 2, …
## $ Cleanliness                       <fct> 3, 3, 4, 1, 2, 4, 5, 4, 4, 4, 3, 5, …
## $ Online.boarding                   <fct> 2, 2, 2, 3, 5, 2, 3, 2, 4, 2, 5, 2, …
## $ Departure.Delay.in.Minutes        <int> 0, 310, 0, 0, 0, 0, 17, 0, 0, 30, 47…
## $ Arrival.Delay.in.Minutes          <int> 0, 305, 0, 0, 0, 0, 15, 0, 0, 26, 48…

3. Exploratory data analysis

Check for missing values

anyNA(airline)

## [1] TRUE

is.na(airline)%>% colSums()

##                      satisfaction                            Gender 
##                                 0                                 0 
##                     Customer.Type                               Age 
##                                 0                                 0 
##                    Type.of.Travel                             Class 
##                                 0                                 0 
##                   Flight.Distance                      Seat.comfort 
##                                 0                                 0 
## Departure.Arrival.time.convenient                    Food.and.drink 
##                                 0                                 0 
##                     Gate.location             Inflight.wifi.service 
##                                 0                                 0 
##            Inflight.entertainment                    Online.support 
##                                 0                                 0 
##            Ease.of.Online.booking                  On.board.service 
##                                 0                                 0 
##                  Leg.room.service                  Baggage.handling 
##                                 0                                 0 
##                   Checkin.service                       Cleanliness 
##                                 0                                 0 
##                   Online.boarding        Departure.Delay.in.Minutes 
##                                 0                                 0 
##          Arrival.Delay.in.Minutes 
##                               393

4. Data pre-processing

Fill the missing values with the column mean

airline$Arrival.Delay.in.Minutes[is.na(airline$Arrival.Delay.in.Minutes)]=
  mean(airline$Arrival.Delay.in.Minutes, na.rm = TRUE)

Check again that no missing values remain

anyNA(airline)

## [1] FALSE

is.na(airline)%>% colSums()

##                      satisfaction                            Gender 
##                                 0                                 0 
##                     Customer.Type                               Age 
##                                 0                                 0 
##                    Type.of.Travel                             Class 
##                                 0                                 0 
##                   Flight.Distance                      Seat.comfort 
##                                 0                                 0 
## Departure.Arrival.time.convenient                    Food.and.drink 
##                                 0                                 0 
##                     Gate.location             Inflight.wifi.service 
##                                 0                                 0 
##            Inflight.entertainment                    Online.support 
##                                 0                                 0 
##            Ease.of.Online.booking                  On.board.service 
##                                 0                                 0 
##                  Leg.room.service                  Baggage.handling 
##                                 0                                 0 
##                   Checkin.service                       Cleanliness 
##                                 0                                 0 
##                   Online.boarding        Departure.Delay.in.Minutes 
##                                 0                                 0 
##          Arrival.Delay.in.Minutes 
##                                 0

Check class imbalance

table(airline$satisfaction)

## 
## dissatisfied    satisfied 
##        58793        71087

prop.table(table(airline$satisfaction))

## 
## dissatisfied    satisfied 
##    0.4526717    0.5473283

There is no class imbalance in the target column: about 45% of respondents are dissatisfied and 55% are satisfied.

5. Cross Validation

Split the data into a training set and a test set (an 80/20 holdout split)

RNGkind(sample.kind = "Rounding")

## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used

set.seed(100)
# sample 80% of the row indices for the training set
index <- sample(nrow(airline), nrow(airline)*0.8)
airline_train <- airline[index,]
airline_test <- airline[-index,]  # the remaining 20% of rows form the test set

nrow(airline)*0.8

## [1] 103904

nrow(airline_train)

## [1] 103904

nrow(airline)*0.2

## [1] 25976

nrow(airline_test)

## [1] 25976

Check class imbalance of the target column in the training data

table(airline_train$satisfaction)

## 
## dissatisfied    satisfied 
##        47100        56804

prop.table(table(airline_train$satisfaction))

## 
## dissatisfied    satisfied 
##     0.453303     0.546697

The training data shows no class imbalance either (about 45% vs. 55%).

6. Classification with Decision Tree

Model Decision Tree

library(partykit)

## Warning: package 'partykit' was built under R version 4.2.2

## Loading required package: grid

## Loading required package: libcoin

## Warning: package 'libcoin' was built under R version 4.2.2

## Loading required package: mvtnorm

dtree_model_airline <- ctree(formula = satisfaction ~.,
                     data = airline_train)
plot(dtree_model_airline, type = "simple")

The tree is too complex to read, so its branching should be limited.

Evaluate the performance of the complex model

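library(caret)  # for confusionMatrix(); already attached in Part A

# predict classes on the test data
# (dtree_model_airline is the complex tree fitted above)
pred_A <- predict(dtree_model_airline, airline_test, type="response")

# confusion matrix on the test data
confusionMatrix(pred_A, airline_test$satisfaction, positive = "satisfied")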

## Confusion Matrix and Statistics
## 
##               Reference
## Prediction     dissatisfied satisfied
##   dissatisfied        10856      1018
##   satisfied             837     13265
##                                           
##                Accuracy : 0.9286          
##                  95% CI : (0.9254, 0.9317)
##     No Information Rate : 0.5499          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.8559          
##                                           
##  Mcnemar's Test P-Value : 2.924e-05       
##                                           
##             Sensitivity : 0.9287          
##             Specificity : 0.9284          
##          Pos Pred Value : 0.9406          
##          Neg Pred Value : 0.9143          
##              Prevalence : 0.5499          
##          Detection Rate : 0.5107          
##    Detection Prevalence : 0.5429          
##       Balanced Accuracy : 0.9286          
##                                           
##        'Positive' Class : satisfied       
##
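Below is a minimal sketch that puts the two models side by side. It recomputes their headline metrics from the confusion-matrix counts reported by caret for Naive Bayes and for the decision tree. On the test set, the tree is ahead on accuracy, recall and especially precision (0.9406 vs. 0.8583), which is the metric prioritized for this problem.

```r
# Confusion-matrix counts copied from the two caret outputs
# (positive class = "satisfied")
cm <- rbind(
  naive_bayes   = c(tp = 12586, fp = 2078, fn = 1697, tn = 9615),
  decision_tree = c(tp = 13265, fp = 837,  fn = 1018, tn = 10856)
)

summarise_cm <- function(tp, fp, fn, tn) {
  c(accuracy  = (tp + tn) / (tp + fp + fn + tn),
    recall    = tp / (tp + fn),
    precision = tp / (tp + fp))
}

# one row per model, one column per metric
comparison <- t(apply(cm, 1, function(r)
  summarise_cm(r[["tp"]], r[["fp"]], r[["fn"]], r[["tn"]])))
round(comparison, 4)
```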

Customer satisfaction classification (Naive bayes, DT, RF)

Reni uNisA

2023-05-15
