This exercise consists of three parts: Part A, prediction using Logistic Regression; Part B, the KNN algorithm; and Part C, analysis.

PART A: LOGISTIC REGRESSION

1. Data preparation

We use a dataset from Kaggle.com titled Airline survey. The goal is to investigate customer satisfaction with the airline's service.

Read data

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
kepuasan <- read.csv("Airline_survey.csv")
head(kepuasan)
dim(kepuasan)
## [1] 129880     23
str(kepuasan)
## 'data.frame':    129880 obs. of  23 variables:
##  $ satisfaction                     : chr  "satisfied" "satisfied" "satisfied" "satisfied" ...
##  $ Gender                           : chr  "Female" "Male" "Female" "Female" ...
##  $ Customer.Type                    : chr  "Loyal Customer" "Loyal Customer" "Loyal Customer" "Loyal Customer" ...
##  $ Age                              : int  65 47 15 60 70 30 66 10 56 22 ...
##  $ Type.of.Travel                   : chr  "Personal Travel" "Personal Travel" "Personal Travel" "Personal Travel" ...
##  $ Class                            : chr  "Eco" "Business" "Eco" "Eco" ...
##  $ Flight.Distance                  : int  265 2464 2138 623 354 1894 227 1812 73 1556 ...
##  $ Seat.comfort                     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Departure.Arrival.time.convenient: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Food.and.drink                   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Gate.location                    : int  2 3 3 3 3 3 3 3 3 3 ...
##  $ Inflight.wifi.service            : int  2 0 2 3 4 2 2 2 5 2 ...
##  $ Inflight.entertainment           : int  4 2 0 4 3 0 5 0 3 0 ...
##  $ Online.support                   : int  2 2 2 3 4 2 5 2 5 2 ...
##  $ Ease.of.Online.booking           : int  3 3 2 1 2 2 5 2 4 2 ...
##  $ On.board.service                 : int  3 4 3 1 2 5 5 3 4 2 ...
##  $ Leg.room.service                 : int  0 4 3 0 0 4 0 3 0 4 ...
##  $ Baggage.handling                 : int  3 4 4 1 2 5 5 4 1 5 ...
##  $ Checkin.service                  : int  5 2 4 4 4 5 5 5 5 3 ...
##  $ Cleanliness                      : int  3 3 4 1 2 4 5 4 4 4 ...
##  $ Online.boarding                  : int  2 2 2 3 5 2 3 2 4 2 ...
##  $ Departure.Delay.in.Minutes       : int  0 310 0 0 0 0 17 0 0 30 ...
##  $ Arrival.Delay.in.Minutes         : int  0 305 0 0 0 0 15 0 0 26 ...

Variable descriptions: (to be updated)

Next, data wrangling is needed to convert the data types to factor, because these variables are categorical: they are survey responses from respondents. All variables are converted to factor except Age, Flight.Distance, Departure.Delay.in.Minutes and Arrival.Delay.in.Minutes.

2. Data wrangling

names(kepuasan)
##  [1] "satisfaction"                      "Gender"                           
##  [3] "Customer.Type"                     "Age"                              
##  [5] "Type.of.Travel"                    "Class"                            
##  [7] "Flight.Distance"                   "Seat.comfort"                     
##  [9] "Departure.Arrival.time.convenient" "Food.and.drink"                   
## [11] "Gate.location"                     "Inflight.wifi.service"            
## [13] "Inflight.entertainment"            "Online.support"                   
## [15] "Ease.of.Online.booking"            "On.board.service"                 
## [17] "Leg.room.service"                  "Baggage.handling"                 
## [19] "Checkin.service"                   "Cleanliness"                      
## [21] "Online.boarding"                   "Departure.Delay.in.Minutes"       
## [23] "Arrival.Delay.in.Minutes"
kepuasan <- kepuasan %>% 
  mutate(satisfaction = as.factor(satisfaction),
         Gender = as.factor(Gender),
         Customer.Type = as.factor(Customer.Type),
         Type.of.Travel = as.factor(Type.of.Travel),
         Class = as.factor(Class),
         Seat.comfort = as.factor(Seat.comfort),
         Departure.Arrival.time.convenient = as.factor(Departure.Arrival.time.convenient),
         Food.and.drink  = as.factor(Food.and.drink),
         Gate.location = as.factor(Gate.location),
         Inflight.wifi.service = as.factor(Inflight.wifi.service),
         Inflight.entertainment  = as.factor(Inflight.entertainment),
         Online.support = as.factor(Online.support),
         Ease.of.Online.booking = as.factor(Ease.of.Online.booking),
         On.board.service = as.factor(On.board.service),
         Leg.room.service = as.factor(Leg.room.service),
         Baggage.handling = as.factor(Baggage.handling),
         Checkin.service  = as.factor(Checkin.service),
         Cleanliness = as.factor(Cleanliness),
         Online.boarding = as.factor(Online.boarding))

glimpse(kepuasan)
## Rows: 129,880
## Columns: 23
## $ satisfaction                      <fct> satisfied, satisfied, satisfied, sat…
## $ Gender                            <fct> Female, Male, Female, Female, Female…
## $ Customer.Type                     <fct> Loyal Customer, Loyal Customer, Loya…
## $ Age                               <int> 65, 47, 15, 60, 70, 30, 66, 10, 56, …
## $ Type.of.Travel                    <fct> Personal Travel, Personal Travel, Pe…
## $ Class                             <fct> Eco, Business, Eco, Eco, Eco, Eco, E…
## $ Flight.Distance                   <int> 265, 2464, 2138, 623, 354, 1894, 227…
## $ Seat.comfort                      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Departure.Arrival.time.convenient <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Food.and.drink                    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Gate.location                     <fct> 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, …
## $ Inflight.wifi.service             <fct> 2, 0, 2, 3, 4, 2, 2, 2, 5, 2, 3, 2, …
## $ Inflight.entertainment            <fct> 4, 2, 0, 4, 3, 0, 5, 0, 3, 0, 3, 0, …
## $ Online.support                    <fct> 2, 2, 2, 3, 4, 2, 5, 2, 5, 2, 3, 2, …
## $ Ease.of.Online.booking            <fct> 3, 3, 2, 1, 2, 2, 5, 2, 4, 2, 3, 2, …
## $ On.board.service                  <fct> 3, 4, 3, 1, 2, 5, 5, 3, 4, 2, 3, 3, …
## $ Leg.room.service                  <fct> 0, 4, 3, 0, 0, 4, 0, 3, 0, 4, 0, 2, …
## $ Baggage.handling                  <fct> 3, 4, 4, 1, 2, 5, 5, 4, 1, 5, 1, 5, …
## $ Checkin.service                   <fct> 5, 2, 4, 4, 4, 5, 5, 5, 5, 3, 2, 2, …
## $ Cleanliness                       <fct> 3, 3, 4, 1, 2, 4, 5, 4, 4, 4, 3, 5, …
## $ Online.boarding                   <fct> 2, 2, 2, 3, 5, 2, 3, 2, 4, 2, 5, 2, …
## $ Departure.Delay.in.Minutes        <int> 0, 310, 0, 0, 0, 0, 17, 0, 0, 30, 47…
## $ Arrival.Delay.in.Minutes          <int> 0, 305, 0, 0, 0, 0, 15, 0, 0, 26, 48…
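
The conversion above lists every column by hand. As a minimal sketch of the same idea, one can instead name only the columns that should stay numeric and convert the rest. The toy data frame and the helper name `to_factor_except` are assumptions for illustration and are not part of the original analysis.

```r
# Hypothetical helper: convert every column to factor,
# except the columns named in `keep_numeric`.
to_factor_except <- function(df, keep_numeric) {
  cols <- setdiff(names(df), keep_numeric)
  df[cols] <- lapply(df[cols], as.factor)
  df
}

# Toy data with the same kinds of columns as the survey
toy <- data.frame(
  satisfaction = c("satisfied", "dissatisfied", "satisfied"),
  Age          = c(65L, 47L, 15L),
  Seat.comfort = c(0L, 3L, 5L),
  stringsAsFactors = FALSE
)

toy <- to_factor_except(toy, keep_numeric = "Age")
str(toy)
```

On the real data, `keep_numeric` would hold the four numeric columns named in the text (Age, Flight.Distance, Departure.Delay.in.Minutes, Arrival.Delay.in.Minutes).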

3. Exploratory data analysis

Check for missing values

anyNA(kepuasan)
## [1] TRUE
is.na(kepuasan)%>% colSums()
##                      satisfaction                            Gender 
##                                 0                                 0 
##                     Customer.Type                               Age 
##                                 0                                 0 
##                    Type.of.Travel                             Class 
##                                 0                                 0 
##                   Flight.Distance                      Seat.comfort 
##                                 0                                 0 
## Departure.Arrival.time.convenient                    Food.and.drink 
##                                 0                                 0 
##                     Gate.location             Inflight.wifi.service 
##                                 0                                 0 
##            Inflight.entertainment                    Online.support 
##                                 0                                 0 
##            Ease.of.Online.booking                  On.board.service 
##                                 0                                 0 
##                  Leg.room.service                  Baggage.handling 
##                                 0                                 0 
##                   Checkin.service                       Cleanliness 
##                                 0                                 0 
##                   Online.boarding        Departure.Delay.in.Minutes 
##                                 0                                 0 
##          Arrival.Delay.in.Minutes 
##                               393

4. Data pre-processing

Fill the missing values with the column mean

kepuasan$Arrival.Delay.in.Minutes[is.na(kepuasan$Arrival.Delay.in.Minutes)]=
  mean(kepuasan$Arrival.Delay.in.Minutes, na.rm = TRUE)

Confirm again that there are no missing values

anyNA(kepuasan)
## [1] FALSE
is.na(kepuasan)%>% colSums()
##                      satisfaction                            Gender 
##                                 0                                 0 
##                     Customer.Type                               Age 
##                                 0                                 0 
##                    Type.of.Travel                             Class 
##                                 0                                 0 
##                   Flight.Distance                      Seat.comfort 
##                                 0                                 0 
## Departure.Arrival.time.convenient                    Food.and.drink 
##                                 0                                 0 
##                     Gate.location             Inflight.wifi.service 
##                                 0                                 0 
##            Inflight.entertainment                    Online.support 
##                                 0                                 0 
##            Ease.of.Online.booking                  On.board.service 
##                                 0                                 0 
##                  Leg.room.service                  Baggage.handling 
##                                 0                                 0 
##                   Checkin.service                       Cleanliness 
##                                 0                                 0 
##                   Online.boarding        Departure.Delay.in.Minutes 
##                                 0                                 0 
##          Arrival.Delay.in.Minutes 
##                                 0
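
The imputation step above can be wrapped in a small reusable function. This is a sketch on a toy vector; the function name `impute_mean` is hypothetical. Note that the mean is generally not a whole number, so an integer column becomes a double column after filling.

```r
# Hypothetical helper: replace NA entries with the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

delay <- c(0L, 305L, NA, 15L)       # toy arrival delays in minutes
delay_filled <- impute_mean(delay)  # the NA becomes (0 + 305 + 15) / 3

anyNA(delay_filled)
typeof(delay_filled)
```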

Check class imbalance

table(kepuasan$satisfaction)
## 
## dissatisfied    satisfied 
##        58793        71087
prop.table(table(kepuasan$satisfaction))
## 
## dissatisfied    satisfied 
##    0.4526717    0.5473283

The target column shows no class imbalance: about 45% dissatisfied and 55% satisfied.

5. Cross Validation

Split the data into a training set and a test set

RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(100)
# sample 70% of the row indices for the training set
index <- sample(nrow(kepuasan), nrow(kepuasan)*0.7)
kepuasan_train <- kepuasan[index,]
kepuasan_test <- kepuasan[-index,]
nrow(kepuasan)*0.7
## [1] 90916
nrow(kepuasan_train)
## [1] 90916
nrow(kepuasan)*0.3
## [1] 38964
nrow(kepuasan_test)
## [1] 38964
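
The hold-out split can also be written as a function so that the proportion and seed are explicit. This is a sketch under assumptions: `split_train_test` is a hypothetical helper, and it uses R's default sampler rather than the "Rounding" sampler set above, so it will not reproduce the same row indices.

```r
# Hypothetical helper: random train/test split by row.
split_train_test <- function(df, prop = 0.7, seed = 100) {
  set.seed(seed)
  n_train <- round(nrow(df) * prop)
  index <- sample(nrow(df), n_train)
  list(
    train = df[index, , drop = FALSE],
    test  = df[-index, , drop = FALSE]
  )
}

toy <- data.frame(id = 1:10, y = rep(c("a", "b"), 5))
parts <- split_train_test(toy)

nrow(parts$train)
nrow(parts$test)
```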

Check the class balance of the target column in the training data

table(kepuasan_train$satisfaction)
## 
## dissatisfied    satisfied 
##        41169        49747
prop.table(table(kepuasan_train$satisfaction))
## 
## dissatisfied    satisfied 
##    0.4528246    0.5471754

There is no class imbalance in the training data.

6. Logistic Regression all predictors

# fit a logistic regression using all predictors
model_kepuasan_all <- glm(formula = satisfaction ~., 
                   data = kepuasan_train, 
                   family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model_kepuasan_all)
## 
## Call:
## glm(formula = satisfaction ~ ., family = "binomial", data = kepuasan_train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.4453  -0.2996   0.0127   0.2343   3.6532  
## 
## Coefficients: (3 not defined because of singularities)
##                                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                         6.727e+00  8.441e+03   0.001  0.99936    
## GenderMale                         -9.083e-01  2.625e-02 -34.604  < 2e-16 ***
## Customer.TypeLoyal Customer         2.783e+00  4.391e-02  63.382  < 2e-16 ***
## Age                                -4.268e-03  9.088e-04  -4.696 2.65e-06 ***
## Type.of.TravelPersonal Travel      -1.366e+00  3.908e-02 -34.967  < 2e-16 ***
## ClassEco                           -5.384e-01  3.408e-02 -15.802  < 2e-16 ***
## ClassEco Plus                      -6.907e-01  5.296e-02 -13.041  < 2e-16 ***
## Flight.Distance                    -8.687e-05  1.352e-05  -6.426 1.31e-10 ***
## Seat.comfort1                      -2.360e+01  8.822e+01  -0.267  0.78910    
## Seat.comfort2                      -2.398e+01  8.822e+01  -0.272  0.78572    
## Seat.comfort3                      -2.406e+01  8.822e+01  -0.273  0.78508    
## Seat.comfort4                      -2.296e+01  8.822e+01  -0.260  0.79462    
## Seat.comfort5                      -1.805e+01  8.822e+01  -0.205  0.83786    
## Departure.Arrival.time.convenient1  5.209e-01  8.649e-02   6.023 1.72e-09 ***
## Departure.Arrival.time.convenient2  6.251e-01  8.366e-02   7.472 7.92e-14 ***
## Departure.Arrival.time.convenient3  5.981e-01  8.171e-02   7.319 2.50e-13 ***
## Departure.Arrival.time.convenient4 -4.546e-01  7.611e-02  -5.972 2.34e-09 ***
## Departure.Arrival.time.convenient5 -1.572e+00  8.267e-02 -19.014  < 2e-16 ***
## Food.and.drink1                     3.452e+00  6.849e-01   5.040 4.66e-07 ***
## Food.and.drink2                     3.459e+00  6.849e-01   5.050 4.42e-07 ***
## Food.and.drink3                     3.863e+00  6.846e-01   5.643 1.67e-08 ***
## Food.and.drink4                     3.958e+00  6.842e-01   5.784 7.29e-09 ***
## Food.and.drink5                     4.147e+00  6.846e-01   6.057 1.39e-09 ***
## Gate.location1                     -1.971e+01  4.400e+03  -0.004  0.99643    
## Gate.location2                     -1.968e+01  4.400e+03  -0.004  0.99643    
## Gate.location3                     -2.001e+01  4.400e+03  -0.005  0.99637    
## Gate.location4                     -1.974e+01  4.400e+03  -0.004  0.99642    
## Gate.location5                     -1.980e+01  4.400e+03  -0.004  0.99641    
## Inflight.wifi.service1             -2.120e-01  1.303e+00  -0.163  0.87069    
## Inflight.wifi.service2              1.443e-01  1.302e+00   0.111  0.91177    
## Inflight.wifi.service3             -8.046e-02  1.302e+00  -0.062  0.95071    
## Inflight.wifi.service4             -1.047e-01  1.302e+00  -0.080  0.93589    
## Inflight.wifi.service5             -2.851e-01  1.302e+00  -0.219  0.82666    
## Inflight.entertainment1            -3.265e+00  6.913e-01  -4.722 2.33e-06 ***
## Inflight.entertainment2            -3.190e+00  6.907e-01  -4.618 3.87e-06 ***
## Inflight.entertainment3            -3.393e+00  6.903e-01  -4.915 8.87e-07 ***
## Inflight.entertainment4            -1.617e+00  6.897e-01  -2.345  0.01905 *  
## Inflight.entertainment5            -3.401e-01  6.898e-01  -0.493  0.62200    
## Online.support1                     2.017e+01  6.523e+03   0.003  0.99753    
## Online.support2                     1.977e+01  6.523e+03   0.003  0.99758    
## Online.support3                     1.874e+01  6.523e+03   0.003  0.99771    
## Online.support4                     1.952e+01  6.523e+03   0.003  0.99761    
## Online.support5                     2.017e+01  6.523e+03   0.003  0.99753    
## Ease.of.Online.booking1             3.875e+01  1.447e+03   0.027  0.97864    
## Ease.of.Online.booking2             3.959e+01  1.447e+03   0.027  0.97817    
## Ease.of.Online.booking3             4.061e+01  1.447e+03   0.028  0.97761    
## Ease.of.Online.booking4             4.072e+01  1.447e+03   0.028  0.97755    
## Ease.of.Online.booking5             3.993e+01  1.447e+03   0.028  0.97799    
## On.board.service1                  -2.308e+01  3.381e+03  -0.007  0.99455    
## On.board.service2                  -2.291e+01  3.381e+03  -0.007  0.99459    
## On.board.service3                  -2.246e+01  3.381e+03  -0.007  0.99470    
## On.board.service4                  -2.229e+01  3.381e+03  -0.007  0.99474    
## On.board.service5                  -2.182e+01  3.381e+03  -0.006  0.99485    
## Leg.room.service1                  -2.068e+00  7.470e-01  -2.769  0.00563 ** 
## Leg.room.service2                  -1.807e+00  7.467e-01  -2.420  0.01551 *  
## Leg.room.service3                  -1.982e+00  7.465e-01  -2.655  0.00794 ** 
## Leg.room.service4                  -1.297e+00  7.465e-01  -1.737  0.08237 .  
## Leg.room.service5                  -1.147e+00  7.466e-01  -1.537  0.12435    
## Baggage.handling2                  -9.120e-02  7.235e-02  -1.260  0.20749    
## Baggage.handling3                  -6.285e-01  6.750e-02  -9.310  < 2e-16 ***
## Baggage.handling4                  -8.843e-02  6.563e-02  -1.347  0.17784    
## Baggage.handling5                   4.511e-01  6.911e-02   6.527 6.69e-11 ***
## Checkin.service1                   -1.232e+00  4.996e-02 -24.662  < 2e-16 ***
## Checkin.service2                   -1.090e+00  4.954e-02 -21.996  < 2e-16 ***
## Checkin.service3                   -6.053e-01  3.910e-02 -15.483  < 2e-16 ***
## Checkin.service4                   -6.030e-01  3.877e-02 -15.553  < 2e-16 ***
## Checkin.service5                           NA         NA      NA       NA    
## Cleanliness1                       -3.248e-01  7.043e-02  -4.611 4.01e-06 ***
## Cleanliness2                       -4.044e-01  6.439e-02  -6.281 3.36e-10 ***
## Cleanliness3                       -1.178e+00  5.242e-02 -22.481  < 2e-16 ***
## Cleanliness4                       -5.832e-01  4.047e-02 -14.412  < 2e-16 ***
## Cleanliness5                               NA         NA      NA       NA    
## Online.boarding1                   -5.116e-01  6.922e-02  -7.392 1.45e-13 ***
## Online.boarding2                   -6.438e-01  6.690e-02  -9.623  < 2e-16 ***
## Online.boarding3                   -1.679e-01  5.572e-02  -3.014  0.00258 ** 
## Online.boarding4                   -3.611e-01  5.375e-02  -6.717 1.85e-11 ***
## Online.boarding5                           NA         NA      NA       NA    
## Departure.Delay.in.Minutes          1.351e-03  1.118e-03   1.208  0.22690    
## Arrival.Delay.in.Minutes           -6.296e-03  1.109e-03  -5.677 1.37e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 125226  on 90915  degrees of freedom
## Residual deviance:  41996  on 90840  degrees of freedom
## AIC: 42148
## 
## Number of Fisher Scoring iterations: 17

AIC for the all-predictors model = 42148
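
The AIC printed in the summary can be checked by hand. For a logistic regression on ungrouped 0/1 responses, the residual deviance equals minus twice the log-likelihood, so AIC = residual deviance + 2k, where k is the number of estimated coefficients. The sketch below only reuses the rounded figures printed in the summary above; the variable names are chosen for illustration.

```r
n_obs     <- 90916   # training rows (null deviance df + 1)
resid_df  <- 90840   # residual degrees of freedom from summary()
resid_dev <- 41996   # residual deviance as printed (rounded)

k       <- n_obs - resid_df    # estimated coefficients, incl. intercept
aic_all <- resid_dev + 2 * k   # AIC = deviance + 2k for binary responses

k
aic_all
```

Because the printed deviance is rounded, the result agrees with the reported AIC only to the nearest unit.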

7. Logistic Regression with feature selection

# backward stepwise selection, using AIC as the criterion
model_kepuasan_step <- step(object = model_kepuasan_all, direction = "backward", trace = FALSE)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model_kepuasan_step)
## 
## Call:
## glm(formula = satisfaction ~ Gender + Customer.Type + Age + Type.of.Travel + 
##     Class + Flight.Distance + Seat.comfort + Departure.Arrival.time.convenient + 
##     Food.and.drink + Gate.location + Inflight.wifi.service + 
##     Inflight.entertainment + Online.support + Ease.of.Online.booking + 
##     On.board.service + Leg.room.service + Baggage.handling + 
##     Checkin.service + Cleanliness + Online.boarding + Arrival.Delay.in.Minutes, 
##     family = "binomial", data = kepuasan_train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.4422  -0.2998   0.0127   0.2345   3.6801  
## 
## Coefficients: (3 not defined because of singularities)
##                                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                         6.708e+00  8.441e+03   0.001  0.99937    
## GenderMale                         -9.082e-01  2.625e-02 -34.602  < 2e-16 ***
## Customer.TypeLoyal Customer         2.783e+00  4.391e-02  63.376  < 2e-16 ***
## Age                                -4.251e-03  9.086e-04  -4.678 2.89e-06 ***
## Type.of.TravelPersonal Travel      -1.366e+00  3.908e-02 -34.959  < 2e-16 ***
## ClassEco                           -5.385e-01  3.408e-02 -15.802  < 2e-16 ***
## ClassEco Plus                      -6.907e-01  5.297e-02 -13.039  < 2e-16 ***
## Flight.Distance                    -8.640e-05  1.351e-05  -6.394 1.62e-10 ***
## Seat.comfort1                      -2.360e+01  8.819e+01  -0.268  0.78899    
## Seat.comfort2                      -2.399e+01  8.819e+01  -0.272  0.78561    
## Seat.comfort3                      -2.406e+01  8.819e+01  -0.273  0.78496    
## Seat.comfort4                      -2.297e+01  8.819e+01  -0.260  0.79451    
## Seat.comfort5                      -1.806e+01  8.819e+01  -0.205  0.83777    
## Departure.Arrival.time.convenient1  5.211e-01  8.649e-02   6.025 1.69e-09 ***
## Departure.Arrival.time.convenient2  6.251e-01  8.366e-02   7.472 7.91e-14 ***
## Departure.Arrival.time.convenient3  5.977e-01  8.171e-02   7.314 2.59e-13 ***
## Departure.Arrival.time.convenient4 -4.542e-01  7.611e-02  -5.968 2.40e-09 ***
## Departure.Arrival.time.convenient5 -1.572e+00  8.267e-02 -19.015  < 2e-16 ***
## Food.and.drink1                     3.452e+00  6.849e-01   5.040 4.65e-07 ***
## Food.and.drink2                     3.460e+00  6.849e-01   5.051 4.39e-07 ***
## Food.and.drink3                     3.864e+00  6.846e-01   5.645 1.66e-08 ***
## Food.and.drink4                     3.958e+00  6.842e-01   5.785 7.26e-09 ***
## Food.and.drink5                     4.148e+00  6.846e-01   6.059 1.37e-09 ***
## Gate.location1                     -1.971e+01  4.400e+03  -0.004  0.99643    
## Gate.location2                     -1.968e+01  4.400e+03  -0.004  0.99643    
## Gate.location3                     -2.001e+01  4.400e+03  -0.005  0.99637    
## Gate.location4                     -1.974e+01  4.400e+03  -0.004  0.99642    
## Gate.location5                     -1.980e+01  4.400e+03  -0.004  0.99641    
## Inflight.wifi.service1             -2.148e-01  1.300e+00  -0.165  0.86873    
## Inflight.wifi.service2              1.412e-01  1.299e+00   0.109  0.91345    
## Inflight.wifi.service3             -8.293e-02  1.299e+00  -0.064  0.94909    
## Inflight.wifi.service4             -1.067e-01  1.299e+00  -0.082  0.93452    
## Inflight.wifi.service5             -2.872e-01  1.299e+00  -0.221  0.82502    
## Inflight.entertainment1            -3.265e+00  6.913e-01  -4.722 2.33e-06 ***
## Inflight.entertainment2            -3.190e+00  6.907e-01  -4.618 3.87e-06 ***
## Inflight.entertainment3            -3.393e+00  6.903e-01  -4.915 8.87e-07 ***
## Inflight.entertainment4            -1.617e+00  6.897e-01  -2.344  0.01907 *  
## Inflight.entertainment5            -3.399e-01  6.898e-01  -0.493  0.62221    
## Online.support1                     2.020e+01  6.523e+03   0.003  0.99753    
## Online.support2                     1.980e+01  6.523e+03   0.003  0.99758    
## Online.support3                     1.877e+01  6.523e+03   0.003  0.99770    
## Online.support4                     1.955e+01  6.523e+03   0.003  0.99761    
## Online.support5                     2.020e+01  6.523e+03   0.003  0.99753    
## Ease.of.Online.booking1             3.876e+01  1.447e+03   0.027  0.97863    
## Ease.of.Online.booking2             3.960e+01  1.447e+03   0.027  0.97816    
## Ease.of.Online.booking3             4.062e+01  1.447e+03   0.028  0.97760    
## Ease.of.Online.booking4             4.072e+01  1.447e+03   0.028  0.97754    
## Ease.of.Online.booking5             3.994e+01  1.447e+03   0.028  0.97798    
## On.board.service1                  -2.309e+01  3.381e+03  -0.007  0.99455    
## On.board.service2                  -2.292e+01  3.381e+03  -0.007  0.99459    
## On.board.service3                  -2.247e+01  3.381e+03  -0.007  0.99470    
## On.board.service4                  -2.229e+01  3.381e+03  -0.007  0.99474    
## On.board.service5                  -2.182e+01  3.381e+03  -0.006  0.99485    
## Leg.room.service1                  -2.071e+00  7.469e-01  -2.773  0.00555 ** 
## Leg.room.service2                  -1.810e+00  7.467e-01  -2.425  0.01533 *  
## Leg.room.service3                  -1.985e+00  7.464e-01  -2.659  0.00783 ** 
## Leg.room.service4                  -1.300e+00  7.464e-01  -1.741  0.08163 .  
## Leg.room.service5                  -1.150e+00  7.466e-01  -1.541  0.12340    
## Baggage.handling2                  -9.124e-02  7.235e-02  -1.261  0.20727    
## Baggage.handling3                  -6.276e-01  6.749e-02  -9.299  < 2e-16 ***
## Baggage.handling4                  -8.763e-02  6.562e-02  -1.335  0.18175    
## Baggage.handling5                   4.518e-01  6.910e-02   6.537 6.26e-11 ***
## Checkin.service1                   -1.232e+00  4.996e-02 -24.659  < 2e-16 ***
## Checkin.service2                   -1.090e+00  4.954e-02 -22.005  < 2e-16 ***
## Checkin.service3                   -6.054e-01  3.910e-02 -15.484  < 2e-16 ***
## Checkin.service4                   -6.031e-01  3.877e-02 -15.556  < 2e-16 ***
## Checkin.service5                           NA         NA      NA       NA    
## Cleanliness1                       -3.246e-01  7.044e-02  -4.609 4.05e-06 ***
## Cleanliness2                       -4.041e-01  6.439e-02  -6.276 3.47e-10 ***
## Cleanliness3                       -1.179e+00  5.242e-02 -22.488  < 2e-16 ***
## Cleanliness4                       -5.832e-01  4.047e-02 -14.412  < 2e-16 ***
## Cleanliness5                               NA         NA      NA       NA    
## Online.boarding1                   -5.113e-01  6.922e-02  -7.386 1.51e-13 ***
## Online.boarding2                   -6.431e-01  6.690e-02  -9.613  < 2e-16 ***
## Online.boarding3                   -1.679e-01  5.572e-02  -3.013  0.00259 ** 
## Online.boarding4                   -3.610e-01  5.375e-02  -6.715 1.88e-11 ***
## Online.boarding5                           NA         NA      NA       NA    
## Arrival.Delay.in.Minutes           -5.018e-03  3.338e-04 -15.036  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 125226  on 90915  degrees of freedom
## Residual deviance:  41998  on 90841  degrees of freedom
## AIC: 42148
## 
## Number of Fisher Scoring iterations: 17

AIC for the stepwise model = 42148, the same (to the printed precision) as for model_kepuasan_all. Backward elimination removed only Departure.Delay.in.Minutes.

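Prediction using Logistic Regression

Add a column of predicted probabilities to the test data. With type="response", the value is the predicted probability of the second factor level of the target, here satisfied.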

kepuasan_test$pred_value <- predict(model_kepuasan_all, kepuasan_test, type="response")
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
## prediction from a rank-deficient fit may be misleading
#kepuasan_test$pred_value

Add the predicted class label to the test data, using a threshold of 0.5

kepuasan_test$pred.Label <- ifelse(kepuasan_test$pred_value > 0.5, "1", "0")
#kepuasan_test %>% select(pred_value, pred.Label) 

Convert the label to a factor

kepuasan_test$pred.Label <- as.factor(kepuasan_test$pred.Label)

Add the predicted class name to the test data

# use the same level names as the original satisfaction column
kepuasan_test$pred.Name <- ifelse(kepuasan_test$pred.Label == 1, "satisfied", "dissatisfied")
kepuasan_test$pred.Name <- as.factor(kepuasan_test$pred.Name)
#kepuasan_test %>% select(pred_value, pred.Label, pred.Name)

Quick view comparing the original data (satisfaction) with the predicted class name

#kepuasan_test %>% select(satisfaction, pred.Name)
kepuasan_test[1:20 , c("satisfaction", "pred.Name" )]
kepuasan_test$satisfaction_label <- ifelse(kepuasan_test$satisfaction == "satisfied", "1", "0")
kepuasan_test$satisfaction_label <- as.factor(kepuasan_test$satisfaction_label)

Quick view comparing the original data (satisfaction_label) with the predicted label

#kepuasan_test %>% select(satisfaction_label, pred.Label)
kepuasan_test[1:20 , c("satisfaction_label", "pred.Label")]

Model Evaluation Logistic Regression

# confusion matrix
#install.packages("caret")
library(caret)
## Warning: package 'caret' was built under R version 4.2.2
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.2.2
## Loading required package: lattice
confusionMatrix(data = kepuasan_test$pred.Label,
                reference = kepuasan_test$satisfaction_label,
                positive = "1")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction     0     1
##          0 15894  1957
##          1  1730 19383
##                                           
##                Accuracy : 0.9054          
##                  95% CI : (0.9024, 0.9083)
##     No Information Rate : 0.5477          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.8092          
##                                           
##  Mcnemar's Test P-Value : 0.0001977       
##                                           
##             Sensitivity : 0.9083          
##             Specificity : 0.9018          
##          Pos Pred Value : 0.9181          
##          Neg Pred Value : 0.8904          
##              Prevalence : 0.5477          
##          Detection Rate : 0.4975          
##    Detection Prevalence : 0.5419          
##       Balanced Accuracy : 0.9051          
##                                           
##        'Positive' Class : 1               
## 
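
The headline statistics in this output follow directly from the four cell counts. The sketch below recomputes them by hand; the counts are copied from the matrix above, while the variable names (`tp`, `fp`, `fn`, `tn`) are introduced here for illustration only.

```r
# Counts copied from the confusion matrix above (positive class = "1")
tp <- 19383  # predicted 1, actual 1
fp <- 1730   # predicted 1, actual 0
fn <- 1957   # predicted 0, actual 1
tn <- 15894  # predicted 0, actual 0

accuracy    <- (tp + tn) / (tp + fp + fn + tn)
sensitivity <- tp / (tp + fn)   # also called recall
specificity <- tn / (tn + fp)
precision   <- tp / (tp + fp)   # reported by caret as Pos Pred Value

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, precision = precision), 4)
```

The rounded values should agree with the Accuracy, Sensitivity, Specificity and Pos Pred Value rows reported by `confusionMatrix()`.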

Distribution of predicted probabilities

ggplot(kepuasan_test, aes(x = pred_value)) +
  geom_density(linewidth = 0.5) +
  labs(title = "Distribution of Predicted Probabilities") +
  theme_minimal()

Check the range of the predicted values

boxplot(kepuasan_test$pred_value)

> Based on the boxplot, the median predicted value is about 0.7.

PART B: K-NEAREST NEIGHBOR

1. Data preparation

This LBB is part of a machine learning course. The data, titled Airline survey, were taken from Kaggle.com. The aim of this LBB is to simulate the KNN algorithm on customer satisfaction with airline service.

Read data

library(dplyr)
kepuasanKNN <- read.csv("Airline_survey.csv")
head(kepuasanKNN)
dim(kepuasanKNN)
## [1] 129880     23
str(kepuasanKNN)
## 'data.frame':    129880 obs. of  23 variables:
##  $ satisfaction                     : chr  "satisfied" "satisfied" "satisfied" "satisfied" ...
##  $ Gender                           : chr  "Female" "Male" "Female" "Female" ...
##  $ Customer.Type                    : chr  "Loyal Customer" "Loyal Customer" "Loyal Customer" "Loyal Customer" ...
##  $ Age                              : int  65 47 15 60 70 30 66 10 56 22 ...
##  $ Type.of.Travel                   : chr  "Personal Travel" "Personal Travel" "Personal Travel" "Personal Travel" ...
##  $ Class                            : chr  "Eco" "Business" "Eco" "Eco" ...
##  $ Flight.Distance                  : int  265 2464 2138 623 354 1894 227 1812 73 1556 ...
##  $ Seat.comfort                     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Departure.Arrival.time.convenient: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Food.and.drink                   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Gate.location                    : int  2 3 3 3 3 3 3 3 3 3 ...
##  $ Inflight.wifi.service            : int  2 0 2 3 4 2 2 2 5 2 ...
##  $ Inflight.entertainment           : int  4 2 0 4 3 0 5 0 3 0 ...
##  $ Online.support                   : int  2 2 2 3 4 2 5 2 5 2 ...
##  $ Ease.of.Online.booking           : int  3 3 2 1 2 2 5 2 4 2 ...
##  $ On.board.service                 : int  3 4 3 1 2 5 5 3 4 2 ...
##  $ Leg.room.service                 : int  0 4 3 0 0 4 0 3 0 4 ...
##  $ Baggage.handling                 : int  3 4 4 1 2 5 5 4 1 5 ...
##  $ Checkin.service                  : int  5 2 4 4 4 5 5 5 5 3 ...
##  $ Cleanliness                      : int  3 3 4 1 2 4 5 4 4 4 ...
##  $ Online.boarding                  : int  2 2 2 3 5 2 3 2 4 2 ...
##  $ Departure.Delay.in.Minutes       : int  0 310 0 0 0 0 17 0 0 30 ...
##  $ Arrival.Delay.in.Minutes         : int  0 305 0 0 0 0 15 0 0 26 ...

2. Data wrangling

names(kepuasanKNN)
##  [1] "satisfaction"                      "Gender"                           
##  [3] "Customer.Type"                     "Age"                              
##  [5] "Type.of.Travel"                    "Class"                            
##  [7] "Flight.Distance"                   "Seat.comfort"                     
##  [9] "Departure.Arrival.time.convenient" "Food.and.drink"                   
## [11] "Gate.location"                     "Inflight.wifi.service"            
## [13] "Inflight.entertainment"            "Online.support"                   
## [15] "Ease.of.Online.booking"            "On.board.service"                 
## [17] "Leg.room.service"                  "Baggage.handling"                 
## [19] "Checkin.service"                   "Cleanliness"                      
## [21] "Online.boarding"                   "Departure.Delay.in.Minutes"       
## [23] "Arrival.Delay.in.Minutes"

Convert several columns to factor

kepuasanKNN <- kepuasanKNN %>% 
  mutate(satisfaction = as.factor(satisfaction),
         Gender = as.factor(Gender),
         Customer.Type = as.factor(Customer.Type),
         Type.of.Travel = as.factor(Type.of.Travel),
         Class = as.factor(Class),
         Seat.comfort = as.factor(Seat.comfort),
         Departure.Arrival.time.convenient = as.factor(Departure.Arrival.time.convenient),
         Food.and.drink  = as.factor(Food.and.drink),
         Gate.location = as.factor(Gate.location),
         Inflight.wifi.service = as.factor(Inflight.wifi.service),
         Inflight.entertainment  = as.factor(Inflight.entertainment),
         Online.support = as.factor(Online.support),
         Ease.of.Online.booking = as.factor(Ease.of.Online.booking),
         On.board.service = as.factor(On.board.service),
         Leg.room.service = as.factor(Leg.room.service),
         Baggage.handling = as.factor(Baggage.handling),
         Checkin.service  = as.factor(Checkin.service),
         Cleanliness = as.factor(Cleanliness),
         Online.boarding = as.factor(Online.boarding))

glimpse(kepuasanKNN)
## Rows: 129,880
## Columns: 23
## $ satisfaction                      <fct> satisfied, satisfied, satisfied, sat…
## $ Gender                            <fct> Female, Male, Female, Female, Female…
## $ Customer.Type                     <fct> Loyal Customer, Loyal Customer, Loya…
## $ Age                               <int> 65, 47, 15, 60, 70, 30, 66, 10, 56, …
## $ Type.of.Travel                    <fct> Personal Travel, Personal Travel, Pe…
## $ Class                             <fct> Eco, Business, Eco, Eco, Eco, Eco, E…
## $ Flight.Distance                   <int> 265, 2464, 2138, 623, 354, 1894, 227…
## $ Seat.comfort                      <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Departure.Arrival.time.convenient <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Food.and.drink                    <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Gate.location                     <fct> 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, …
## $ Inflight.wifi.service             <fct> 2, 0, 2, 3, 4, 2, 2, 2, 5, 2, 3, 2, …
## $ Inflight.entertainment            <fct> 4, 2, 0, 4, 3, 0, 5, 0, 3, 0, 3, 0, …
## $ Online.support                    <fct> 2, 2, 2, 3, 4, 2, 5, 2, 5, 2, 3, 2, …
## $ Ease.of.Online.booking            <fct> 3, 3, 2, 1, 2, 2, 5, 2, 4, 2, 3, 2, …
## $ On.board.service                  <fct> 3, 4, 3, 1, 2, 5, 5, 3, 4, 2, 3, 3, …
## $ Leg.room.service                  <fct> 0, 4, 3, 0, 0, 4, 0, 3, 0, 4, 0, 2, …
## $ Baggage.handling                  <fct> 3, 4, 4, 1, 2, 5, 5, 4, 1, 5, 1, 5, …
## $ Checkin.service                   <fct> 5, 2, 4, 4, 4, 5, 5, 5, 5, 3, 2, 2, …
## $ Cleanliness                       <fct> 3, 3, 4, 1, 2, 4, 5, 4, 4, 4, 3, 5, …
## $ Online.boarding                   <fct> 2, 2, 2, 3, 5, 2, 3, 2, 4, 2, 5, 2, …
## $ Departure.Delay.in.Minutes        <int> 0, 310, 0, 0, 0, 0, 17, 0, 0, 30, 47…
## $ Arrival.Delay.in.Minutes          <int> 0, 305, 0, 0, 0, 0, 15, 0, 0, 26, 48…

3. Exploratory data analysis KNN

Check missing values

anyNA(kepuasanKNN)
## [1] TRUE
is.na(kepuasanKNN)%>% colSums()
##                      satisfaction                            Gender 
##                                 0                                 0 
##                     Customer.Type                               Age 
##                                 0                                 0 
##                    Type.of.Travel                             Class 
##                                 0                                 0 
##                   Flight.Distance                      Seat.comfort 
##                                 0                                 0 
## Departure.Arrival.time.convenient                    Food.and.drink 
##                                 0                                 0 
##                     Gate.location             Inflight.wifi.service 
##                                 0                                 0 
##            Inflight.entertainment                    Online.support 
##                                 0                                 0 
##            Ease.of.Online.booking                  On.board.service 
##                                 0                                 0 
##                  Leg.room.service                  Baggage.handling 
##                                 0                                 0 
##                   Checkin.service                       Cleanliness 
##                                 0                                 0 
##                   Online.boarding        Departure.Delay.in.Minutes 
##                                 0                                 0 
##          Arrival.Delay.in.Minutes 
##                               393

4. Data pre-processing KNN

Fill missing values with the mean

kepuasanKNN$Arrival.Delay.in.Minutes[is.na(kepuasanKNN$Arrival.Delay.in.Minutes)] <-
  mean(kepuasanKNN$Arrival.Delay.in.Minutes, na.rm = TRUE)
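
To see what mean imputation does, here is a small sketch on a made-up vector. The values in `delay` are hypothetical and only mimic the shape of a delay column.

```r
# Hypothetical delay values with two missing entries
delay <- c(0, 305, NA, 15, NA, 26)

# Mean of the observed values only: (0 + 305 + 15 + 26) / 4
fill_value <- mean(delay, na.rm = TRUE)

# Replace every NA with that mean
delay[is.na(delay)] <- fill_value
print(delay)
```

Because one large delay pulls the mean upward, the filled-in value can sit well above a typical observation.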

Confirm that no missing values remain

anyNA(kepuasanKNN)
## [1] FALSE
is.na(kepuasanKNN)%>% colSums()
##                      satisfaction                            Gender 
##                                 0                                 0 
##                     Customer.Type                               Age 
##                                 0                                 0 
##                    Type.of.Travel                             Class 
##                                 0                                 0 
##                   Flight.Distance                      Seat.comfort 
##                                 0                                 0 
## Departure.Arrival.time.convenient                    Food.and.drink 
##                                 0                                 0 
##                     Gate.location             Inflight.wifi.service 
##                                 0                                 0 
##            Inflight.entertainment                    Online.support 
##                                 0                                 0 
##            Ease.of.Online.booking                  On.board.service 
##                                 0                                 0 
##                  Leg.room.service                  Baggage.handling 
##                                 0                                 0 
##                   Checkin.service                       Cleanliness 
##                                 0                                 0 
##                   Online.boarding        Departure.Delay.in.Minutes 
##                                 0                                 0 
##          Arrival.Delay.in.Minutes 
##                                 0

Check class imbalance

table(kepuasanKNN$satisfaction)
## 
## dissatisfied    satisfied 
##        58793        71087
prop.table(table(kepuasanKNN$satisfaction))
## 
## dissatisfied    satisfied 
##    0.4526717    0.5473283

5. Cross Validation KNN

RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(100)
index <- sample(nrow(kepuasanKNN), nrow(kepuasanKNN)*0.7)
kepuasanKNN_train <- kepuasanKNN[index,]
kepuasanKNN_test <- kepuasanKNN[-index,]

Separate the predictors and the target in both the TRAIN and TEST data

library(dplyr)
# separate the predictors (numeric columns only)
kepuasanKNN_train_x <- kepuasanKNN_train %>% select_if(is.numeric)
kepuasanKNN_test_x <- kepuasanKNN_test %>% select_if(is.numeric)

# separate the target column
kepuasanKNN_train_y <- kepuasanKNN_train[,"satisfaction"]
kepuasanKNN_test_y <- kepuasanKNN_test[,"satisfaction"]
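
Keeping only numeric columns means that any column converted to a factor is dropped from the predictors. The sketch below shows this on a toy data frame; the values are hypothetical, and the base-R filter is used here as a stand-in that should keep the same columns as `select_if(is.numeric)`.

```r
# Toy data frame with hypothetical values; column names borrowed from the survey
toy <- data.frame(
  Age             = c(65L, 47L, 15L),
  Flight.Distance = c(265L, 2464L, 2138L),
  Seat.comfort    = factor(c(0, 3, 5)),                  # rating stored as a factor
  Class           = factor(c("Eco", "Business", "Eco"))
)

# Keep only the columns for which is.numeric() is TRUE
numeric_only <- toy[, sapply(toy, is.numeric), drop = FALSE]
names(numeric_only)
```

In the survey data, the `glimpse()` output above shows that the only non-factor columns are Age, Flight.Distance, Departure.Delay.in.Minutes and Arrival.Delay.in.Minutes, so these four are the predictors that reach KNN.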

6. Scaling KNN predictor

library(gtools)
## Warning: package 'gtools' was built under R version 4.2.2
# scale train_x data
kepuasanKNN_train_xs  <- scale(kepuasanKNN_train_x)

# scale test_x data
kepuasanKNN_test_xs <- scale(kepuasanKNN_test_x,
                      center = attr(kepuasanKNN_train_xs, "scaled:center"),
                      scale = attr(kepuasanKNN_train_xs, "scaled:scale"))
# find a starting value for k: the square root of the number of training rows
sqrt(nrow(kepuasanKNN_train_xs))
## [1] 301.5228
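
Scaling the test set with the training set's center and scale, as done above, puts both sets on the same scale without letting the test data influence the parameters. A minimal sketch on made-up matrices (the values and the column names `a` and `b` are hypothetical):

```r
# Hypothetical training and test predictors
train <- matrix(c(10, 20, 30,
                   1,  2,  3), ncol = 2,
                dimnames = list(NULL, c("a", "b")))
test  <- matrix(c(20, 40,
                   2,  4), ncol = 2,
                dimnames = list(NULL, c("a", "b")))

# Standardize the training data; scale() stores the parameters as attributes
train_s <- scale(train)

# Reuse the training mean and standard deviation for the test data
test_s <- scale(test,
                center = attr(train_s, "scaled:center"),
                scale  = attr(train_s, "scaled:scale"))
print(test_s)
```

A test value equal to the training mean maps to 0, and a value two training standard deviations above it maps to 2.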

7. Modelling KNN

library(class)
model_knn <- knn(train=kepuasanKNN_train_xs, 
                 test=kepuasanKNN_test_xs, 
                cl=kepuasanKNN_train_y,
                k=301)
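
To make the idea behind `knn()` concrete, the sketch below classifies a single point by Euclidean distance and majority vote. `knn_one` is a hypothetical helper written only for this illustration, not part of the `class` package, and the toy points are made up. It also breaks ties by taking the first class, whereas `class::knn` breaks them at random.

```r
# Classify one query point by majority vote among its k nearest training rows
knn_one <- function(train_x, train_y, query, k) {
  # Euclidean distance from the query to every training row
  d <- sqrt(rowSums(sweep(train_x, 2, query)^2))
  # Indices of the k closest rows
  nearest <- order(d)[seq_len(k)]
  # Count the class labels among those neighbours and return the most frequent
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Two well-separated hypothetical clusters
train_x <- matrix(c(0, 0,
                    0, 1,
                    1, 0,
                    5, 5,
                    5, 6,
                    6, 5), ncol = 2, byrow = TRUE)
train_y <- factor(c("dissatisfied", "dissatisfied", "dissatisfied",
                    "satisfied", "satisfied", "satisfied"))

knn_one(train_x, train_y, query = c(0.5, 0.5), k = 3)
knn_one(train_x, train_y, query = c(5.5, 5.5), k = 3)
```

The first query sits inside the "dissatisfied" cluster and the second inside the "satisfied" cluster, so each takes the label of its three nearest neighbours.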

8. Quick check of the KNN predictions

#model_knn
model_knn[1:10] 
##  [1] satisfied    dissatisfied satisfied    dissatisfied satisfied   
##  [6] satisfied    dissatisfied satisfied    satisfied    satisfied   
## Levels: dissatisfied satisfied

9. Model evaluation KNN

confusion matrix KNN

library(caret)
confusionMatrix(data = as.factor(model_knn),
                reference = kepuasanKNN_test_y,
                positive = "satisfied")
## Confusion Matrix and Statistics
## 
##               Reference
## Prediction     dissatisfied satisfied
##   dissatisfied         9849      6756
##   satisfied            7775     14584
##                                           
##                Accuracy : 0.6271          
##                  95% CI : (0.6222, 0.6319)
##     No Information Rate : 0.5477          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.2435          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.6834          
##             Specificity : 0.5588          
##          Pos Pred Value : 0.6523          
##          Neg Pred Value : 0.5931          
##              Prevalence : 0.5477          
##          Detection Rate : 0.3743          
##    Detection Prevalence : 0.5738          
##       Balanced Accuracy : 0.6211          
##                                           
##        'Positive' Class : satisfied       
## 

PART C: ANALYSIS OF LOGISTIC REGRESSION vs KNN RESULTS

Data analysis

  • An initial analysis can compare AIC values. The Logistic Regression model using all predictors and the one using feature selection produce the same AIC, 42034.

  • The data have 393 missing values, all in Arrival.Delay.in.Minutes, which cause an error when KNN is run. The missing values therefore need to be filled in, here with the mean.

Analysis of prediction results

  • Predictions can be evaluated with a confusion matrix.

  • For the business need of evaluating the airline customer satisfaction survey, what I think matters most is:

as few False Positives (FP) as possible, because management does not want to be misled by data suggesting that customers are satisfied when they are actually dissatisfied.

The most appropriate metric is therefore Precision.

  • Compare the precision (Pos Pred Value) of Logistic Regression and KNN:
  • Pos Pred Value Logistic Regression : 0.9181
  • Pos Pred Value KNN : 0.6523

The model to choose is therefore Logistic Regression, not KNN.

  • The low precision of KNN is understandable. KNN classifies by distance, so it works well with numeric predictors but poorly with categorical ones. Most predictors in this data set are categorical, and only the numeric columns were passed to KNN, so Logistic Regression is the better fit here.
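
The precision comparison can be reproduced from the two confusion matrices. In the sketch below the counts are copied from the outputs in Parts A and B, with "satisfied" as the positive class; the `precision` helper is a name introduced here for illustration.

```r
# Precision = TP / (TP + FP): of all customers predicted satisfied,
# the share that really are satisfied
precision <- function(tp, fp) tp / (tp + fp)

# Counts copied from the Logistic Regression and KNN confusion matrices
prec_logreg <- precision(tp = 19383, fp = 1730)
prec_knn    <- precision(tp = 14584, fp = 7775)

round(c(logistic_regression = prec_logreg, knn = prec_knn), 4)
```

The gap comes mostly from the false positives: KNN labels 7775 dissatisfied customers as satisfied, against 1730 for Logistic Regression.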

END