Titanic Classification
Data Preparation
Load package
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(gtools)
## Warning: package 'gtools' was built under R version 4.2.2
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.2.2
library(caret)
## Warning: package 'caret' was built under R version 4.2.2
## Loading required package: lattice
library(class)
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
Load Dataset
Load the data and inspect its structure.
titanic <- read.csv("train.csv")
str(titanic)
## 'data.frame': 891 obs. of 12 variables:
## $ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
## $ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
## $ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
## $ Sex : chr "male" "female" "female" "female" ...
## $ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
## $ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
## $ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
## $ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
## $ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
## $ Cabin : chr "" "C85" "" "C123" ...
## $ Embarked : chr "S" "C" "S" "S" ...
Data dictionary:
- PassengerId = passenger identification number
- Survived = survival (0 = No, 1 = Yes)
- Pclass = ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
- Name = passenger's full name
- Sex = sex
- Age = age in years
- SibSp = number of siblings / spouses aboard the Titanic
- Parch = number of parents / children aboard the Titanic
- Ticket = ticket number
- Fare = passenger fare
- Cabin = cabin number
- Embarked = port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
Data Wrangling
Convert the categorical columns to factors.
titanic <- titanic %>%
mutate(Survived = as.factor(Survived),
Pclass = as.factor(Pclass),
Sex = as.factor(Sex),
SibSp = as.factor(SibSp),
Parch = as.factor(Parch),
Embarked = as.factor(Embarked))
glimpse(titanic)
## Rows: 891
## Columns: 12
## $ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
## $ Survived <fct> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1…
## $ Pclass <fct> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3, 3…
## $ Name <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Fl…
## $ Sex <fct> male, female, female, female, male, male, male, male, fema…
## $ Age <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 14, …
## $ SibSp <fct> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1, 0…
## $ Parch <fct> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0, 0…
## $ Ticket <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "37…
## $ Fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.8625,…
## $ Cabin <chr> "", "C85", "", "C123", "", "", "E46", "", "", "", "G6", "C…
## $ Embarked <fct> S, C, S, S, S, Q, S, S, S, C, S, S, S, S, S, S, Q, S, S, C…
Check whether the data contains any missing values.
anyNA(titanic)
## [1] TRUE
The result TRUE indicates that the dataset contains missing values.
Count the missing values in each column.
colSums(is.na(titanic))
## PassengerId Survived Pclass Name Sex Age
## 0 0 0 0 0 177
## SibSp Parch Ticket Fare Cabin Embarked
## 0 0 0 0 0 0
Because the Age column has many missing values (177), we fill them with the column mean.
titanic$Age[is.na(titanic$Age)] <- mean(titanic$Age, na.rm = TRUE)
colSums(is.na(titanic))
## PassengerId Survived Pclass Name Sex Age
## 0 0 0 0 0 0
## SibSp Parch Ticket Fare Cabin Embarked
## 0 0 0 0 0 0
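As a small illustration of the mean imputation above, the sketch below applies the same pattern to a toy vector. The vector `age` and its values are made up for this example and are not taken from the dataset.

```r
# Toy vector with missing values (hypothetical values, for illustration only)
age <- c(22, 38, NA, 35, NA)

# Mean of the observed values only: (22 + 38 + 35) / 3
age_mean <- mean(age, na.rm = TRUE)

# Replace every NA with that mean
age[is.na(age)] <- age_mean

anyNA(age)  # no missing values remain
```

Mean imputation leaves the column mean unchanged but reduces its variance, which is worth keeping in mind when a column has as many gaps as Age does here.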
Before splitting the data into a training set and a test set, we check the class proportions of the target variable.
prop.table(table(titanic$Survived))
##
## 0 1
## 0.6161616 0.3838384
The two classes are reasonably balanced (about 62% versus 38%), so no additional pre-processing is needed.
Splitting Train and Test Data
RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(100)
index <- sample(nrow(titanic), nrow(titanic)*0.7)
titanic_train <- titanic[index,]
titanic_test <- titanic[-index,]
Modelling
model1 <- glm(formula = Survived~Age+Sex+Pclass+Cabin+Embarked, family = "binomial", data = titanic_train)
summary(model1)
##
## Call:
## glm(formula = Survived ~ Age + Sex + Pclass + Cabin + Embarked,
## family = "binomial", data = titanic_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7493 -0.5801 -0.3941 0.5423 2.4457
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.67751 0.62935 4.254 2.1e-05 ***
## Age -0.03846 0.01072 -3.586 0.000336 ***
## Sexmale -2.26008 0.23544 -9.600 < 2e-16 ***
## Pclass2 -0.61454 0.50687 -1.212 0.225357
## Pclass3 -1.62616 0.48796 -3.333 0.000860 ***
## CabinA10 -18.11660 6522.63862 -0.003 0.997784
## CabinA16 17.21695 6522.63862 0.003 0.997894
## CabinA19 -17.84135 6522.63862 -0.003 0.997818
## CabinA20 19.51549 6522.63861 0.003 0.997613
## CabinA23 21.22523 6522.63864 0.003 0.997404
## CabinA24 -17.79132 6522.63862 -0.003 0.997824
## CabinA31 19.16937 6522.63863 0.003 0.997655
## CabinA32 -17.84135 6522.63862 -0.003 0.997818
## CabinA34 18.30246 6522.63862 0.003 0.997761
## CabinA6 19.22544 6522.63863 0.003 0.997648
## CabinB101 18.97708 6522.63862 0.003 0.997679
## CabinB102 -17.84135 6522.63862 -0.003 0.997818
## CabinB19 -16.63760 6522.63862 -0.003 0.997965
## CabinB20 18.31233 3786.64085 0.005 0.996141
## CabinB22 17.27302 6522.63863 0.003 0.997887
## CabinB28 18.27292 6522.63863 0.003 0.997765
## CabinB3 17.54222 6522.63861 0.003 0.997854
## CabinB30 -17.00133 6522.63863 -0.003 0.997920
## CabinB35 16.29397 6522.63862 0.002 0.998007
## CabinB38 -17.25292 6522.63862 -0.003 0.997890
## CabinB42 16.61925 6522.63862 0.003 0.997967
## CabinB49 16.10169 6522.63862 0.002 0.998030
## CabinB5 16.75538 4577.63558 0.004 0.997080
## CabinB50 18.86171 6522.63861 0.003 0.997693
## CabinB51 B53 B55 0.65057 1.48765 0.437 0.661885
## CabinB58 B60 -0.64211 1.55908 -0.412 0.680448
## CabinB73 17.04228 6522.63861 0.003 0.997915
## CabinB77 17.15765 6522.63863 0.003 0.997901
## CabinB78 16.51315 6522.63861 0.003 0.997980
## CabinB80 17.60153 6522.63862 0.003 0.997847
## CabinB82 B84 -17.73202 6522.63862 -0.003 0.997831
## CabinB94 -17.44521 6522.63862 -0.003 0.997866
## CabinB96 B98 18.25259 2778.50026 0.007 0.994759
## CabinC101 17.92680 6522.63863 0.003 0.997807
## CabinC103 18.11909 6522.63863 0.003 0.997784
## CabinC104 20.14842 6522.63863 0.003 0.997535
## CabinC106 19.29078 6522.63861 0.003 0.997640
## CabinC123 -17.56058 6522.63862 -0.003 0.997852
## CabinC124 -17.84135 6522.63862 -0.003 0.997818
## CabinC125 18.11909 6522.63862 0.003 0.997784
## CabinC126 17.03071 6522.63862 0.003 0.997917
## CabinC128 -17.84135 6522.63862 -0.003 0.997818
## CabinC148 18.63097 6522.63863 0.003 0.997721
## CabinC22 C26 -2.40471 1.43129 -1.680 0.092938 .
## CabinC23 C25 C27 -0.40764 1.35773 -0.300 0.763999
## CabinC45 16.83238 6522.63863 0.003 0.997941
## CabinC47 18.77323 6522.63861 0.003 0.997704
## CabinC52 19.22544 6522.63863 0.003 0.997648
## CabinC65 -1.39203 1.73175 -0.804 0.421496
## CabinC68 -17.61665 6522.63862 -0.003 0.997845
## CabinC7 17.08073 6522.63861 0.003 0.997911
## CabinC70 18.28485 6522.63863 0.003 0.997763
## CabinC78 -0.99371 1.83806 -0.541 0.588761
## CabinC82 -18.46271 6522.63862 -0.003 0.997742
## CabinC83 17.23456 6522.63862 0.003 0.997892
## CabinC85 16.83238 6522.63861 0.003 0.997941
## CabinC86 -17.57819 6522.63862 -0.003 0.997850
## CabinC90 16.29397 6522.63863 0.002 0.998007
## CabinC91 -17.52212 6522.63862 -0.003 0.997857
## CabinC92 18.44242 3722.90778 0.005 0.996047
## CabinC95 -17.84135 6522.63862 -0.003 0.997818
## CabinD 17.44906 4489.48234 0.004 0.996899
## CabinD10 D12 18.51560 6522.63863 0.003 0.997735
## CabinD11 17.84988 6522.63863 0.003 0.997817
## CabinD15 16.60163 6522.63862 0.003 0.997969
## CabinD17 17.75385 4612.02267 0.004 0.996929
## CabinD19 19.76384 6522.63861 0.003 0.997582
## CabinD21 17.03071 6522.63863 0.003 0.997917
## CabinD26 -17.65146 4433.57119 -0.004 0.996823
## CabinD28 16.50387 6522.63862 0.003 0.997981
## CabinD30 -18.25281 6522.63862 -0.003 0.997767
## CabinD35 19.57156 6522.63863 0.003 0.997606
## CabinD36 16.25552 6522.63863 0.002 0.998012
## CabinD37 17.67844 6522.63862 0.003 0.997837
## CabinD46 -17.17601 6522.63862 -0.003 0.997899
## CabinD47 16.61925 6522.63862 0.003 0.997967
## CabinD48 -17.27053 6522.63862 -0.003 0.997887
## CabinD56 20.07072 6522.63861 0.003 0.997545
## CabinD6 -17.86824 6522.63862 -0.003 0.997814
## CabinD7 18.31137 6522.63862 0.003 0.997760
## CabinE101 17.65043 4606.79057 0.004 0.996943
## CabinE12 19.99459 6522.63861 0.003 0.997554
## CabinE121 20.00553 6522.63862 0.003 0.997553
## CabinE17 20.10996 6522.63863 0.003 0.997540
## CabinE24 19.63448 4603.45063 0.004 0.996597
## CabinE25 19.53310 4612.20202 0.004 0.996621
## CabinE31 -17.21446 6522.63862 -0.003 0.997894
## CabinE33 16.88899 4601.62468 0.004 0.997072
## CabinE34 16.90929 6522.63863 0.003 0.997932
## CabinE36 16.52472 6522.63863 0.003 0.997979
## CabinE40 16.94775 6522.63863 0.003 0.997927
## CabinE44 17.38839 6522.63862 0.003 0.997873
## CabinE58 -17.17601 6522.63862 -0.003 0.997899
## CabinE67 0.20234 1.82460 0.111 0.911700
## CabinE68 16.58079 6522.63862 0.003 0.997972
## CabinE77 -18.43697 6522.63861 -0.003 0.997745
## CabinF E69 18.13930 6522.63860 0.003 0.997781
## CabinF G63 -15.74214 6522.63861 -0.002 0.998074
## CabinF G73 -16.62666 6522.63861 -0.003 0.997966
## CabinF2 0.95663 1.51230 0.633 0.527015
## CabinF33 17.52488 4607.72770 0.004 0.996965
## CabinF38 -17.14205 6522.63861 -0.003 0.997903
## CabinF4 17.93506 4001.34204 0.004 0.996424
## CabinG6 -0.48454 1.05501 -0.459 0.646034
## EmbarkedC 0.51756 0.32392 1.598 0.110092
## EmbarkedQ 0.92685 0.38795 2.389 0.016890 *
## EmbarkedS NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 834.17 on 622 degrees of freedom
## Residual deviance: 476.18 on 512 degrees of freedom
## AIC: 698.18
##
## Number of Fisher Scoring iterations: 17
Although several predictors (Age, Sex, Pclass) are significant for the target variable, most Cabin levels are not, so we try refitting the model with the stepwise method.
model2 <- stepAIC(model1, direction = "backward")
## Start: AIC=698.18
## Survived ~ Age + Sex + Pclass + Cabin + Embarked
##
## Df Deviance AIC
## - Cabin 103 584.64 600.64
## <none> 476.18 698.18
## - Embarked 2 483.29 701.29
## - Age 1 489.98 709.98
## - Pclass 2 496.19 714.19
## - Sex 1 578.89 798.89
##
## Step: AIC=600.64
## Survived ~ Age + Sex + Pclass + Embarked
##
## Df Deviance AIC
## <none> 584.64 600.64
## - Embarked 3 591.61 601.61
## - Age 1 599.80 613.80
## - Pclass 2 652.54 664.54
## - Sex 1 722.70 736.70
Backward stepwise selection yields the following model.
summary(model2)
##
## Call:
## glm(formula = Survived ~ Age + Sex + Pclass + Embarked, family = "binomial",
## data = titanic_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5201 -0.6694 -0.4310 0.7097 2.4059
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 15.622859 535.411444 0.029 0.976722
## Age -0.033174 0.008758 -3.788 0.000152 ***
## Sexmale -2.347520 0.215845 -10.876 < 2e-16 ***
## Pclass2 -1.105401 0.311185 -3.552 0.000382 ***
## Pclass3 -2.196170 0.286563 -7.664 1.8e-14 ***
## EmbarkedC -11.935408 535.411312 -0.022 0.982215
## EmbarkedQ -11.645347 535.411380 -0.022 0.982647
## EmbarkedS -12.423658 535.411285 -0.023 0.981488
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 834.17 on 622 degrees of freedom
## Residual deviance: 584.64 on 615 degrees of freedom
## AIC: 600.64
##
## Number of Fisher Scoring iterations: 12
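Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios. As a worked example, the sketch below uses the Age and Sexmale estimates printed above. The names `coefs` and `odds_ratio` are introduced here only for illustration.

```r
# Estimates copied from the summary(model2) output above
coefs <- c(Age = -0.033174, Sexmale = -2.347520)

# exp() converts a log-odds coefficient into an odds ratio
odds_ratio <- exp(coefs)
round(odds_ratio, 3)
```

An odds ratio of roughly 0.096 for Sexmale means that, holding the other predictors fixed, the odds of survival for a male passenger are about a tenth of those for a female passenger. With the fitted object available, `exp(coef(model2))` would give the same quantities for every term.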
Prediction
Using model2, we predict on the test data.
titanic_test$prob_surv <- predict(model2, type = "response", newdata = titanic_test)
Inspect the distribution of the predicted probabilities.
ggplot(titanic_test, aes(x=prob_surv)) +
  geom_density(lwd=1) +
  labs(title = "Distribution of Predicted Probabilities") +
  theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
In the plot above, the predicted probabilities lean toward 0. Next, we compare the predictions with the actual data.
titanic_test$pred_surv <- factor(ifelse(titanic_test$prob_surv > 0.5, yes = 1, no = 0))
titanic_test[1:10, c("pred_surv", "Survived")]
## pred_surv Survived
## 3 1 1
## 4 1 1
## 5 0 0
## 6 0 0
## 7 0 0
## 8 0 0
## 16 1 1
## 17 0 0
## 19 0 0
## 21 0 0
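The 0.5 cutoff used above is a choice rather than a fixed rule. A minimal sketch on made-up probabilities shows how the cutoff maps probabilities to classes, and how lowering it labels more observations as positive. The vector `prob` and the helper `classify` are hypothetical and not part of the original analysis.

```r
# Hypothetical predicted probabilities, for illustration only
prob <- c(0.10, 0.45, 0.55, 0.90)

# Turn probabilities into class labels for a given cutoff;
# fixing the levels keeps both classes present even if one is never predicted
classify <- function(p, cutoff = 0.5) {
  factor(ifelse(p > cutoff, 1, 0), levels = c(0, 1))
}

classify(prob)                # cutoff 0.5: 0 0 1 1
classify(prob, cutoff = 0.4)  # cutoff 0.4: 0 1 1 1
```

Lowering the cutoff generally raises sensitivity at the cost of specificity, which matters when missed positives are the more costly error.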
Model Evaluation
Evaluate the model with confusionMatrix.
conf <- confusionMatrix(titanic_test$pred_surv, titanic_test$Survived, positive = "1")
conf
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 146 28
## 1 24 70
##
## Accuracy : 0.806
## 95% CI : (0.7535, 0.8516)
## No Information Rate : 0.6343
## P-Value [Acc > NIR] : 7.312e-10
##
## Kappa : 0.5781
##
## Mcnemar's Test P-Value : 0.6774
##
## Sensitivity : 0.7143
## Specificity : 0.8588
## Pos Pred Value : 0.7447
## Neg Pred Value : 0.8391
## Prevalence : 0.3657
## Detection Rate : 0.2612
## Detection Prevalence : 0.3507
## Balanced Accuracy : 0.7866
##
## 'Positive' Class : 1
##
- Recall/Sensitivity = of all actual positives, the proportion the model predicts correctly.
- Specificity = of all actual negatives, the proportion the model predicts correctly.
- Accuracy = the proportion of all observations for which the model predicts the target Y correctly.
- Precision = of all observations predicted as positive, the proportion that are actually positive.
Based on the confusionMatrix output above, the model predicts the target Y (Survived and Not Survived) correctly 80.6% of the time. Of all passengers who actually did not survive, the model correctly identifies 85.9% (specificity). Of all passengers who actually survived, the model correctly identifies 71.4% (sensitivity). Of all passengers the model predicts as survivors, 74.5% actually survived (precision).
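The four metrics can be reproduced by hand from the counts in the confusion matrix above. The sketch below copies those counts; the variable names (`TP`, `TN`, `FP`, `FN`, and so on) are introduced here for illustration.

```r
# Counts copied from the confusion matrix above (positive class = 1)
TN <- 146  # predicted 0, actual 0
FN <- 28   # predicted 0, actual 1
FP <- 24   # predicted 1, actual 0
TP <- 70   # predicted 1, actual 1

accuracy    <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)  # share of actual positives found
specificity <- TN / (TN + FP)  # share of actual negatives found
precision   <- TP / (TP + FP)  # share of positive predictions that are right

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, precision = precision), 4)
```

The rounded values match the Accuracy, Sensitivity, Specificity, and Pos Pred Value lines reported by confusionMatrix.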
K-Nearest Neighbour
Pre-Processing Data
titanic_knn <- read.csv("train.csv")
titanic_knn <- titanic_knn[, -1]
titanic_knn$Survived <- as.factor(titanic_knn$Survived)
titanic_knn$Age[is.na(titanic_knn$Age)] <- mean(titanic_knn$Age, na.rm = TRUE)
head(titanic_knn)
## Survived Pclass Name Sex
## 1 0 3 Braund, Mr. Owen Harris male
## 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female
## 3 1 3 Heikkinen, Miss. Laina female
## 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female
## 5 0 3 Allen, Mr. William Henry male
## 6 0 3 Moran, Mr. James male
## Age SibSp Parch Ticket Fare Cabin Embarked
## 1 22.00000 1 0 A/5 21171 7.2500 S
## 2 38.00000 1 0 PC 17599 71.2833 C85 C
## 3 26.00000 0 0 STON/O2. 3101282 7.9250 S
## 4 35.00000 1 0 113803 53.1000 C123 S
## 5 35.00000 0 0 373450 8.0500 S
## 6 29.69912 0 0 330877 8.4583 Q
RNGkind(sample.kind = "Rounding")
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(100)
index <- sample(nrow(titanic_knn), nrow(titanic_knn)*0.7)
knn_train <- titanic_knn[index,]
knn_test <- titanic_knn[-index,]
prop.table(table(knn_train$Survived))
##
## 0 1
## 0.6083467 0.3916533
For k-NN, the predictors are separated from the label (the target variable).
library(dplyr)
knn_train_real <- knn_train %>%
  dplyr::select(c(Survived,Pclass,Age,SibSp,Parch,Fare))
knn_test_real <- knn_test %>%
  dplyr::select(c(Survived,Pclass,Age,SibSp,Parch,Fare))
train_x <- knn_train_real[,-1]
test_x <- knn_test_real[,-1]
train_y <- knn_train_real$Survived
test_y <- knn_test_real$Survived
The predictors are scaled using z-score standardization. The test data must be scaled with the parameters (mean and standard deviation) of the training data, because the test data is treated as unseen data.
train_xs <- scale(train_x)
test_xs <- scale(test_x,
  center = attr(train_xs,"scaled:center") ,
  scale = attr(train_xs,"scaled:scale"))
Choosing K: a common starting point is the square root of the number of training observations.
sqrt(nrow(train_xs))
## [1] 24.95997
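With two classes, an odd k avoids tied votes among the neighbours. The sketch below shows one possible way to turn the square-root heuristic into an odd k. The rounding convention is an assumption for illustration and not necessarily how the k used in this analysis was picked; `n_train`, `k_start`, and `k` are names introduced here.

```r
# Number of training rows; sqrt(623) matches the 24.95997 printed above
n_train <- 623

# Rule-of-thumb starting point
k_start <- sqrt(n_train)

# One possible convention (an assumption): take the integer part and,
# if it is even, step down by one so that k is odd
k <- floor(k_start)
if (k %% 2 == 0) k <- k - 1
k
```

A more thorough approach would compare several candidate values of k on held-out data rather than rely on the heuristic alone.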
k-NN does not build a model, so we can go straight to prediction.
pred_knn <- knn(train = train_xs,
test = test_xs,
cl = train_y,
k = 23)
Check the predictions
pred_knn
## [1] 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0
## [38] 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 0 1 0 0 1
## [75] 0 0 0 0 1 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 1 0 1 1 1 0 0 0
## [112] 0 0 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 1 1 0 1 0 0 0 1 1
## [149] 1 1 1 0 1 0 1 0 0 1 1 1 0 1 1 1 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0 0 0 1
## [186] 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0
## [223] 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0 1 0 0 0 1 0 1 0 0
## [260] 0 0 0 1 0 0 0 0 0
## Levels: 0 1
Model Evaluation
confusionMatrix(data = pred_knn, reference = test_y, positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 136 42
## 1 34 56
##
## Accuracy : 0.7164
## 95% CI : (0.6584, 0.7696)
## No Information Rate : 0.6343
## P-Value [Acc > NIR] : 0.002792
##
## Kappa : 0.378
##
## Mcnemar's Test P-Value : 0.422001
##
## Sensitivity : 0.5714
## Specificity : 0.8000
## Pos Pred Value : 0.6222
## Neg Pred Value : 0.7640
## Prevalence : 0.3657
## Detection Rate : 0.2090
## Detection Prevalence : 0.3358
## Balanced Accuracy : 0.6857
##
## 'Positive' Class : 1
##
Based on the confusionMatrix output above, the model predicts the target Y (Survived and Not Survived) correctly 71.6% of the time. Of all passengers who actually did not survive, the model correctly identifies 80.0% (specificity). Of all passengers who actually survived, the model correctly identifies 57.1% (sensitivity). Of all passengers the model predicts as survivors, 62.2% actually survived (precision).
Conclusion
Of the metrics above, I give the most attention to sensitivity, because I do not want the model to predict that a passenger did not survive when they actually did (a false negative).
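Since sensitivity is the metric of interest, the two models can be compared on it directly. The sketch below copies the sensitivity values from the two confusionMatrix outputs above; the vector name `sens` and its labels are introduced here for illustration.

```r
# Sensitivity values copied from the two confusionMatrix outputs above
sens <- c(logistic_regression = 0.7143, knn = 0.5714)

# Model with the higher sensitivity on the test set
names(which.max(sens))
```

On this particular train/test split the logistic regression model has the higher sensitivity (and also the higher accuracy, 0.806 versus 0.7164). A single split is limited evidence, so the ranking could change under a different seed or with cross-validation.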