This project analyzes which factors influenced the survival of passengers on the Titanic. The analysis uses three models:
1. Logistic Regression
2. Decision Tree
3. Support Vector Machine (SVM)
The three models are compared by accuracy to see which one best explains passenger survival.
Because the data comes already separated into train and test sets, I first combine the two sets so they can be cleaned together.
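The combining step itself is not shown. A minimal sketch of one way to do it, assuming two data frames named `train` and `test` (hypothetical names and toy values) where only `train` carries a `Survived` column:

```r
# Toy stand-ins for the two files (hypothetical values)
train <- data.frame(PassengerId = 1:3, Survived = c(0L, 1L, 1L), Age = c(22, 38, NA))
test  <- data.frame(PassengerId = 4:5, Age = c(35, NA))

# The test set has no Survived column, so add it as NA before stacking
test$Survived <- NA_integer_
full_data <- rbind(train, test[, names(train)])

nrow(full_data)                  # rows from both sets
sum(is.na(full_data$Survived))   # one NA per test row
```

Keeping `Survived` as NA for the test rows makes it easy to separate the two sets again after cleaning.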
># Observations: 1,309
># Variables: 12
># $ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
># $ Survived <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0...
># $ Pclass <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3...
># $ Name <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley ...
># $ Sex <fct> male, female, female, female, male, male, male, male, f...
># $ Age <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 1...
># $ SibSp <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1...
># $ Parch <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0...
># $ Ticket <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", ...
># $ Fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.86...
># $ Cabin <chr> NA, "C85", NA, "C123", NA, NA, "E46", NA, NA, NA, "G6",...
># $ Embarked <fct> S, C, S, S, S, Q, S, S, S, C, S, S, S, S, S, S, Q, S, S...
Data description:
- Survived = Survival (0 = No, 1 = Yes)
- Pclass = Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
- Sex = Sex (male, female)
- Age = Age in years
- SibSp = Number of siblings / spouses aboard the Titanic
- Parch = Number of parents / children aboard the Titanic
- Ticket = Ticket number
- Fare = Passenger fare
- Cabin = Cabin number
- Embarked = Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
First, we check each variable for missing values:
># PassengerId Survived Pclass Name Sex Age
># 0 418 0 0 0 263
># SibSp Parch Ticket Fare Cabin Embarked
># 0 0 0 1 1014 2
The result shows that several variables have missing values: Survived (418 NA), Age (263 NA), Fare (1 NA), Cabin (1014 NA), and Embarked (2 NA). I will handle them one at a time.
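A per-variable count like the one above can be produced in base R. This is a small sketch on a toy data frame with hypothetical values, not necessarily the call used in this analysis:

```r
# Toy data frame with a few missing cells (hypothetical values)
df <- data.frame(Age  = c(22, NA, 26, NA),
                 Fare = c(7.25, 71.28, NA, 8.05),
                 Sex  = c("male", "female", "female", "male"))

# is.na() gives a logical matrix; column sums count the NAs per variable
na_counts <- colSums(is.na(df))
na_counts
```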
Beginning with Age, I replace the missing values with the mean age of all passengers and then group ages into four categories (“00-12”, “13-17”, “18-59”, “>60”) to simplify the analysis. As coded, the “>60” group also includes passengers aged exactly 60.
# Impute missing Age with the overall mean, then bucket into age groups
full_data <- full_data %>%
  mutate(
    Age = ifelse(is.na(Age), mean(Age, na.rm = TRUE), Age),
    `Age Group` = case_when(Age < 13 ~ "00-12",
                            Age >= 13 & Age < 18 ~ "13-17",
                            Age >= 18 & Age < 60 ~ "18-59",
                            Age >= 60 ~ ">60"))
table(full_data$`Age Group`)
>#
># >60 00-12 13-17 18-59
># 40 94 60 1115
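The same grouping can also be written with base R's `cut()`. The sketch below uses hypothetical ages and is an alternative formulation, not the code used above:

```r
ages <- c(4, 12.9, 13, 17, 18, 59, 60, 71)   # hypothetical ages

# right = FALSE makes each interval closed on the left: [13, 18), [18, 60), ...
age_group <- cut(ages,
                 breaks = c(-Inf, 13, 18, 60, Inf),
                 labels = c("00-12", "13-17", "18-59", ">60"),
                 right = FALSE)
table(age_group)
```

Because the break points are listed once, the boundaries between groups cannot accidentally overlap or leave gaps.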
For Embarked, I replace the missing values with the most frequent port, Southampton (S).
>#
># C Q S
># 270 123 914
full_data$Embarked <- replace(full_data$Embarked, which(is.na(full_data$Embarked)), 'S')
table(full_data$Embarked)
>#
># C Q S
># 270 123 916
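Instead of hard-coding 'S', the most frequent value can be looked up from the data. A minimal sketch with hypothetical values:

```r
embarked <- factor(c("S", "C", NA, "S", "Q", NA, "S"))   # hypothetical values

# table() drops NA by default, so which.max() picks the most frequent observed level
most_common <- names(which.max(table(embarked)))
embarked[is.na(embarked)] <- most_common
table(embarked)
```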
The Name variable contains each passenger's name together with a title. I extract only the title into a new Title variable:
>#
># Capt Col Don Dona Dr Jonkheer
># 1 4 1 1 8 1
># Lady Major Master Miss Mlle Mme
># 1 2 61 260 2 1
># Mr Mrs Ms Rev Sir the Countess
># 757 197 2 8 1 1
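The extraction code is not shown. One common way to pull the title out of a "Surname, Title. Given names" string is a regular expression, sketched here on one name from the data preview and two made-up names:

```r
# The first name appears in the data preview; the other two are made up
name <- c("Braund, Mr. Owen Harris", "Doe, Miss. Jane", "Roe, Mlle. Anne Marie")

# Remove everything up to ", " and everything from the first "." onward
title <- gsub("(.*, )|(\\..*)", "", name)
title
```

This relies on every name having exactly one comma before the title and a period right after it, which is an assumption about the data.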
Because there are too many titles, I merge some of them so that only five remain: Master, Miss, Mr, Mrs, and Rare Title.
full_data$Title[full_data$Title %in% c("Mlle", "Ms")] <- "Miss"
full_data$Title[full_data$Title == "Mme"] <- "Mrs"
full_data$Title[!(full_data$Title %in% c('Master', 'Miss', 'Mr', 'Mrs'))] <- "Rare Title"
table(full_data$Title)
>#
># Master Miss Mr Mrs Rare Title
># 61 264 757 198 29
For family size, I count the family members aboard (including the passenger) and divide the count into three categories (“1”, “2-5”, “>5”) to simplify the analysis. As coded, the “2-5” group holds sizes 2 to 4 and the “>5” group holds sizes 5 and above.
# Compute the numeric size first, then label it, so no comparison is made on strings
family_n <- full_data$SibSp + full_data$Parch + 1
full_data$Familysize <- ifelse(family_n == 1, "1",
                               ifelse(family_n < 5, "2-5", ">5"))
table(full_data$Familysize)
>#
># >5 1 2-5
># 82 790 437
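When a numeric column is overwritten with text labels, later comparisons on it are made as strings, which can give surprising results. A short sketch with hypothetical family sizes shows the pitfall and a way around it:

```r
size <- c(1, 2, 4, 5, 8, 11)   # hypothetical family sizes

# Once a value is text, comparisons are done character by character
as.character(11) < 5           # TRUE, because "11" sorts before "5"

# Labelling from the numeric vector avoids the problem
group <- ifelse(size == 1, "1", ifelse(size < 5, "2-5", ">5"))
table(group)
```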
Rechecking the missing values:
># PassengerId Survived Pclass Name Sex Age
># 0 418 0 0 0 0
># SibSp Parch Ticket Fare Cabin Embarked
># 0 0 0 1 1014 0
># Age Group Title Familysize
># 0 0 0
Only Survived, Fare, and Cabin still have missing values. The missing values in Survived belong to the test data, which is used for prediction. Fare still has one missing value, which is not imputed here. Cabin will not be used later because it is not very useful for the analysis.
Now I convert some of the variables to factors:
full_data <- full_data %>%
mutate( Survived = as.factor(Survived),
Pclass = as.factor(Pclass),
Sex = as.factor(Sex),
Embarked = as.factor(Embarked),
`Age Group` = as.factor(`Age Group`),
Title = as.factor(Title),
Familysize = as.factor(Familysize))
Checking the class of each variable again:
># Observations: 1,309
># Variables: 15
># $ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
># $ Survived <fct> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0...
># $ Pclass <fct> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3...
># $ Name <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley ...
># $ Sex <fct> male, female, female, female, male, male, male, male, f...
># $ Age <dbl> 22.00000, 38.00000, 26.00000, 35.00000, 35.00000, 29.88...
># $ SibSp <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1...
># $ Parch <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0...
># $ Ticket <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", ...
># $ Fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.86...
># $ Cabin <chr> NA, "C85", NA, "C123", NA, NA, "E46", NA, NA, NA, "G6",...
># $ Embarked <fct> S, C, S, S, S, Q, S, S, S, C, S, S, S, S, S, S, Q, S, S...
># $ `Age Group` <fct> 18-59, 18-59, 18-59, 18-59, 18-59, 18-59, 18-59, 00-12,...
># $ Title <fct> Mr, Mrs, Miss, Mrs, Mr, Mr, Mr, Master, Mrs, Mrs, Miss,...
># $ Familysize <fct> 2-5, 2-5, 1, 2-5, 1, 1, 1, >5, 2-5, 2-5, 2-5, 1, 1, >5,...
Now I drop the variables that are not used in the analysis and keep only Survived, Pclass, Sex, Fare, Embarked, Age Group, Title, Familysize, and Age.
final_data <- full_data %>%
  select(Survived, Pclass, Sex, Fare, Embarked, `Age Group`, Title, Familysize, Age)
Next, I separate the data back into train and test sets, as at the beginning.
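The split itself is not shown. Since the test rows are the ones with a missing Survived value, one possible sketch is to split on that column. `train_fix` is the name used later in this analysis; `test_fix` and the toy values are assumptions:

```r
# Toy combined data (hypothetical values); test rows have Survived = NA
final_data <- data.frame(Survived = factor(c(0, 1, NA, 1, NA)),
                         Fare = c(7.25, 71.28, 7.83, 53.10, 9.69))

train_fix <- final_data[!is.na(final_data$Survived), ]
test_fix  <- final_data[is.na(final_data$Survived), ]

nrow(train_fix)
nrow(test_fix)
```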
I use an alluvial graph to see, in general, how the variables relate to each other.
alluvialgraph <- train_fix %>%
group_by(Survived, Sex, Pclass, `Age Group`) %>%
summarise(total = n()) %>%
ungroup %>%
na.omit()
alluvial(alluvialgraph[, c(1:4)],
freq = alluvialgraph$total, border = NA,
col = ifelse(alluvialgraph$Survived == "1", "green", "red"),
cex = 0.65,
ordering = list(
order(alluvialgraph$Survived, alluvialgraph$Pclass == 1),
order(alluvialgraph$Sex, alluvialgraph$Pclass == 1), NULL, NULL
))
Notes:
- Green (passenger survived)
- Red (passenger did not survive)
The graph shows that females were more likely to survive than males.
By Pclass, there is a clear pattern: the higher the class, the higher the survival rate.
By age group, most passengers fall in the “18-59” category, and in that group more did not survive than survived.
Next, I look at bar charts for more detail on how each variable relates to the survival rate.
Note: in the interactive version, clicking a bar shows the survival and non-survival rates in more detail.
Pclass: the chart supports the alluvial graph; the higher the passenger class, the higher the chance of surviving.
Sex: the chart supports the alluvial graph; females were more likely to survive than males.
Age Group: the higher the age group, the lower the survival rate.
Embarked: passengers who embarked at Cherbourg had a higher survival rate than those who embarked at Queenstown or Southampton.
Title: passengers with the Mrs and Miss titles had a higher survival rate than those with Master, Mr, or a Rare Title.
Family size: interestingly, passengers travelling alone had a lower survival rate than passengers in the “2-5” family group.
Before modelling the survival rate, we check the proportion of survivors and non-survivors to see whether the two classes are badly imbalanced.
>#
># 0 1
># 0.6161616 0.3838384
>#
># 0 1
>#
The result shows that about 62% of the passengers in the training data did not survive and about 38% survived. The imbalance is mild, so the data is still acceptable to use.
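The proportions can be reproduced from the class counts. The sketch below rebuilds the training labels from the counts implied by the confusion matrices in this analysis (549 non-survivors, 342 survivors) rather than from the real data frame:

```r
# Class counts implied by the confusion matrices: 549 did not survive, 342 survived
survived <- factor(rep(c(0, 1), times = c(549, 342)))

class_share <- prop.table(table(survived))
round(class_share, 4)
```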
I first run a stepwise selection to find the model with the lowest Akaike Information Criterion (AIC).
modelfull_logit <- glm(Survived~., family = binomial(link=logit), data = train_fix)
modelnone_logit <- glm(Survived~1, family = binomial(link=logit), data = train_fix)
modelaic <- step(modelnone_logit, scope = list(lower = modelnone_logit, upper = modelfull_logit), direction = "both")
># Start: AIC=1188.66
># Survived ~ 1
>#
># Df Deviance AIC
># + Title 4 886.59 896.59
># + Sex 1 917.80 921.80
># + Pclass 2 1083.11 1089.11
># + Familysize 2 1111.56 1117.56
># + Fare 1 1117.57 1121.57
># + Embarked 2 1161.29 1167.29
># + `Age Group` 3 1171.54 1179.54
># + Age 1 1182.21 1186.21
># <none> 1186.66 1188.66
>#
># Step: AIC=896.59
># Survived ~ Title
>#
># Df Deviance AIC
># + Pclass 2 784.43 798.43
># + Familysize 2 813.96 827.96
># + Fare 1 856.13 868.13
># + Embarked 2 866.08 880.08
># + Sex 1 879.37 891.37
># <none> 886.59 896.59
># + Age 1 885.61 897.61
># + `Age Group` 3 884.83 900.83
># - Title 4 1186.66 1188.66
>#
># Step: AIC=798.43
># Survived ~ Title + Pclass
>#
># Df Deviance AIC
># + Familysize 2 731.57 749.57
># + Embarked 2 775.98 793.98
># + Sex 1 778.55 794.55
># + Age 1 779.42 795.42
># <none> 784.43 798.43
># + Fare 1 784.42 800.42
># + `Age Group` 3 780.85 800.85
># - Pclass 2 886.59 896.59
># - Title 4 1083.11 1089.11
>#
># Step: AIC=749.57
># Survived ~ Title + Pclass + Familysize
>#
># Df Deviance AIC
># + Age 1 724.66 744.66
># + Sex 1 725.84 745.84
># + Fare 1 727.39 747.39
># <none> 731.57 749.57
># + Embarked 2 728.28 750.28
># + `Age Group` 3 727.08 751.08
># - Familysize 2 784.43 798.43
># - Pclass 2 813.96 827.96
># - Title 4 1039.15 1049.15
>#
># Step: AIC=744.66
># Survived ~ Title + Pclass + Familysize + Age
>#
># Df Deviance AIC
># + Sex 1 719.19 741.19
># + Fare 1 721.17 743.17
># <none> 724.66 744.66
># + Embarked 2 721.50 745.50
># - Age 1 731.57 749.57
># + `Age Group` 3 724.26 750.26
># - Familysize 2 779.42 795.42
># - Pclass 2 813.49 829.49
># - Title 4 1007.55 1019.55
>#
># Step: AIC=741.19
># Survived ~ Title + Pclass + Familysize + Age + Sex
>#
># Df Deviance AIC
># + Fare 1 715.59 739.59
># <none> 719.19 741.19
># + Embarked 2 715.84 741.84
># - Sex 1 724.66 744.66
># - Age 1 725.84 745.84
># + `Age Group` 3 718.84 746.84
># - Title 4 770.27 784.27
># - Familysize 2 773.78 791.78
># - Pclass 2 806.23 824.23
>#
># Step: AIC=739.59
># Survived ~ Title + Pclass + Familysize + Age + Sex + Fare
>#
># Df Deviance AIC
># <none> 715.59 739.59
># + Embarked 2 713.17 741.17
># - Fare 1 719.19 741.19
># - Sex 1 721.17 743.17
># - Age 1 721.58 743.58
># + `Age Group` 3 715.12 745.12
># - Pclass 2 757.89 777.89
># - Title 4 768.49 784.49
># - Familysize 2 773.78 793.78
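The AIC column in the trace can be checked by hand. For a 0/1 response the residual deviance equals -2 times the log-likelihood, so AIC is the deviance plus twice the number of estimated coefficients. A small sketch, where `aic_from_deviance` is a hypothetical helper name:

```r
# AIC = residual deviance + 2 * number of estimated coefficients
aic_from_deviance <- function(deviance, n_coef) deviance + 2 * n_coef

aic_from_deviance(1186.66, 1)    # Survived ~ 1 (intercept only)
aic_from_deviance(886.59, 5)     # Survived ~ Title (intercept + 4 dummies)
aic_from_deviance(715.59, 12)    # final model (intercept + 11 terms)
```

Each step adds a term only when the drop in deviance outweighs the penalty of 2 per extra coefficient.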
Model :
\[\ln\left(\frac{P_{i}}{1-P_{i}}\right)={\beta_{0}} + {\beta_{1}}{Title_{i}} + {\beta_{2}}{Pclass_{i}} + {\beta_{3}}{Familysize_{i}} + {\beta_{4}}{Age_{i}} + {\beta_{5}}{Sex_{i}} + {\beta_{6}}{Fare_{i}}\]
\(P_{i}\), probability that passenger \(i\) survives (\({Survive_{i}} = 1\))
\({Survive_{i}} = 1\), passenger survived
\({Survive_{i}} = 0\), passenger did not survive
\(\ln\left(\frac{P_{i}}{1-P_{i}}\right)\), log odds of survival
Vector of Title (reference category: Master):
\[{Title_{1}}|_{0=\text{other}}^{1=\text{Miss}};{Title_{2}}|_{0=\text{other}}^{1=\text{Mr}}; {Title_{3}}|_{0=\text{other}}^{1=\text{Mrs}}; {Title_{4}}|_{0=\text{other}}^{1=\text{Rare Title}}\]
Vector of Pclass (reference category: 1):
\[{Pclass_{1}}|_{0=\text{other}}^{1=2};{Pclass_{2}}|_{0=\text{other}}^{1=3}\]
Vector of Familysize (reference category: >5):
\[{Familysize_{1}}|_{0=\text{other}}^{1=1};{Familysize_{2}}|_{0=\text{other}}^{1=\text{2-5}}\]
Vector of Sex (reference category: female):
\[{Sex_{1}}|_{0=\text{other}}^{1=\text{male}}\]
>#
># Call:
># glm(formula = Survived ~ Title + Pclass + Familysize + Age +
># Sex + Fare, family = binomial(link = logit), data = train_fix)
>#
># Deviance Residuals:
># Min 1Q Median 3Q Max
># -2.7087 -0.5234 -0.4035 0.5456 2.3736
>#
># Coefficients:
># Estimate Std. Error z value Pr(>|z|)
># (Intercept) 15.939130 503.790184 0.032 0.974760
># TitleMiss -15.897367 503.790126 -0.032 0.974827
># TitleMr -3.668417 0.572644 -6.406 0.000000000149 ***
># TitleMrs -15.309605 503.790184 -0.030 0.975757
># TitleRare Title -3.692859 0.804132 -4.592 0.000004382714 ***
># Pclass2 -1.149886 0.319436 -3.600 0.000319 ***
># Pclass3 -2.016565 0.311918 -6.465 0.000000000101 ***
># Familysize1 3.193405 0.509536 6.267 0.000000000367 ***
># Familysize2-5 2.866580 0.475845 6.024 0.000000001700 ***
># Age -0.022124 0.009188 -2.408 0.016047 *
># Sexmale -15.244466 503.789826 -0.030 0.975860
># Fare 0.004617 0.002654 1.740 0.081934 .
># ---
># Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>#
># (Dispersion parameter for binomial family taken to be 1)
>#
># Null deviance: 1186.66 on 890 degrees of freedom
># Residual deviance: 715.59 on 879 degrees of freedom
># AIC: 739.59
>#
># Number of Fisher Scoring iterations: 13
>#
># fitting null model for pseudo-r2
># llh llhNull G2 McFadden r2ML r2CU
># -357.7927263 -593.3275684 471.0696841 0.3969727 0.4106280 0.5579149
Analysis of the model:
1. Several terms are significant at p-value < 0.05: Title (Mr and Rare Title), Pclass, Familysize, and Age. The very large standard errors (about 504) on the intercept, TitleMiss, TitleMrs, and Sexmale suggest that Title and Sex overlap almost completely, so those four estimates should be read with caution.
2. The pR2 function in the pscl package reports pseudo R-squared measures, which play a role similar to R-squared in linear regression. The McFadden index is 0.3970, meaning the fitted model improves the log-likelihood by about 39.70% relative to the intercept-only model.
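The McFadden value can be recomputed from the two log-likelihoods printed in the pR2 output. A short worked check:

```r
llh      <- -357.7927263   # log-likelihood of the fitted model
llh_null <- -593.3275684   # log-likelihood of the intercept-only model

mcfadden <- 1 - llh / llh_null
g2       <- -2 * (llh_null - llh)   # likelihood-ratio statistic (G2)

round(mcfadden, 4)
round(g2, 2)
```

G2 equals the difference between the null deviance and the residual deviance reported in the model summary.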
Interpretation of the model:
First, I convert the log-odds coefficients into probabilities using the inverse logit, only for the terms with p-value < 0.05. Note that the values passed to inv.logit below differ somewhat from the estimates in the summary above (for example -4.061508 versus -3.668417 for Mr), so they appear to come from a different fit of the model.
# Probability Title Mr
inv.logit(-4.061508)
# Probability Title Rare Title
inv.logit(-4.265822)
# Probability Pclass2 (second class)
inv.logit(-0.954686)
# Probability Pclass3 (third class)
inv.logit(-1.763116)
# Probability Family Size only 1
inv.logit(3.108449)
# Probability Family Size only 2-5
inv.logit(2.874488)
># [1] 0.01693142
># [1] 0.01384592
># [1] 0.2779434
># [1] 0.1464005
># [1] 0.9572399
># [1] 0.9465708
Results (each value is the inverse logit of one coefficient on its own, ignoring the intercept and the other terms, so it indicates the direction and strength of an effect rather than a complete survival probability):
1. Passengers with the Mr title: 0.0169 (1.69%)
2. Passengers with a Rare Title: 0.0138 (1.38%)
3. Passengers in second class: 0.2779 (27.79%)
4. Passengers in third class: 0.1464 (14.64%)
5. Passengers travelling alone: 0.9572 (95.72%)
6. Passengers in a family group of “2-5”: 0.9466 (94.66%)
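A common complementary reading of logistic coefficients is the odds ratio, exp(coefficient), which compares the odds of survival with the reference category. The sketch below is an addition, not part of the original analysis; it uses base R's `plogis()` as the inverse logit and three estimates taken from the summary table:

```r
# plogis() is base R's inverse logit: 1 / (1 + exp(-x))
plogis(-0.954686)            # same value as inv.logit(-0.954686) above

# Odds ratios from the summary table estimates
coefs <- c(Pclass2 = -1.149886, Pclass3 = -2.016565, Age = -0.022124)
odds_ratio <- exp(coefs)
round(odds_ratio, 3)
```

An odds ratio below 1 means lower odds of survival than the reference category (first class for Pclass), or lower odds per additional year for Age.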
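Now I check the goodness of the model using a confusion matrix (for accuracy) and the ROC curve. The predictions below are made on the training data.
# predict() takes `newdata`, not `data`; either way the result here is the fitted values on train_fix
pred_train <- predict(modelaic, newdata = train_fix, type = "response")
pred_train <- as.factor(ifelse(pred_train > 0.5, "1", "0"))
confusionMatrix(pred_train, train_fix$Survived, positive = "1")
># Confusion Matrix and Statistics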
>#
># Reference
># Prediction 0 1
># 0 486 89
># 1 63 253
>#
># Accuracy : 0.8294
># 95% CI : (0.8031, 0.8535)
># No Information Rate : 0.6162
># P-Value [Acc > NIR] : < 0.0000000000000002
>#
># Kappa : 0.6341
>#
># Mcnemar's Test P-Value : 0.04258
>#
># Sensitivity : 0.7398
># Specificity : 0.8852
># Pos Pred Value : 0.8006
># Neg Pred Value : 0.8452
># Prevalence : 0.3838
># Detection Rate : 0.2840
># Detection Prevalence : 0.3547
># Balanced Accuracy : 0.8125
>#
># 'Positive' Class : 1
>#
Accuracy = 0.8294
Recall/Sensitivity = 0.7398
Precision = 0.8006
Specificity = 0.8852
Logistic regression fits reasonably well, with an accuracy of 0.8294 on the training data.
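The four metrics follow directly from the cells of the confusion matrix above. A short worked sketch, with “1” (survived) as the positive class:

```r
# Rows = prediction, columns = reference, as in the confusionMatrix output
cm <- matrix(c(486, 63, 89, 253), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

tp <- cm["1", "1"]; tn <- cm["0", "0"]
fp <- cm["1", "0"]; fn <- cm["0", "1"]

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)   # recall
specificity <- tn / (tn + fp)
precision   <- tp / (tp + fp)   # positive predictive value
round(c(accuracy, sensitivity, specificity, precision), 4)
```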
The ROC curve describes the relationship between Recall/Sensitivity and the False Positive Rate (1 - Specificity) across all thresholds.
# Scores on the link (log-odds) scale; the ROC curve depends only on their ranking
pred_train1 <- predict(modelaic, newdata = train_fix)
rocpred <- prediction(pred_train1, train_fix$Survived)
roc <- performance(rocpred, measure = "tpr", x.measure = "fpr")
plot(roc)
abline(0,1, lwd = 2, lty = 2)
># Formal class 'performance' [package "ROCR"] with 6 slots
># ..@ x.name : chr "None"
># ..@ y.name : chr "Area under the ROC curve"
># ..@ alpha.name : chr "none"
># ..@ x.values : list()
># ..@ y.values :List of 1
># .. ..$ : num 0.88
># ..@ alpha.values: list()
># [1] 0.8797521
The area under the ROC curve (AUC) is 0.8798, which means the logistic regression distinguishes well between the positive and negative classes.
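AUC can be read as the probability that a randomly chosen survivor receives a higher score than a randomly chosen non-survivor. The base R sketch below computes it from ranks on hypothetical scores; `auc_rank` is a made-up helper name, and this is not the ROCR call that produced the output above:

```r
# AUC = probability that a random positive gets a higher score than a random negative
auc_rank <- function(score, label) {
  r  <- rank(score)                 # average ranks handle ties
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

score <- c(0.10, 0.40, 0.35, 0.80)   # hypothetical scores
label <- c(0, 0, 1, 1)
auc_rank(score, label)               # 3 of the 4 positive/negative pairs are ordered correctly
```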
In this part, I compare Logistic Regression with a Decision Tree and a Support Vector Machine (SVM) to see which model best explains survival on the Titanic.
set.seed(100)
ctrl <- ctree_control(mincriterion = 0.95)
modeltree <- ctree(Survived~., data = train_fix, control = ctrl)
plot(modeltree, type = "simple")
Because the picture is a bit messy, I try to make it neater with rpart:
set.seed(100)
modeldt <- rpart(Survived~., data = train_fix, method = "class")
fancyRpartPlot(modeldt)
# Predictions come from the ctree model (modeltree), not the rpart model (modeldt)
pred_tree <- predict(modeltree, train_fix)
confusionMatrix(pred_tree, train_fix$Survived, positive = "1")
># Confusion Matrix and Statistics
>#
># Reference
># Prediction 0 1
># 0 492 93
># 1 57 249
>#
># Accuracy : 0.8316
># 95% CI : (0.8054, 0.8557)
># No Information Rate : 0.6162
># P-Value [Acc > NIR] : < 0.00000000000000022
>#
># Kappa : 0.6369
>#
># Mcnemar's Test P-Value : 0.004267
>#
># Sensitivity : 0.7281
># Specificity : 0.8962
># Pos Pred Value : 0.8137
># Neg Pred Value : 0.8410
># Prevalence : 0.3838
># Detection Rate : 0.2795
># Detection Prevalence : 0.3434
># Balanced Accuracy : 0.8121
>#
># 'Positive' Class : 1
>#
Accuracy = 0.8316
Recall/Sensitivity = 0.7281
Precision = 0.8137
Specificity = 0.8962
The Decision Tree also fits reasonably well, with a training accuracy of 0.8316, slightly higher than logistic regression.
pred_tree1 <- predict(modeltree, train_fix, type = "prob")
rocpreddc <- prediction(pred_tree1[,2], train_fix$Survived)
rocdc <- performance(rocpreddc, measure = "tpr", x.measure = "fpr")
plot(rocdc)
abline(0,1, lwd = 2, lty = 2)
># Formal class 'performance' [package "ROCR"] with 6 slots
># ..@ x.name : chr "None"
># ..@ y.name : chr "Area under the ROC curve"
># ..@ alpha.name : chr "none"
># ..@ x.values : list()
># ..@ y.values :List of 1
># .. ..$ : num 0.873
># ..@ alpha.values: list()
># [1] 0.8726685
The area under the ROC curve is 0.8727, slightly lower than for logistic regression.
># Setting default kernel parameters
># Support Vector Machine object of class "ksvm"
>#
># SV type: C-svc (classification)
># parameter : cost C = 1
>#
># Linear (vanilla) kernel function.
>#
># Number of Support Vectors : 371
>#
># Objective Function Value : -318.5
># Training error : 0.171717
svm.predict <- predict(svm.model, train_fix)
head(svm.predict)
confusionMatrix(svm.predict, train_fix$Survived)
># [1] 0 1 1 1 0 0
># Levels: 0 1
># Confusion Matrix and Statistics
>#
># Reference
># Prediction 0 1
># 0 492 96
># 1 57 246
>#
># Accuracy : 0.8283
># 95% CI : (0.8019, 0.8525)
># No Information Rate : 0.6162
># P-Value [Acc > NIR] : < 0.00000000000000022
>#
># Kappa : 0.629
>#
># Mcnemar's Test P-Value : 0.002125
>#
># Sensitivity : 0.8962
># Specificity : 0.7193
># Pos Pred Value : 0.8367
># Neg Pred Value : 0.8119
># Prevalence : 0.6162
># Detection Rate : 0.5522
># Detection Prevalence : 0.6599
># Balanced Accuracy : 0.8077
>#
># 'Positive' Class : 0
>#
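Accuracy = 0.8283
Because no positive class was specified here, confusionMatrix treated “0” as the positive class. Read with “1” (survived) as the positive class, as for the other two models, the values are:
Recall/Sensitivity = 0.7193
Precision = 0.8119
Specificity = 0.8962
SVM has a training accuracy of 0.8283, lower than both Logistic Regression and Decision Tree.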
LR Accuracy : 0.8294
DT Accuracy : 0.8316
SVM Accuracy : 0.8283
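The three accuracies can be collected in a small table and ranked. A minimal sketch using the values reported above:

```r
results <- data.frame(
  model    = c("Logistic Regression", "Decision Tree", "SVM"),
  accuracy = c(0.8294, 0.8316, 0.8283),
  stringsAsFactors = FALSE
)

# Sort from highest to lowest accuracy
results <- results[order(-results$accuracy), ]
results
best <- results$model[1]
```

The gap between the best and the worst model is less than half a percentage point, and all three figures are measured on the training data.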