Refer to http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)
for variable descriptions. The response variable is Class
and all others are predictors.
library(caret) # this package provides the GermanCredit data in numeric (dummy-coded) format
## Warning: package 'caret' was built under R version 4.5.2
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.5.2
## Loading required package: lattice
data(GermanCredit)
GermanCredit$Class <- as.numeric(GermanCredit$Class == "Good") # convert `Class` to 1 (Good) or 0 (Bad)
GermanCredit$Class <- as.factor(GermanCredit$Class) # svm() needs a factor response for classification; 1 is good and 0 is bad
str(GermanCredit)
## 'data.frame': 1000 obs. of 62 variables:
## $ Duration : int 6 48 12 42 24 36 24 36 12 30 ...
## $ Amount : int 1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
## $ InstallmentRatePercentage : int 4 2 2 2 3 2 3 2 2 4 ...
## $ ResidenceDuration : int 4 2 3 4 4 4 4 2 4 2 ...
## $ Age : int 67 22 49 45 53 35 53 35 61 28 ...
## $ NumberExistingCredits : int 2 1 1 1 2 1 1 1 1 2 ...
## $ NumberPeopleMaintenance : int 1 1 2 2 2 2 1 1 1 1 ...
## $ Telephone : num 0 1 1 1 1 0 1 0 1 1 ...
## $ ForeignWorker : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Class : Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 2 2 1 ...
## $ CheckingAccountStatus.lt.0 : num 1 0 0 1 1 0 0 0 0 0 ...
## $ CheckingAccountStatus.0.to.200 : num 0 1 0 0 0 0 0 1 0 1 ...
## $ CheckingAccountStatus.gt.200 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CheckingAccountStatus.none : num 0 0 1 0 0 1 1 0 1 0 ...
## $ CreditHistory.NoCredit.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.ThisBank.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.PaidDuly : num 0 1 0 1 0 1 1 1 1 0 ...
## $ CreditHistory.Delay : num 0 0 0 0 1 0 0 0 0 0 ...
## $ CreditHistory.Critical : num 1 0 1 0 0 0 0 0 0 1 ...
## $ Purpose.NewCar : num 0 0 0 0 1 0 0 0 0 1 ...
## $ Purpose.UsedCar : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Purpose.Furniture.Equipment : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Purpose.Radio.Television : num 1 1 0 0 0 0 0 0 1 0 ...
## $ Purpose.DomesticAppliance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Repairs : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Education : num 0 0 1 0 0 1 0 0 0 0 ...
## $ Purpose.Vacation : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Retraining : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Business : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Other : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.lt.100 : num 0 1 1 1 1 0 0 1 0 1 ...
## $ SavingsAccountBonds.100.to.500 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.500.to.1000 : num 0 0 0 0 0 0 1 0 0 0 ...
## $ SavingsAccountBonds.gt.1000 : num 0 0 0 0 0 0 0 0 1 0 ...
## $ SavingsAccountBonds.Unknown : num 1 0 0 0 0 1 0 0 0 0 ...
## $ EmploymentDuration.lt.1 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ EmploymentDuration.1.to.4 : num 0 1 0 0 1 1 0 1 0 0 ...
## $ EmploymentDuration.4.to.7 : num 0 0 1 1 0 0 0 0 1 0 ...
## $ EmploymentDuration.gt.7 : num 1 0 0 0 0 0 1 0 0 0 ...
## $ EmploymentDuration.Unemployed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Male.Divorced.Seperated : num 0 0 0 0 0 0 0 0 1 0 ...
## $ Personal.Female.NotSingle : num 0 1 0 0 0 0 0 0 0 0 ...
## $ Personal.Male.Single : num 1 0 1 1 1 1 1 1 0 0 ...
## $ Personal.Male.Married.Widowed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Female.Single : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.None : num 1 1 1 0 1 1 1 1 1 1 ...
## $ OtherDebtorsGuarantors.CoApplicant : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.Guarantor : num 0 0 0 1 0 0 0 0 0 0 ...
## $ Property.RealEstate : num 1 1 1 0 0 0 0 0 1 0 ...
## $ Property.Insurance : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Property.CarOther : num 0 0 0 0 0 0 0 1 0 1 ...
## $ Property.Unknown : num 0 0 0 0 1 1 0 0 0 0 ...
## $ OtherInstallmentPlans.Bank : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.Stores : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.None : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Housing.Rent : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Housing.Own : num 1 1 1 0 0 0 1 0 1 1 ...
## $ Housing.ForFree : num 0 0 0 1 1 1 0 0 0 0 ...
## $ Job.UnemployedUnskilled : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Job.UnskilledResident : num 0 0 1 0 0 1 0 0 1 0 ...
## $ Job.SkilledEmployee : num 1 1 0 1 1 0 1 0 0 0 ...
## $ Job.Management.SelfEmp.HighlyQualified: num 0 0 0 0 0 0 0 1 0 1 ...
# This code drops variables that provide no information in the data
# Just run it
GermanCredit = GermanCredit[,-c(14,19,27,30,35,40,44,45,48,52,55,58,62)]
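Reading the str() output above, the dropped indices appear to be the last indicator of each categorical group (for example column 14, CheckingAccountStatus.none) plus indicators that may never be set (for example column 27, Purpose.Vacation). That reading is an assumption, not something the source states. A minimal sketch on toy data, with hypothetical names, of why such columns carry no extra information:

```r
# Toy one-hot encoding of a three-level factor (hypothetical data).
housing <- factor(c("Rent", "Own", "ForFree", "Own", "Rent"))
dummies <- model.matrix(~ housing - 1)
colnames(dummies) <- levels(housing)

# Every row has exactly one indicator set, so the columns sum to 1,
# which means any one column can be rebuilt from the others.
rebuilt <- 1 - rowSums(dummies[, c("ForFree", "Own")])
redundant <- all(rebuilt == dummies[, "Rent"])
redundant

# An indicator that is never set has zero variance and is equally uninformative.
never_set <- rep(0, 5)
var(never_set) == 0
```

Dropping by name instead of by position would make the step easier to audit, since the positions depend on the column order of the data set.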
summary(GermanCredit)
## Duration Amount InstallmentRatePercentage ResidenceDuration
## Min. : 4.0 Min. : 250 Min. :1.000 Min. :1.000
## 1st Qu.:12.0 1st Qu.: 1366 1st Qu.:2.000 1st Qu.:2.000
## Median :18.0 Median : 2320 Median :3.000 Median :3.000
## Mean :20.9 Mean : 3271 Mean :2.973 Mean :2.845
## 3rd Qu.:24.0 3rd Qu.: 3972 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :72.0 Max. :18424 Max. :4.000 Max. :4.000
## Age NumberExistingCredits NumberPeopleMaintenance Telephone
## Min. :19.00 Min. :1.000 Min. :1.000 Min. :0.000
## 1st Qu.:27.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.000
## Median :33.00 Median :1.000 Median :1.000 Median :1.000
## Mean :35.55 Mean :1.407 Mean :1.155 Mean :0.596
## 3rd Qu.:42.00 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :75.00 Max. :4.000 Max. :2.000 Max. :1.000
## ForeignWorker Class CheckingAccountStatus.lt.0
## Min. :0.000 0:300 Min. :0.000
## 1st Qu.:1.000 1:700 1st Qu.:0.000
## Median :1.000 Median :0.000
## Mean :0.963 Mean :0.274
## 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :1.000 Max. :1.000
## CheckingAccountStatus.0.to.200 CheckingAccountStatus.gt.200
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000
## Mean :0.269 Mean :0.063
## 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000
## CreditHistory.NoCredit.AllPaid CreditHistory.ThisBank.AllPaid
## Min. :0.00 Min. :0.000
## 1st Qu.:0.00 1st Qu.:0.000
## Median :0.00 Median :0.000
## Mean :0.04 Mean :0.049
## 3rd Qu.:0.00 3rd Qu.:0.000
## Max. :1.00 Max. :1.000
## CreditHistory.PaidDuly CreditHistory.Delay Purpose.NewCar Purpose.UsedCar
## Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :1.00 Median :0.000 Median :0.000 Median :0.000
## Mean :0.53 Mean :0.088 Mean :0.234 Mean :0.103
## 3rd Qu.:1.00 3rd Qu.:0.000 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.00 Max. :1.000 Max. :1.000 Max. :1.000
## Purpose.Furniture.Equipment Purpose.Radio.Television Purpose.DomesticAppliance
## Min. :0.000 Min. :0.00 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000
## Median :0.000 Median :0.00 Median :0.000
## Mean :0.181 Mean :0.28 Mean :0.012
## 3rd Qu.:0.000 3rd Qu.:1.00 3rd Qu.:0.000
## Max. :1.000 Max. :1.00 Max. :1.000
## Purpose.Repairs Purpose.Education Purpose.Retraining Purpose.Business
## Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.00 Median :0.000 Median :0.000
## Mean :0.022 Mean :0.05 Mean :0.009 Mean :0.097
## 3rd Qu.:0.000 3rd Qu.:0.00 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.00 Max. :1.000 Max. :1.000
## SavingsAccountBonds.lt.100 SavingsAccountBonds.100.to.500
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :1.000 Median :0.000
## Mean :0.603 Mean :0.103
## 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000
## SavingsAccountBonds.500.to.1000 SavingsAccountBonds.gt.1000
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000
## Mean :0.063 Mean :0.048
## 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000
## EmploymentDuration.lt.1 EmploymentDuration.1.to.4 EmploymentDuration.4.to.7
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.000
## Mean :0.172 Mean :0.339 Mean :0.174
## 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000 Max. :1.000
## EmploymentDuration.gt.7 Personal.Male.Divorced.Seperated
## Min. :0.000 Min. :0.00
## 1st Qu.:0.000 1st Qu.:0.00
## Median :0.000 Median :0.00
## Mean :0.253 Mean :0.05
## 3rd Qu.:1.000 3rd Qu.:0.00
## Max. :1.000 Max. :1.00
## Personal.Female.NotSingle Personal.Male.Single OtherDebtorsGuarantors.None
## Min. :0.00 Min. :0.000 Min. :0.000
## 1st Qu.:0.00 1st Qu.:0.000 1st Qu.:1.000
## Median :0.00 Median :1.000 Median :1.000
## Mean :0.31 Mean :0.548 Mean :0.907
## 3rd Qu.:1.00 3rd Qu.:1.000 3rd Qu.:1.000
## Max. :1.00 Max. :1.000 Max. :1.000
## OtherDebtorsGuarantors.CoApplicant Property.RealEstate Property.Insurance
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.000
## Mean :0.041 Mean :0.282 Mean :0.232
## 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000 Max. :1.000
## Property.CarOther OtherInstallmentPlans.Bank OtherInstallmentPlans.Stores
## Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000 Median :0.000
## Mean :0.332 Mean :0.139 Mean :0.047
## 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:0.000
## Max. :1.000 Max. :1.000 Max. :1.000
## Housing.Rent Housing.Own Job.UnemployedUnskilled Job.UnskilledResident
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.0
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.0
## Median :0.000 Median :1.000 Median :0.000 Median :0.0
## Mean :0.179 Mean :0.713 Mean :0.022 Mean :0.2
## 3rd Qu.:0.000 3rd Qu.:1.000 3rd Qu.:0.000 3rd Qu.:0.0
## Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.0
## Job.SkilledEmployee
## Min. :0.00
## 1st Qu.:0.00
## Median :1.00
## Mean :0.63
## 3rd Qu.:1.00
## Max. :1.00
Your observation:
2024 for reproducibility. (5pts)
set.seed(2024)
index <- sample(1:nrow(GermanCredit),nrow(GermanCredit)*0.80)
credit_train = GermanCredit[index,]
credit_test = GermanCredit[-index,]
Your observation: After the split, I now have 800 observations in the training set and 200 in the test set.
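A quick sanity check for a split like this one, sketched on a stand-in data frame (toy_data and its columns are hypothetical): the two parts should have the expected sizes, share no rows, and together cover the whole data set.

```r
set.seed(2024)
toy_data <- data.frame(id = 1:1000, x = rnorm(1000))  # stand-in for GermanCredit

n <- nrow(toy_data)
index <- sample(seq_len(n), size = round(0.80 * n))  # 80% of rows for training
toy_train <- toy_data[index, ]
toy_test  <- toy_data[-index, ]

# Sizes of the two parts, and a check that no row appears in both.
sizes <- c(train = nrow(toy_train), test = nrow(toy_test))
no_overlap <- length(intersect(toy_train$id, toy_test$id)) == 0
sizes
no_overlap
```

Using seq_len(n) and an explicit round() avoids surprises when the row count times 0.80 is not a whole number.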
library(e1071)
## Warning: package 'e1071' was built under R version 4.5.2
##
## Attaching package: 'e1071'
## The following object is masked from 'package:ggplot2':
##
## element
svm_model <- svm(Class ~ ., data = credit_train, kernel = 'linear')
# Summary of the trained model
summary(svm_model)
##
## Call:
## svm(formula = Class ~ ., data = credit_train, kernel = "linear")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 1
##
## Number of Support Vectors: 391
##
## ( 197 194 )
##
##
## Number of Classes: 2
##
## Levels:
## 0 1
Your observation: 391 support vectors out of 800 training observations, which is fairly high, but the split between the two classes (197 and 194) is close and balanced.
# Make predictions on the train data
predictions <- predict(svm_model, credit_train)
# Confusion matrix to evaluate the model on train data
table(true = credit_train$Class, pred = predictions)
## pred
## true 0 1
## 0 132 97
## 1 59 512
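Your observation: Most of the correct predictions are true positives (512 good credits classified as good). There are also 97 false positives (bad credits classified as good) and 59 false negatives.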
# Make predictions on the train data
pred_credit_train <- predict(svm_model, credit_train)
# Confusion matrix to evaluate the model on train data
Cmatrix_train = table(true = credit_train$Class,
pred = pred_credit_train)
Cmatrix_train
## pred
## true 0 1
## 0 132 97
## 1 59 512
1 - sum(diag(Cmatrix_train))/sum(Cmatrix_train)
## [1] 0.195
Your observation:
# predictions on the test data
predictions <- predict(svm_model, credit_test)
# Confusion matrix to evaluate the model on test data
table(true = credit_test$Class, pred = predictions)
## pred
## true 0 1
## 0 36 35
## 1 20 109
Your observation:
# predictions on the testing data
pred_credit_test <- predict(svm_model, credit_test)
# Confusion matrix to evaluate the model on test data
Cmatrix_test = table(true = credit_test$Class,
pred = pred_credit_test)
Cmatrix_test
## pred
## true 0 1
## 0 36 35
## 1 20 109
Misclassification Rate (MR)
1 - sum(diag(Cmatrix_test))/sum(Cmatrix_test)
## [1] 0.275
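The two confusion matrices above can be summarized with a few standard rates. The sketch below re-enters the printed counts by hand and treats class 1 (good) as the positive class, which is an assumption about the convention. rates() is a helper defined here for illustration, not a function from any package.

```r
# Counts copied from the printed tables; rows are true classes, columns are predictions.
cm_train <- matrix(c(132, 59, 97, 512), nrow = 2,
                   dimnames = list(true = c("0", "1"), pred = c("0", "1")))
cm_test  <- matrix(c(36, 20, 35, 109), nrow = 2,
                   dimnames = list(true = c("0", "1"), pred = c("0", "1")))

rates <- function(cm) {
  c(misclassification = 1 - sum(diag(cm)) / sum(cm),
    sensitivity       = cm["1", "1"] / sum(cm["1", ]),  # good credits correctly accepted
    specificity       = cm["0", "0"] / sum(cm["0", ]),  # bad credits correctly rejected
    false_positive    = cm["0", "1"] / sum(cm["0", ]))  # bad credits wrongly accepted
}

# One row per data set, so train and test can be compared side by side.
round(rbind(train = rates(cm_train), test = rates(cm_test)), 3)
```

Comparing the two rows shows both the train/test gap in misclassification rate (0.195 versus 0.275) and how much of the error comes from bad credits being accepted.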
Your observation:
probability = TRUE
credit.svm_asymmetric21 = svm(as.factor(Class) ~ .,
data = credit_train,
kernel = 'polynomial',
class.weights = c("0" = 1, "1" = 2))
credit.svm_asymmetric21
##
## Call:
## svm(formula = as.factor(Class) ~ ., data = credit_train, kernel = "polynomial",
## class.weights = c(`0` = 1, `1` = 2))
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 1
## degree: 3
## coef.0: 0
##
## Number of Support Vectors: 616
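In e1071, class.weights scales the cost parameter per class, so a larger weight makes errors on that class more expensive during fitting. With c("0" = 1, "1" = 2) as above, errors on class 1 (good) are penalized twice as heavily as errors on class 0. A minimal sketch of the effect on synthetic data (the toy data frame and the 5:1 weights are assumptions for illustration, not the credit data):

```r
library(e1071)

set.seed(1)
# Synthetic two-class data with overlapping classes (stand-in for the credit data).
n <- 200
toy <- data.frame(
  x1 = c(rnorm(n, mean = 0), rnorm(n, mean = 1.5)),
  x2 = c(rnorm(n, mean = 0), rnorm(n, mean = 1.5)),
  Class = factor(rep(c("0", "1"), each = n))
)

fit_plain    <- svm(Class ~ ., data = toy, kernel = "linear")
fit_weighted <- svm(Class ~ ., data = toy, kernel = "linear",
                    class.weights = c("0" = 5, "1" = 1))

# Errors on class "0" now cost five times as much, so the boundary
# typically shifts and fewer true "0" rows are predicted as "1".
cm_plain    <- table(true = toy$Class, pred = predict(fit_plain, toy))
cm_weighted <- table(true = toy$Class, pred = predict(fit_weighted, toy))
cm_plain
cm_weighted
```

Comparing the off-diagonal cells of the two tables shows which kind of error the weights trade away and which kind they make more frequent.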
Your observation:
pred_credit_train21 <- predict(credit.svm_asymmetric21, credit_train)
pred_credit_train21
## 578 549 557 700 255 913 621 416 105 634 738 29 11 784 925 62
## 1 0 0 1 1 1 1 1 1 0 1 1 1 0 0 1
## 252 398 930 26 172 562 410 32 725 385 203 35 361 238 593 284
## 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1
## 304 216 596 476 852 427 442 884 276 951 87 505 997 618 892 900
## 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 0
## 647 948 441 336 212 835 281 290 217 825 817 310 858 643 153 705
## 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1
## 6 788 393 719 717 464 963 354 186 305 627 108 261 720 902 131
## 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1
## 938 459 723 414 329 189 259 541 954 747 960 445 334 528 548 209
## 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
## 585 935 752 118 891 402 875 674 147 652 834 873 987 173 702 454
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 68 543 795 113 463 827 932 736 483 635 943 504 888 94 446 765
## 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1
## 982 270 715 457 661 706 266 896 346 34 625 187 1000 411 976 901
## 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 0
## 737 770 611 109 999 826 805 469 897 369 119 568 789 676 576 766
## 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1
## 80 31 425 278 868 899 642 269 586 321 51 249 856 818 185 641
## 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0
## 808 247 776 16 955 133 679 513 387 206 24 600 649 348 846 995
## 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
## 60 388 666 980 292 275 664 675 477 927 871 421 25 712 154 520
## 0 1 1 0 0 0 1 0 1 1 1 1 1 0 1 1
## 861 316 589 326 65 350 314 553 778 103 159 920 673 265 754 115
## 1 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1
## 59 564 508 225 830 709 224 638 409 175 521 946 461 95 244 204
## 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0
## 364 669 792 619 467 245 917 991 139 640 929 768 144 613 468 135
## 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1
## 362 122 535 531 798 620 90 176 975 478 178 489 179 610 104 487
## 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1
## 263 599 831 242 887 366 71 384 340 591 291 220 594 527 228 970
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 500 219 419 730 726 854 672 306 268 449 761 77 150 615 222 289
## 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1
## 860 435 437 962 933 996 202 78 655 70 785 947 658 93 941 998
## 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
## 481 685 495 880 967 96 235 412 968 491 315 277 240 58 308 569
## 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1
## 213 237 196 735 84 479 694 499 574 550 584 756 341 125 200 691
## 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 1
## 355 839 554 501 42 894 563 952 471 684 432 359 128 763 631 54
## 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1
## 916 551 254 949 786 182 28 874 49 188 984 232 210 807 799 405
## 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1
## 50 974 510 161 841 30 815 886 624 130 708 524 745 390 710 327
## 1 0 1 1 1 0 1 0 1 0 0 1 1 1 1 1
## 389 760 829 403 466 429 299 170 408 668 297 395 363 287 86 677
## 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1
## 865 570 253 136 703 956 804 248 571 332 124 796 191 66 688 488
## 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
## 958 211 511 582 813 285 264 626 522 651 3 680 881 803 988 904
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 678 729 117 19 689 298 580 507 101 914 250 465 7 877 957 37
## 0 0 1 0 1 1 1 1 1 1 0 1 1 1 1 1
## 451 309 184 323 836 39 490 503 692 134 722 8 15 971 99 663
## 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1
## 426 138 417 573 221 201 246 629 622 73 157 538 43 882 516 79
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 227 903 462 812 950 231 75 140 711 989 749 698 1 607 923 819
## 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 994 283 205 842 274 849 351 145 386 783 226 360 373 575 52 48
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 707 683 840 714 879 324 660 151 837 937 727 375 605 18 482 33
## 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
## 695 169 98 572 744 966 567 512 823 759 579 912 517 530 866 422
## 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
## 905 152 258 302 197 177 450 713 751 936 368 654 293 907 160 322
## 1 1 0 0 1 1 0 1 1 0 1 0 1 1 1 0
## 486 732 547 833 271 539 940 780 637 337 27 870 61 944 746 764
## 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1
## 595 379 36 116 383 88 519 431 127 223 146 965 614 241 972 22
## 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1
## 47 632 657 129 328 21 38 928 990 267 869 229 53 338 993 92
## 1 0 0 1 1 1 0 0 1 1 1 1 1 0 1 1
## 514 617 779 319 55 604 606 979 162 142 301 367 243 194 311 494
## 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1
## 828 256 910 370 644 400 609 294 452 413 851 750 601 908 774 757
## 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
## 645 895 392 347 401 820 493 876 166 682 515 498 755 148 455 646
## 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0
## 855 506 475 372 2 85 959 342 537 824 656 848 650 295 898 5
## 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
## 406 616 438 257 378 404 953 667 801 806 439 565 782 460 436 76
## 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1
## 791 890 889 83 811 365 509 546 282 357 448 909 121 345 9 536
## 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0
## 12 356 193 325 192 317 344 163 181 485 198 864 40 353 718 214
## 0 0 1 1 0 1 1 1 0 1 0 1 1 1 1 0
## 969 561 158 333 560 648 636 787 981 132 559 190 853 918 773 234
## 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1
## 693 123 985 734 724 300 566 623 82 46 590 800 296 444 81 423
## 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1
## 330 977 961 97 23 4 838 931 922 382 72 687 681 14 696 358
## 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1
## 168 456 313 832 542 111 407 484 612 628 639 518 307 339 492 767
## 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1
## Levels: 0 1
Your observation:
C_matrix21_train <- table( true = credit_train$Class, pred = pred_credit_train21)
MR_train21 <- 1 - sum(diag(C_matrix21_train))/sum(C_matrix21_train)
MR_train21
## [1] 0.13375
Your observation:
#refit the model with probabilities enabled
credit.svm_prob = svm(as.factor(Class) ~ .,
data = credit_train, kernel = 'linear',
probability = TRUE)
pred_prob_train = predict(credit.svm_prob,
newdata = credit_train,
probability = TRUE)
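# extract the class-1 (good) probabilities from the "probabilities" attribute;
# select the column by name, since the column order is not guaranteed to match the factor levels
pred_prob_train = attr(pred_prob_train, "probabilities")[, "1"]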
library(ROCR)
## Warning: package 'ROCR' was built under R version 4.5.2
pred <- prediction(pred_prob_train, credit_train$Class)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize=TRUE)
Your observation:
#get predictions for testing
pred_credit_test21 <- predict(credit.svm_asymmetric21, credit_test)
pred_credit_test21
## 10 13 17 20 41 44 45 56 57 63 64 67 69 74 89 91 100 102 106 107
## 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 0 0
## 110 112 114 120 126 137 141 143 149 155 156 164 165 167 171 174 180 183 195 199
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1
## 207 208 215 218 230 233 236 239 251 260 262 272 273 279 280 286 288 303 312 318
## 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1
## 320 331 335 343 349 352 371 374 376 377 380 381 391 394 396 397 399 415 418 420
## 1 1 0 1 1 1 1 0 0 1 1 1 1 1 0 1 1 1 1 1
## 424 428 430 433 434 440 443 447 453 458 470 472 473 474 480 496 497 502 523 525
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1
## 526 529 532 533 534 540 544 545 552 555 556 558 577 581 583 587 588 592 597 598
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 602 603 608 630 633 653 659 662 665 670 671 686 690 697 699 701 704 716 721 728
## 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 731 733 739 740 741 742 743 748 753 758 762 769 771 772 775 777 781 790 793 794
## 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 797 802 809 810 814 816 821 822 843 844 845 847 850 857 859 862 863 867 872 878
## 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 883 885 893 906 911 915 919 921 924 926 934 939 942 945 964 973 978 983 986 992
## 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 0
## Levels: 0 1
Your observation:
#C matrix for testing with more weights on "1"
C_matrix21_test <- table( true = credit_test$Class, pred = pred_credit_test21)
MR_test21 <- 1 - sum(diag(C_matrix21_test))/sum(C_matrix21_test)
MR_test21
## [1] 0.315
Your observation:
#obtain testing pred_prob
pred_prob_test = predict(credit.svm_prob,
newdata = credit_test,
probability = TRUE)
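# extract the class-1 (good) probabilities from the "probabilities" attribute;
# select the column by name, since the column order is not guaranteed to match the factor levels
pred_prob_test = attr(pred_prob_test, "probabilities")[, "1"]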
#ROC
library(ROCR)
pred <- prediction(pred_prob_test, credit_test$Class)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize=TRUE)
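Your observation: The false positive rate is a bit high. At the default cutoff, 35 of the 71 bad credits in the test set (about 49%) were classified as good.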
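The ROC curve can be condensed into a single number, the area under the curve (AUC). A minimal base-R sketch using the rank-sum identity, on hypothetical toy scores; auc_rank() is a helper defined here for illustration, not part of ROCR:

```r
# AUC via the rank-sum (Mann-Whitney) identity; ties receive average ranks.
auc_rank <- function(scores, labels, positive = "1") {
  is_pos <- labels == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(scores)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy scores (hypothetical): a higher score should mean "more likely class 1".
toy_labels <- factor(c("0", "0", "1", "0", "1", "1"))
toy_scores <- c(0.10, 0.40, 0.35, 0.20, 0.80, 0.90)
toy_auc <- auc_rank(toy_scores, toy_labels)
toy_auc
```

With the ROCR objects built above, performance(pred, "auc")@y.values[[1]] should give the same quantity for the real predictions. A value of 0.5 corresponds to random ranking and 1 to perfect separation.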
The baseline linear SVM achieved 72.5% accuracy on the test set (misclassification rate 0.275) but produced too many false positives. After applying class.weights = c("0" = 5, "1" = 1), the number of costly mistakes dropped dramatically, while the less costly mistake increased only slightly. Test AUC remained excellent, showing that we improved business-relevant performance without sacrificing overall discrimination.
scoring problem where false approvals are 5× more expensive, SVM with class weights outperforms logistic regression. SVM gives cleaner, more direct control over the cost ratio between the two kinds of mistake.
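When the two error types have different costs, a cost-weighted total is more informative than the plain misclassification rate. A minimal sketch using the unweighted linear model's test counts printed earlier. The 5:1 cost ratio is taken from the discussion above, and total_cost() is a hypothetical helper defined here for illustration.

```r
# Test-set counts for the unweighted linear SVM, copied from the printed table.
cm_test <- matrix(c(36, 20, 35, 109), nrow = 2,
                  dimnames = list(true = c("0", "1"), pred = c("0", "1")))

# Assumed costs: approving a bad credit (true 0, predicted 1) costs 5,
# rejecting a good credit (true 1, predicted 0) costs 1.
total_cost <- function(cm, cost_fp = 5, cost_fn = 1) {
  cost_fp * cm["0", "1"] + cost_fn * cm["1", "0"]
}

baseline_cost <- total_cost(cm_test)
baseline_cost                 # 5 * 35 + 1 * 20
baseline_cost / sum(cm_test)  # average cost per applicant
```

Applying the same function to the confusion matrix of a weighted or radial-kernel model gives a like-for-like comparison under the assumed cost ratio.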
radial, and see if you got a better result.