Prediction and Visualization of Motor Vehicle Insurance Premium
Background
Motor vehicle insurance premiums can be recorded on the following bases:
- By Production Year: premium is recorded as production at the year-end cutoff. This data is used as the premium reserve for each year.
- By Underwriting Year: premium is recorded in the year in which the underwriter assessed the risk. This data is also used to measure the achievement of the marketing team.
A visualization that reports premium under each of these recording bases is therefore needed, as a basis for the insurer to improve company performance.
Motor vehicle insurance premiums are also time-dependent, because they are closely tied to car purchases, which themselves rise at particular times of the year; for example, car purchases increase ahead of Eid al-Fitr. A suitable algorithm is therefore needed to predict the premium recorded under both categories (Time Series Prediction).
Motor vehicle insurance is closely tied to claims, so a Machine Learning algorithm is also used to detect fraud in claim submissions, based on historical motor vehicle insurance claim fraud data.
Read Data
The data used is motor vehicle premium production data, covering several segments / Business Channels:
- Direct Segment
- Agency Segment
- Leasing Segment
- Dealer Segment
Data Preprocess
At this stage we group the features used to visualize premium under the recording categories above. The same features are later used to build the time series models that predict premium under each recording category.
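For the time series part, the record-level rows have to be collapsed into one total per month. Below is a minimal base-R sketch that uses the `NO_MONTH` and `PREMIUM_GROSS` columns described in this section. It assumes all rows fall in a single calendar year. The default of 2019 is an assumption based on the BOOKING_DATE values in the skim. `monthly_premium_ts()` is a hypothetical helper, not code from the original notebook.

```r
# Hypothetical helper: sum PREMIUM_GROSS per month and return a monthly ts object.
# Assumption: every row belongs to one calendar year (start_year).
monthly_premium_ts <- function(df, start_year = 2019) {
  # Fixing the levels to 1:12 keeps months that have no rows in the series
  month <- factor(df$NO_MONTH, levels = 1:12)
  totals <- tapply(df$PREMIUM_GROSS, month, sum)
  totals[is.na(totals)] <- 0  # tapply returns NA for months without rows
  ts(as.numeric(totals), start = c(start_year, 1), frequency = 12)
}

# Usage (premium_df and the "Leasing" level name are assumptions):
# leasing_ts <- monthly_premium_ts(subset(premium_df, SEGMENT == "Leasing"))
```

Building one series per segment keeps the seasonal pattern of each business channel separate. This matters because the channels differ strongly in size.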
Select Columns
Description:
NO_MONTH : Month number in which the premium was recorded in the system
MONTH : Month in which the premium was recorded in the system
SEGMENT : Segment (business channel) from which the insurance business was obtained
POLICYNO : Policy number or master policy number
TRANSACTION_TYPE : Insurance transaction type: new policy, change (endorsement), or cancellation
INCEPTION : Policy start date
EXPIRY : Policy end date
BOOKING_DATE : Date the premium was recorded as production in the system
USER_APPROVE_DATE : Date the premium was approved (invoiced) by the underwriter
TOC : Type of Coverage (insurance coverage type)
TOC_DESCRIPTION : Description of the coverage type
TOC_GROUP : Group of the coverage type
TOC_GROUP_DESCRIPTION : Description of the coverage type group
TSI : Total Sum Insured (insured value)
PREMIUM_GROSS : Gross insurance premium
DISCOUNT : Premium discount given to the customer
COMMISSION : Commission paid to agents, brokers, or other intermediaries
VAT : Value added tax (PPN, 10%) on the gross premium
TAX : Income tax (PPh, depending on the tax category of the company)
POLICY_FEE : Policy administration fee
STAMP_DUTY : Stamp duty
VEHICLE_CATEGORY : Category of the insured vehicle
VEHICLE_TYPE : Type of the insured vehicle
GROUP_MV : Group of the insured vehicle: two-wheeled vehicle, four-wheeled vehicle, or tank truck
Data Skimming
## Observations: 42,537
## Variables: 24
## $ NO_MONTH <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ MONTH <fct> January, January, January, January, Janu...
## $ SEGMENT <fct> Direct, Agent, Agent, Agent, Agent, Agen...
## $ POLICYNO <fct> 11002011900022, 11502011900001, 11502011...
## $ TRANSACTION_TYPE <fct> NEW, NEW, NEW, NEW, NEW, NEW, NEW, NEW, ...
## $ INCEPTION <fct> 12/14/2018, 12/04/2018, 12/04/2018, 12/2...
## $ EXPIRY <fct> 12/14/2019, 12/27/2021, 12/04/2020, 12/2...
## $ BOOKING_DATE <fct> 01/09/2019, 01/02/2019, 01/02/2019, 01/0...
## $ USER_APPROVE_DATE <fct> 01/09/2019, 01/02/2019, 01/02/2019, 01/0...
## $ TOC <dbl> 201, 201, 201, 201, 201, 201, 201, 201, ...
## $ TOC_DESCRIPTION <fct> PSAKBI, PSAKBI, PSAKBI, PSAKBI, PSAKBI, ...
## $ TOC_GROUP <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2...
## $ TOC_GROUP_DESCRIPTION <fct> Motor Vehicle, Motor Vehicle, Motor Vehi...
## $ TSI <dbl> 165750000, 0, 37000000, 65000000, 780000...
## $ PREMIUM_GROSS <dbl> 5347933, 0, 207200, 637000, 764400, 5824...
## $ DISCOUNT <dbl> -1336983, 0, 0, 0, 0, 0, -3757500, -1958...
## $ COMMISSION <dbl> 0, 0, -51800, -159250, -191100, -145600,...
## $ VAT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ TAX <dbl> 0, 0, 1295, 3981, 4778, 3640, 0, 0, 0, 0...
## $ POLICY_FEE <dbl> 44000, 120000, 0, 0, 0, 0, 50000, 50000,...
## $ STAMP_DUTY <dbl> 6000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
## $ VEHICLE_CATEGORY <fct> Non Bus / Non Truck, #N/A, Non Bus / Non...
## $ VEHICLE_TYPE <fct> Sedan, #N/A, Minibus, Light Truck, Pick ...
## $ GROUP_MV <fct> MV 4, MV 4, MV 4, MV 4, MV 4, MV 4, MV 4...
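The skim shows two cleanup needs before any year-level grouping:

- The date columns (`INCEPTION`, `EXPIRY`, `BOOKING_DATE`, `USER_APPROVE_DATE`) were read as factors in month/day/year form.
- `VEHICLE_CATEGORY` and `VEHICLE_TYPE` contain the spreadsheet error string `#N/A`.

Below is a minimal base-R sketch of cleaning these. `clean_premium_dates()` is a hypothetical helper. Treating the calendar year of `BOOKING_DATE` as the production year is an assumption.

```r
# Hypothetical helper: convert factor dates (mm/dd/yyyy) to Date, turn the
# spreadsheet string "#N/A" into a real NA, and derive year-level fields.
clean_premium_dates <- function(df) {
  date_cols <- c("INCEPTION", "EXPIRY", "BOOKING_DATE", "USER_APPROVE_DATE")
  for (col in date_cols) {
    df[[col]] <- as.Date(as.character(df[[col]]), format = "%m/%d/%Y")
  }
  for (col in c("VEHICLE_CATEGORY", "VEHICLE_TYPE")) {
    values <- as.character(df[[col]])
    values[values == "#N/A"] <- NA
    df[[col]] <- factor(values)
  }
  # Assumption: production year = calendar year of BOOKING_DATE
  df$PRODUCTION_YEAR <- as.integer(format(df$BOOKING_DATE, "%Y"))
  # Policy term in years; multi-year policies (common in leasing) exceed 1
  df$TERM_YEARS <- as.numeric(df$EXPIRY - df$INCEPTION) / 365.25
  df
}
```

With real `Date` columns, a policy such as the one running from 12/04/2018 to 12/27/2021 shows up as a term of about 3 years. These are the multi-year policies that make the yearly premium reserve differ from the marketing achievement figure.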
Top-Contributing Segments
Highest Number of Policies
The largest premium production comes from the leasing segment, where almost all policies have a multi-year insurance period. A visualization is therefore needed both of the premium reserved each year and of the premium recorded as marketing achievement.
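The segment ranking above rests on two aggregates per `SEGMENT`: total gross premium and the number of distinct policies. Below is a base-R sketch of that table. Each `POLICYNO` is counted once per segment, so a policy with several transactions (for example new plus endorsement) is not double-counted. `segment_summary()` is a hypothetical helper.

```r
# Hypothetical helper: total gross premium, distinct policy count and premium
# share per segment, sorted from largest to smallest premium.
segment_summary <- function(df) {
  segment <- as.character(df$SEGMENT)  # avoids NA rows for unused factor levels
  premium <- tapply(df$PREMIUM_GROSS, segment, sum)
  policies <- tapply(as.character(df$POLICYNO), segment,
                     function(p) length(unique(p)))
  out <- data.frame(SEGMENT = names(premium),
                    premium_gross = as.numeric(premium),
                    n_policies = as.integer(policies[names(premium)]),
                    stringsAsFactors = FALSE)
  out$premium_share <- out$premium_gross / sum(out$premium_gross)
  out <- out[order(out$premium_gross, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}
```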
Claim Fraud Detection
Read Data and Data Preprocess
## months_as_customer age
## 0 0
## policy_number policy_bind_date
## 0 0
## policy_state policy_csl
## 0 0
## policy_deductable policy_annual_premium
## 0 0
## umbrella_limit insured_zip
## 0 0
## insured_sex insured_education_level
## 0 0
## insured_occupation insured_hobbies
## 0 0
## insured_relationship capital.gains
## 0 0
## capital.loss incident_date
## 0 0
## incident_type collision_type
## 0 0
## incident_severity authorities_contacted
## 0 0
## incident_state incident_city
## 0 0
## incident_location incident_hour_of_the_day
## 0 0
## number_of_vehicles_involved property_damage
## 0 0
## bodily_injuries witnesses
## 0 0
## police_report_available total_claim_amount
## 0 0
## injury_claim property_claim
## 0 0
## vehicle_claim auto_make
## 0 0
## auto_model auto_year
## 0 0
## fraud_reported
## 0
## Observations: 1,000
## Variables: 39
## $ months_as_customer <int> 328, 228, 134, 256, 228, 256, 137,...
## $ age <int> 48, 42, 29, 41, 44, 39, 34, 37, 33...
## $ policy_number <int> 521585, 342868, 687698, 227811, 36...
## $ policy_bind_date <fct> 10/17/2014, 06/27/2006, 09/06/2000...
## $ policy_state <fct> OH, IN, OH, IL, IL, OH, IN, IL, IL...
## $ policy_csl <fct> 250/500, 250/500, 100/300, 250/500...
## $ policy_deductable <int> 1000, 2000, 2000, 2000, 1000, 1000...
## $ policy_annual_premium <dbl> 1406.91, 1197.22, 1413.14, 1415.74...
## $ umbrella_limit <int> 0, 5000000, 5000000, 6000000, 6000...
## $ insured_zip <int> 466132, 468176, 430632, 608117, 61...
## $ insured_sex <fct> MALE, MALE, FEMALE, FEMALE, MALE, ...
## $ insured_education_level <fct> MD, MD, PhD, PhD, Associate, PhD, ...
## $ insured_occupation <fct> craft-repair, machine-op-inspct, s...
## $ insured_hobbies <fct> sleeping, reading, board-games, bo...
## $ insured_relationship <fct> husband, other-relative, own-child...
## $ capital.gains <int> 53300, 0, 35100, 48900, 66000, 0, ...
## $ capital.loss <int> 0, 0, 0, -62400, -46000, 0, -77000...
## $ incident_date <fct> 01/25/2015, 01/21/2015, 02/22/2015...
## $ incident_type <fct> Single Vehicle Collision, Vehicle ...
## $ collision_type <fct> Side Collision, ?, Rear Collision,...
## $ incident_severity <fct> Major Damage, Minor Damage, Minor ...
## $ authorities_contacted <fct> Police, Police, Police, Police, No...
## $ incident_state <fct> SC, VA, NY, OH, NY, SC, NY, VA, WV...
## $ incident_city <fct> Columbus, Riverwood, Columbus, Arl...
## $ incident_location <fct> 9935 4th Drive, 6608 MLK Hwy, 7121...
## $ incident_hour_of_the_day <int> 5, 8, 7, 5, 20, 19, 0, 23, 21, 14,...
## $ number_of_vehicles_involved <int> 1, 1, 3, 1, 1, 3, 3, 3, 1, 1, 1, 3...
## $ property_damage <fct> YES, ?, NO, ?, NO, NO, ?, ?, NO, N...
## $ bodily_injuries <int> 1, 0, 2, 1, 0, 0, 0, 2, 1, 2, 2, 1...
## $ witnesses <int> 2, 0, 3, 2, 1, 2, 0, 2, 1, 1, 2, 2...
## $ police_report_available <fct> YES, ?, NO, NO, NO, NO, ?, YES, YE...
## $ total_claim_amount <int> 71610, 5070, 34650, 63400, 6500, 6...
## $ injury_claim <int> 6510, 780, 7700, 6340, 1300, 6410,...
## $ property_claim <int> 13020, 780, 3850, 6340, 650, 6410,...
## $ vehicle_claim <int> 52080, 3510, 23100, 50720, 4550, 5...
## $ auto_make <fct> Saab, Mercedes, Dodge, Chevrolet, ...
## $ auto_model <fct> 92x, E400, RAM, Tahoe, RSX, 95, Pa...
## $ auto_year <int> 2004, 2007, 2007, 2014, 2009, 2003...
## $ fraud_reported <fct> Y, Y, N, Y, N, Y, N, N, N, N, N, N...
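The missing-value check above reports zero `NA`s in every column. However, the skim shows that `collision_type`, `property_damage` and `police_report_available` use `?` for unknown values, which `is.na()` cannot see. Below is a small base-R sketch that counts such placeholders per column. `count_placeholders()` is a hypothetical helper.

```r
# Hypothetical helper: count cells equal to a placeholder string in each column.
# "?" is an ordinary factor level, so is.na() reports these cells as present.
count_placeholders <- function(df, marker = "?") {
  counts <- vapply(df, function(col) sum(as.character(col) == marker, na.rm = TRUE),
                   integer(1))
  counts[counts > 0]  # only report columns that actually use the marker
}
```

The later models keep `?` as a level of its own. The logistic regression coefficients list `property_damageNO` and `property_damageYES`, which leaves `?` as the reference level.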
The data skim above shows several numeric variables that should be binned and converted to factors.
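The level names below come from the binned columns in the second skim and from the logistic regression coefficients. The number of levels is confirmed by the Df column of the stepwise trace: 2 for `auto_year_bin`, 3 each for `months_as_customer_bin` and `age_bin`, and 4 for `incident_hour_of_the_day_bin`. The cut points themselves are not shown in the notebook. The breaks in this base-R sketch are hypothetical, chosen only so that the first rows of the skim map to the levels shown. Of the hour labels, only "morning" appears in the data; "night", "afternoon" and "evening" are assumptions.

```r
# Hypothetical cut points, chosen only to reproduce the first rows of the skim
# (auto_year 2004 -> old_car, 2014 -> new_car; age 29 -> mature, 42 -> old_people).
bin_claim_features <- function(df) {
  df$auto_year_bin <- cut(df$auto_year, breaks = c(-Inf, 2010, Inf),
                          labels = c("old_car", "new_car"), right = FALSE)
  df$months_as_customer_bin <- cut(df$months_as_customer,
                                   breaks = c(-Inf, 100, 400, Inf),
                                   labels = c("new_customer", "good_customer",
                                              "loyal_customer"),
                                   right = FALSE)
  df$age_bin <- cut(df$age, breaks = c(-Inf, 25, 40, Inf),
                    labels = c("young_people", "mature", "old_people"), right = FALSE)
  # Only "morning" is visible in the skim; the other three labels are assumed
  df$incident_hour_of_the_day_bin <- cut(df$incident_hour_of_the_day,
                                         breaks = c(0, 5, 12, 18, 24),
                                         labels = c("night", "morning",
                                                    "afternoon", "evening"),
                                         right = FALSE)
  df
}
```

With `right = FALSE` each interval includes its lower bound, so hour 5 falls in "morning" and hour 0 in "night".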
Drop the columns that are not used
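Comparing the two skims shows which 13 columns were removed. They are the four raw columns that were binned, plus identifiers, dates and location details. This accounts for the change from 39 variables, plus 4 new `_bin` columns, down to 30. Below is a base-R sketch that reproduces this step. The names `drop_cols` and `drop_unused()` are illustrative.

```r
# Columns present in the first skim but absent from the second one
drop_cols <- c("months_as_customer", "age", "policy_number", "policy_bind_date",
               "policy_state", "policy_csl", "policy_deductable", "insured_zip",
               "incident_date", "incident_state", "incident_location",
               "incident_hour_of_the_day", "auto_year")

# Keep every column that is not in the drop list, preserving column order
drop_unused <- function(df, cols = drop_cols) {
  df[, setdiff(names(df), cols), drop = FALSE]
}
```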
## Observations: 1,000
## Variables: 30
## $ policy_annual_premium <dbl> 1406.91, 1197.22, 1413.14, 1415.7...
## $ umbrella_limit <int> 0, 5000000, 5000000, 6000000, 600...
## $ insured_sex <fct> MALE, MALE, FEMALE, FEMALE, MALE,...
## $ insured_education_level <fct> MD, MD, PhD, PhD, Associate, PhD,...
## $ insured_occupation <fct> craft-repair, machine-op-inspct, ...
## $ insured_hobbies <fct> sleeping, reading, board-games, b...
## $ insured_relationship <fct> husband, other-relative, own-chil...
## $ capital.gains <int> 53300, 0, 35100, 48900, 66000, 0,...
## $ capital.loss <int> 0, 0, 0, -62400, -46000, 0, -7700...
## $ incident_type <fct> Single Vehicle Collision, Vehicle...
## $ collision_type <fct> Side Collision, ?, Rear Collision...
## $ incident_severity <fct> Major Damage, Minor Damage, Minor...
## $ authorities_contacted <fct> Police, Police, Police, Police, N...
## $ incident_city <fct> Columbus, Riverwood, Columbus, Ar...
## $ number_of_vehicles_involved <int> 1, 1, 3, 1, 1, 3, 3, 3, 1, 1, 1, ...
## $ property_damage <fct> YES, ?, NO, ?, NO, NO, ?, ?, NO, ...
## $ bodily_injuries <int> 1, 0, 2, 1, 0, 0, 0, 2, 1, 2, 2, ...
## $ witnesses <int> 2, 0, 3, 2, 1, 2, 0, 2, 1, 1, 2, ...
## $ police_report_available <fct> YES, ?, NO, NO, NO, NO, ?, YES, Y...
## $ total_claim_amount <int> 71610, 5070, 34650, 63400, 6500, ...
## $ injury_claim <int> 6510, 780, 7700, 6340, 1300, 6410...
## $ property_claim <int> 13020, 780, 3850, 6340, 650, 6410...
## $ vehicle_claim <int> 52080, 3510, 23100, 50720, 4550, ...
## $ auto_make <fct> Saab, Mercedes, Dodge, Chevrolet,...
## $ auto_model <fct> 92x, E400, RAM, Tahoe, RSX, 95, P...
## $ fraud_reported <fct> Y, Y, N, Y, N, Y, N, N, N, N, N, ...
## $ auto_year_bin <fct> old_car, old_car, old_car, new_ca...
## $ months_as_customer_bin <fct> good_customer, good_customer, goo...
## $ age_bin <fct> old_people, old_people, mature, o...
## $ incident_hour_of_the_day_bin <fct> morning, morning, morning, mornin...
Cross Validation
Split the data into train, validation, and test sets
Check the target class proportions (training set first, then validation set)
##
## N Y
## 0.74375 0.25625
##
## N Y
## 0.7694444 0.2305556
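The proportions above are consistent with a 640-row training set (164 of 640 is 0.25625). They are also consistent with a 360-row validation set (83 of 360 is 0.2306, matching the validation prevalence reported later). The notebook does not show how the split was made. A common way to keep class proportions similar across sets is a stratified split. Below is a base-R sketch of that technique. `stratified_split()` and the 0.8 default fraction are assumptions.

```r
# Stratified split: sample the same fraction of row indices from each class.
# Note: set.seed() here resets the global random number generator.
stratified_split <- function(y, p = 0.8, seed = 100) {
  set.seed(seed)
  idx_by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    # index through sample.int to avoid sample()'s behaviour on length-1 input
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  }), use.names = FALSE)
  sort(train_idx)
}

# Usage (the data frame name `claim` is hypothetical):
# train_idx <- stratified_split(claim$fraud_reported)
# claim.train <- claim[train_idx, ]
# prop.table(table(claim.train$fraud_reported))
```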
Downsample the training data
##
## N Y
## 0.5 0.5
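Downsampling keeps all 164 fraud cases and draws the same number of non-fraud cases at random. This gives the 328-row balanced training set used by the models below. caret provides `downSample()` for this. Below is a base-R sketch of the same idea. `downsample_rows()` is a hypothetical helper.

```r
# Keep every row of the smallest class and an equal-sized random sample of
# each other class. factor() drops unused levels so they do not force n_min = 0.
downsample_rows <- function(df, target, seed = 100) {
  set.seed(seed)
  idx_by_class <- split(seq_len(nrow(df)), factor(df[[target]]))
  n_min <- min(lengths(idx_by_class))
  keep <- unlist(lapply(idx_by_class, function(idx) idx[sample.int(length(idx), n_min)]),
                 use.names = FALSE)
  df[sort(keep), , drop = FALSE]
}
```

Only the training data is balanced. The validation and test sets keep their natural class proportions, which is why their No Information Rate is about 0.77 and 0.73 rather than 0.5.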
Build Model
Random Forest Model 1
Use the random forest method with all variables as predictors to predict fraud.
## Random Forest
##
## 328 samples
## 29 predictor
## 2 classes: 'N', 'Y'
##
## Pre-processing: scaled (137)
## Resampling: Cross-Validated (10 fold, repeated 4 times)
## Summary of sample sizes: 296, 294, 295, 294, 296, 296, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.6521557 0.3041104
## 69 0.7955186 0.5912308
## 137 0.8038798 0.6079762
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 137.
##
## Call:
## randomForest(x = x, y = y, mtry = param$mtry)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 137
##
## OOB estimate of error rate: 18.6%
## Confusion matrix:
## N Y class.error
## N 134 30 0.1829268
## Y 31 133 0.1890244
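According to the random forest output above, the best mtry is 137, because it gives the highest cross-validated accuracy among the candidates. Since 137 is also the total number of predictor columns after dummy encoding ("scaled (137)"), every split considers all predictors. The final model is therefore effectively bagged trees rather than a random forest with decorrelated trees.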
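Two numbers in the caret output can be checked by hand.

- **The mtry grid.** The candidate mtry values 2, 69 and 137 match `floor(seq(2, p, length.out = 3))`, where p is the number of predictor columns after dummy encoding. This appears to be how caret builds its default three-value grid for random forests. The same rule gives 2, 16 and 31 for the second model, whose 31 columns are 3 dummies for `incident_severity`, 19 for `insured_hobbies` and 9 numeric variables.
- **The fold sizes.** With 10-fold cross-validation on 328 rows, each training fold holds about 9/10 of the rows. This explains the sample sizes of 294 to 296 in the resampling summary.

Below is a base-R sketch of both calculations. The helper names are hypothetical.

```r
# Three mtry values evenly spaced from 2 to p, rounded down
mtry_grid <- function(p, len = 3) floor(seq(2, p, length.out = len))

# Approximate number of rows in each training fold of k-fold cross-validation
fold_train_size <- function(n, k = 10) n * (k - 1) / k

mtry_grid(137)        # 2 69 137 (all variables, model 1)
mtry_grid(31)         # 2 16 31  (selected variables, model 2)
fold_train_size(328)  # 295.2, consistent with fold sizes of 294 to 296
```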
## Confusion Matrix and Statistics
##
## Reference
## Prediction N Y
## N 227 14
## Y 50 69
##
## Accuracy : 0.8222
## 95% CI : (0.7787, 0.8603)
## No Information Rate : 0.7694
## P-Value [Acc > NIR] : 0.008876
##
## Kappa : 0.565
##
## Mcnemar's Test P-Value : 1.214e-05
##
## Sensitivity : 0.8313
## Specificity : 0.8195
## Pos Pred Value : 0.5798
## Neg Pred Value : 0.9419
## Prevalence : 0.2306
## Detection Rate : 0.1917
## Detection Prevalence : 0.3306
## Balanced Accuracy : 0.8254
##
## 'Positive' Class : Y
##
## Confusion Matrix and Statistics
##
## Reference
## Prediction N Y
## N 123 4
## Y 23 50
##
## Accuracy : 0.865
## 95% CI : (0.8097, 0.9091)
## No Information Rate : 0.73
## P-Value [Acc > NIR] : 3.325e-06
##
## Kappa : 0.6917
##
## Mcnemar's Test P-Value : 0.000532
##
## Sensitivity : 0.9259
## Specificity : 0.8425
## Pos Pred Value : 0.6849
## Neg Pred Value : 0.9685
## Prevalence : 0.2700
## Detection Rate : 0.2500
## Detection Prevalence : 0.3650
## Balanced Accuracy : 0.8842
##
## 'Positive' Class : Y
##
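The first random forest model reaches 86.5% accuracy on the test data, with a recall (sensitivity) of 92.6%. These test scores are noticeably higher than the validation scores (82.2% accuracy, Kappa 0.565), so performance is not consistent across the two held-out sets. The gap by itself does not indicate overfitting, which would show up as training or cross-validated accuracy well above held-out accuracy. Here the cross-validated accuracy (0.804) and the OOB error rate (18.6%) are in line with the validation result. The 95% confidence intervals for validation (0.779 to 0.860) and test (0.810 to 0.909) also overlap.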
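Every statistic in caret's confusion matrix summary follows from the four cell counts. Below is a base-R sketch that computes them with "Y" (fraud) as the positive class, using the standard definitions, including Cohen's kappa. `cm_metrics()` is a hypothetical helper. Plugging in the first model's test-set counts reproduces caret's figures: 0.865, 0.9259, 0.8425, 0.6849, 0.9685, 0.8842 and 0.6917.

```r
# Metrics from a 2x2 confusion matrix with "Y" (fraud) as the positive class.
cm_metrics <- function(tp, fn, fp, tn) {
  n <- tp + fn + fp + tn
  accuracy <- (tp + tn) / n
  sensitivity <- tp / (tp + fn)  # recall
  specificity <- tn / (tn + fp)
  # Agreement expected by chance from the row and column totals (for kappa)
  p_expected <- ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2
  c(accuracy = accuracy,
    sensitivity = sensitivity,
    specificity = specificity,
    pos_pred_value = tp / (tp + fp),
    neg_pred_value = tn / (tn + fn),
    balanced_accuracy = (sensitivity + specificity) / 2,
    kappa = (accuracy - p_expected) / (1 - p_expected))
}

# RF model 1, test set: predicted Y and actual Y = 50, predicted N and actual Y = 4,
# predicted Y and actual N = 23, predicted N and actual N = 123
round(cm_metrics(tp = 50, fn = 4, fp = 23, tn = 123), 4)
```

The low positive predictive value (0.6849) is the practical cost of the high recall. About one in three claims flagged as fraud is actually legitimate.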
Variable Importance RF 1
## rf variable importance
##
## only 20 most important variables shown (out of 137)
##
## Overall
## incident_severityMinor Damage 100.000
## insured_hobbieschess 82.059
## insured_hobbiescross-fit 67.779
## incident_severityTotal Loss 67.282
## vehicle_claim 56.051
## property_claim 40.856
## total_claim_amount 36.101
## policy_annual_premium 33.995
## capital.loss 29.651
## injury_claim 20.203
## auto_modelCivic 19.807
## capital.gains 14.155
## witnesses 11.584
## umbrella_limit 9.808
## insured_occupationpriv-house-serv 8.143
## auto_modelMDX 8.033
## auto_model93 7.680
## incident_cityRiverwood 7.568
## auto_makeDodge 7.288
## auto_modelNeon 7.187
Based on the variable importance above, only some of the variables are used to build the second random forest model (model tuning): incident_severity + insured_hobbies + vehicle_claim + property_claim + total_claim_amount + policy_annual_premium + capital.loss + injury_claim + capital.gains + witnesses + umbrella_limit
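caret reports importance per dummy column (for example `incident_severityMinor Damage`), while the list above is per original variable. One way to map dummy columns back to their parent variable is longest-prefix matching, then keeping the highest score per variable. Below is a base-R sketch. `importance_by_variable()` is hypothetical, and the scores are copied from the importance table above.

```r
# Map each (possibly dummy-encoded) column to the original variable it starts
# with, choosing the longest match, then keep the highest score per variable.
importance_by_variable <- function(scores, variables) {
  parent <- vapply(names(scores), function(col) {
    hits <- variables[startsWith(col, variables)]
    if (length(hits) == 0) NA_character_ else hits[which.max(nchar(hits))]
  }, character(1))
  best <- tapply(scores, parent, max)  # columns with no parent (NA) are dropped
  best <- setNames(as.numeric(best), names(best))
  sort(best, decreasing = TRUE)
}

scores <- c("incident_severityMinor Damage" = 100, "insured_hobbieschess" = 82.059,
            "insured_hobbiescross-fit" = 67.779, "incident_severityTotal Loss" = 67.282,
            "vehicle_claim" = 56.051, "auto_modelCivic" = 19.807)
importance_by_variable(scores, c("incident_severity", "insured_hobbies",
                                 "vehicle_claim", "auto_model"))
```

Longest-prefix matching matters when one variable name is a prefix of another. Without it, a column could be credited to the shorter, wrong parent.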
Random Forest Model 2
Build the second model using the selected variables.
## Random Forest
##
## 328 samples
## 11 predictor
## 2 classes: 'N', 'Y'
##
## Pre-processing: scaled (31)
## Resampling: Cross-Validated (10 fold, repeated 4 times)
## Summary of sample sizes: 296, 294, 295, 294, 296, 296, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.7390458 0.4783115
## 16 0.7916611 0.5834694
## 31 0.7902852 0.5807517
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 16.
##
## Call:
## randomForest(x = x, y = y, mtry = param$mtry)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 16
##
## OOB estimate of error rate: 21.34%
## Confusion matrix:
## N Y class.error
## N 128 36 0.2195122
## Y 34 130 0.2073171
According to the random forest output above, the best mtry is 16, because it gives the highest cross-validated accuracy among the candidates.
## Confusion Matrix and Statistics
##
## Reference
## Prediction N Y
## N 216 14
## Y 61 69
##
## Accuracy : 0.7917
## 95% CI : (0.746, 0.8325)
## No Information Rate : 0.7694
## P-Value [Acc > NIR] : 0.1743
##
## Kappa : 0.51
##
## Mcnemar's Test P-Value : 1.087e-07
##
## Sensitivity : 0.8313
## Specificity : 0.7798
## Pos Pred Value : 0.5308
## Neg Pred Value : 0.9391
## Prevalence : 0.2306
## Detection Rate : 0.1917
## Detection Prevalence : 0.3611
## Balanced Accuracy : 0.8056
##
## 'Positive' Class : Y
##
## Confusion Matrix and Statistics
##
## Reference
## Prediction N Y
## N 118 3
## Y 28 51
##
## Accuracy : 0.845
## 95% CI : (0.7873, 0.8922)
## No Information Rate : 0.73
## P-Value [Acc > NIR] : 8.071e-05
##
## Kappa : 0.6569
##
## Mcnemar's Test P-Value : 1.629e-05
##
## Sensitivity : 0.9444
## Specificity : 0.8082
## Pos Pred Value : 0.6456
## Neg Pred Value : 0.9752
## Prevalence : 0.2700
## Detection Rate : 0.2550
## Detection Prevalence : 0.3950
## Balanced Accuracy : 0.8763
##
## 'Positive' Class : Y
##
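The second random forest model reaches 84.5% accuracy on the test data, with a recall (sensitivity) of 94.4%. As with the first model, the test scores are higher than the validation scores (79.2% accuracy). On the validation set the accuracy is not significantly better than the No Information Rate of 0.7694 (P-value 0.1743). The gap between the two held-out sets does not by itself indicate overfitting, because the cross-validated accuracy (0.792) and the OOB error rate (21.34%) are close to the validation result.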
Logistic Regression 1
Next we try a logistic regression model for fraud prediction. The variables are selected again using stepwise regression with backward elimination.
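Backward elimination works as follows, as the trace below shows:

1. Start from the full model.
2. At each step, drop the term whose removal lowers AIC the most.
3. Stop when no removal lowers AIC.

Below is a self-contained sketch using base R's `step()` on simulated data. The data and variable names are invented for illustration: only `x1` and `x2` drive the outcome, and `noise` is irrelevant by construction.

```r
set.seed(42)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
# The outcome depends on x1 and x2 only
sim$fraud <- rbinom(n, 1, plogis(-1 + 1.5 * sim$x1 - 1.0 * sim$x2))

full_model <- glm(fraud ~ x1 + x2 + noise, family = "binomial", data = sim)
# direction = "backward" only removes terms; trace = 0 hides the step-by-step log
reduced_model <- step(full_model, direction = "backward", trace = 0)

formula(reduced_model)
c(full = AIC(full_model), reduced = AIC(reduced_model))
```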
## Start: AIC=246
## fraud_reported ~ policy_annual_premium + umbrella_limit + insured_sex +
## insured_education_level + insured_occupation + insured_hobbies +
## insured_relationship + capital.gains + capital.loss + incident_type +
## collision_type + incident_severity + authorities_contacted +
## incident_city + number_of_vehicles_involved + property_damage +
## bodily_injuries + witnesses + police_report_available + total_claim_amount +
## injury_claim + property_claim + vehicle_claim + auto_make +
## auto_model + auto_year_bin + months_as_customer_bin + age_bin +
## incident_hour_of_the_day_bin
##
##
## Step: AIC=246
## fraud_reported ~ policy_annual_premium + umbrella_limit + insured_sex +
## insured_education_level + insured_occupation + insured_hobbies +
## insured_relationship + capital.gains + capital.loss + incident_type +
## collision_type + incident_severity + authorities_contacted +
## incident_city + number_of_vehicles_involved + property_damage +
## bodily_injuries + witnesses + police_report_available + total_claim_amount +
## injury_claim + property_claim + vehicle_claim + auto_model +
## auto_year_bin + months_as_customer_bin + age_bin + incident_hour_of_the_day_bin
##
##
## Step: AIC=246
## fraud_reported ~ policy_annual_premium + umbrella_limit + insured_sex +
## insured_education_level + insured_occupation + insured_hobbies +
## insured_relationship + capital.gains + capital.loss + incident_type +
## collision_type + incident_severity + authorities_contacted +
## incident_city + number_of_vehicles_involved + property_damage +
## bodily_injuries + witnesses + police_report_available + total_claim_amount +
## injury_claim + property_claim + auto_model + auto_year_bin +
## months_as_customer_bin + age_bin + incident_hour_of_the_day_bin
##
## Df Deviance AIC
## - incident_city 6 0.00 234.00
## - insured_education_level 6 0.00 234.00
## - insured_relationship 5 0.00 236.00
## - authorities_contacted 4 0.00 238.00
## - incident_hour_of_the_day_bin 3 0.00 240.00
## - incident_type 2 0.00 242.00
## - police_report_available 2 0.00 242.00
## - age_bin 2 0.00 242.00
## - months_as_customer_bin 2 0.00 242.00
## - property_damage 2 0.00 242.00
## - collision_type 2 0.00 242.00
## - policy_annual_premium 1 0.00 244.00
## - property_claim 1 0.00 244.00
## - total_claim_amount 1 0.00 244.00
## - injury_claim 1 0.00 244.00
## - capital.gains 1 0.00 244.00
## - bodily_injuries 1 0.00 244.00
## - capital.loss 1 0.00 244.00
## - number_of_vehicles_involved 1 0.00 244.00
## - auto_year_bin 1 0.00 244.00
## - umbrella_limit 1 0.00 244.00
## - insured_sex 1 0.00 244.00
## - witnesses 1 0.00 244.00
## <none> 0.00 246.00
## - auto_model 38 131.01 301.01
## - incident_severity 3 186.81 426.81
## - insured_hobbies 19 255.06 463.06
## - insured_occupation 13 2811.40 3031.40
##
## Step: AIC=234
## fraud_reported ~ policy_annual_premium + umbrella_limit + insured_sex +
## insured_education_level + insured_occupation + insured_hobbies +
## insured_relationship + capital.gains + capital.loss + incident_type +
## collision_type + incident_severity + authorities_contacted +
## number_of_vehicles_involved + property_damage + bodily_injuries +
## witnesses + police_report_available + total_claim_amount +
## injury_claim + property_claim + auto_model + auto_year_bin +
## months_as_customer_bin + age_bin + incident_hour_of_the_day_bin
##
## Df Deviance AIC
## - insured_education_level 6 0.00 222.00
## - insured_relationship 5 0.00 224.00
## - authorities_contacted 4 0.00 226.00
## - incident_hour_of_the_day_bin 3 0.00 228.00
## - police_report_available 2 0.00 230.00
## - age_bin 2 0.00 230.00
## - months_as_customer_bin 2 0.00 230.00
## - incident_type 2 0.00 230.00
## - property_damage 2 0.00 230.00
## - collision_type 2 0.00 230.00
## - capital.gains 1 0.00 232.00
## - property_claim 1 0.00 232.00
## - injury_claim 1 0.00 232.00
## - bodily_injuries 1 0.00 232.00
## - policy_annual_premium 1 0.00 232.00
## - total_claim_amount 1 0.00 232.00
## - number_of_vehicles_involved 1 0.00 232.00
## - capital.loss 1 0.00 232.00
## - insured_sex 1 0.00 232.00
## - auto_year_bin 1 0.00 232.00
## - umbrella_limit 1 0.00 232.00
## - witnesses 1 0.00 232.00
## <none> 0.00 234.00
## - auto_model 38 139.45 297.45
## - insured_occupation 13 104.44 312.44
## - incident_severity 3 192.99 420.99
## - insured_hobbies 19 259.66 455.66
##
## Step: AIC=222
## fraud_reported ~ policy_annual_premium + umbrella_limit + insured_sex +
## insured_occupation + insured_hobbies + insured_relationship +
## capital.gains + capital.loss + incident_type + collision_type +
## incident_severity + authorities_contacted + number_of_vehicles_involved +
## property_damage + bodily_injuries + witnesses + police_report_available +
## total_claim_amount + injury_claim + property_claim + auto_model +
## auto_year_bin + months_as_customer_bin + age_bin + incident_hour_of_the_day_bin
##
## Df Deviance AIC
## - insured_relationship 5 0.00 212.00
## - authorities_contacted 4 0.00 214.00
## - incident_hour_of_the_day_bin 3 0.00 216.00
## - incident_type 2 0.00 218.00
## - police_report_available 2 0.00 218.00
## - age_bin 2 0.00 218.00
## - property_damage 2 0.00 218.00
## - injury_claim 1 0.00 220.00
## - bodily_injuries 1 0.00 220.00
## - total_claim_amount 1 0.00 220.00
## - number_of_vehicles_involved 1 0.00 220.00
## - property_claim 1 0.00 220.00
## - capital.gains 1 0.00 220.00
## - policy_annual_premium 1 0.00 220.00
## - insured_sex 1 0.00 220.00
## <none> 0.00 222.00
## - auto_model 38 149.28 295.28
## - insured_occupation 13 135.93 331.93
## - incident_severity 3 205.20 421.20
## - insured_hobbies 19 266.16 450.16
## - months_as_customer_bin 2 1946.36 2164.36
## - umbrella_limit 1 2234.71 2454.71
## - auto_year_bin 1 2450.97 2670.97
## - capital.loss 1 2523.06 2743.06
## - witnesses 1 2523.06 2743.06
## - collision_type 2 2667.23 2885.23
##
## Step: AIC=212
## fraud_reported ~ policy_annual_premium + umbrella_limit + insured_sex +
## insured_occupation + insured_hobbies + capital.gains + capital.loss +
## incident_type + collision_type + incident_severity + authorities_contacted +
## number_of_vehicles_involved + property_damage + bodily_injuries +
## witnesses + police_report_available + total_claim_amount +
## injury_claim + property_claim + auto_model + auto_year_bin +
## months_as_customer_bin + age_bin + incident_hour_of_the_day_bin
##
## Df Deviance AIC
## - authorities_contacted 4 0.00 204.00
## - incident_hour_of_the_day_bin 3 0.00 206.00
## - incident_type 2 0.00 208.00
## - police_report_available 2 0.00 208.00
## - age_bin 2 0.00 208.00
## - months_as_customer_bin 2 0.00 208.00
## - bodily_injuries 1 0.00 210.00
## - total_claim_amount 1 0.00 210.00
## - injury_claim 1 0.00 210.00
## - policy_annual_premium 1 0.00 210.00
## - number_of_vehicles_involved 1 0.00 210.00
## <none> 0.00 212.00
## - auto_year_bin 1 77.26 287.26
## - auto_model 38 159.59 295.59
## - collision_type 2 96.13 304.13
## - witnesses 1 98.48 308.48
## - insured_occupation 13 140.00 326.00
## - incident_severity 3 218.68 424.68
## - insured_hobbies 19 275.22 449.22
## - capital.loss 1 1946.36 2156.36
## - property_claim 1 2162.62 2372.62
## - insured_sex 1 2378.88 2588.88
## - property_damage 2 2450.97 2658.97
## - capital.gains 1 2450.97 2660.97
## - umbrella_limit 1 2883.49 3093.49
##
## Step: AIC=204
## fraud_reported ~ policy_annual_premium + umbrella_limit + insured_sex +
## insured_occupation + insured_hobbies + capital.gains + capital.loss +
## incident_type + collision_type + incident_severity + number_of_vehicles_involved +
## property_damage + bodily_injuries + witnesses + police_report_available +
## total_claim_amount + injury_claim + property_claim + auto_model +
## auto_year_bin + months_as_customer_bin + age_bin + incident_hour_of_the_day_bin
##
## Df Deviance AIC
## - incident_hour_of_the_day_bin 3 0.00 198.00
## - police_report_available 2 0.00 200.00
## - age_bin 2 0.00 200.00
## - months_as_customer_bin 2 0.00 200.00
## - incident_type 2 0.00 200.00
## - bodily_injuries 1 0.00 202.00
## - number_of_vehicles_involved 1 0.00 202.00
## - capital.gains 1 0.00 202.00
## - policy_annual_premium 1 0.00 202.00
## - property_claim 1 0.00 202.00
## - insured_sex 1 0.00 202.00
## <none> 0.00 204.00
## - auto_model 38 168.58 296.58
## - property_damage 2 99.30 299.30
## - witnesses 1 110.56 312.56
## - umbrella_limit 1 113.76 315.76
## - auto_year_bin 1 114.95 316.95
## - collision_type 2 130.29 330.29
## - insured_occupation 13 156.07 334.07
## - incident_severity 3 235.91 433.91
## - insured_hobbies 19 279.41 445.41
## - capital.loss 1 2378.88 2580.88
## - injury_claim 1 2450.97 2652.97
## - total_claim_amount 1 2955.58 3157.58
##
## Step: AIC=198
## fraud_reported ~ policy_annual_premium + umbrella_limit + insured_sex +
## insured_occupation + insured_hobbies + capital.gains + capital.loss +
## incident_type + collision_type + incident_severity + number_of_vehicles_involved +
## property_damage + bodily_injuries + witnesses + police_report_available +
## total_claim_amount + injury_claim + property_claim + auto_model +
## auto_year_bin + months_as_customer_bin + age_bin
##
## Df Deviance AIC
## - number_of_vehicles_involved 1 0.0 196.0
## - bodily_injuries 1 0.0 196.0
## - injury_claim 1 0.0 196.0
## <none> 0.0 198.0
## - capital.loss 1 90.6 286.6
## - auto_model 38 170.6 292.6
## - property_damage 2 100.5 294.5
## - insured_sex 1 102.7 298.7
## - witnesses 1 112.1 308.1
## - umbrella_limit 1 115.0 311.0
## - auto_year_bin 1 117.6 313.6
## - collision_type 2 130.9 324.9
## - insured_occupation 13 158.7 330.7
## - incident_severity 3 245.7 437.7
## - insured_hobbies 19 280.2 440.2
## - months_as_customer_bin 2 2451.0 2645.0
## - total_claim_amount 1 2451.0 2647.0
## - incident_type 2 2523.1 2717.1
## - police_report_available 2 2523.1 2717.1
## - age_bin 2 3027.7 3221.7
## - property_claim 1 3027.7 3223.7
## - policy_annual_premium 1 3243.9 3439.9
## - capital.gains 1 20977.4 21173.4
##
## Step: AIC=196
## fraud_reported ~ policy_annual_premium + umbrella_limit + insured_sex +
## insured_occupation + insured_hobbies + capital.gains + capital.loss +
## incident_type + collision_type + incident_severity + property_damage +
## bodily_injuries + witnesses + police_report_available + total_claim_amount +
## injury_claim + property_claim + auto_model + auto_year_bin +
## months_as_customer_bin + age_bin
##
## Df Deviance AIC
## - police_report_available 2 0.00 192.00
## - bodily_injuries 1 0.00 194.00
## - injury_claim 1 0.00 194.00
## - capital.gains 1 0.00 194.00
## <none> 0.00 196.00
## - incident_type 2 90.86 282.86
## - capital.loss 1 90.83 284.83
## - auto_model 38 170.58 290.58
## - property_damage 2 100.65 292.65
## - insured_sex 1 104.25 298.25
## - witnesses 1 113.60 307.60
## - umbrella_limit 1 115.05 309.05
## - auto_year_bin 1 118.79 312.79
## - collision_type 2 132.25 324.25
## - insured_occupation 13 158.75 328.75
## - incident_severity 3 245.69 435.69
## - insured_hobbies 19 280.98 438.98
## - policy_annual_premium 1 1874.27 2068.27
## - months_as_customer_bin 2 2090.53 2282.53
## - total_claim_amount 1 2090.53 2284.53
## - age_bin 2 2378.88 2570.88
## - property_claim 1 2523.06 2717.06
##
## Step: AIC=192
## fraud_reported ~ policy_annual_premium + umbrella_limit + insured_sex +
## insured_occupation + insured_hobbies + capital.gains + capital.loss +
## incident_type + collision_type + incident_severity + property_damage +
## bodily_injuries + witnesses + total_claim_amount + injury_claim +
## property_claim + auto_model + auto_year_bin + months_as_customer_bin +
## age_bin
##
## Df Deviance AIC
## <none> 0.00 192.0
## - months_as_customer_bin 2 72.36 260.4
## - incident_type 2 92.20 280.2
## - capital.loss 1 91.08 281.1
## - auto_model 38 171.06 287.1
## - property_damage 2 101.45 289.4
## - insured_sex 1 104.26 294.3
## - witnesses 1 114.84 304.8
## - umbrella_limit 1 116.95 306.9
## - auto_year_bin 1 121.45 311.5
## - collision_type 2 133.46 321.5
## - insured_occupation 13 158.93 324.9
## - insured_hobbies 19 284.76 438.8
## - incident_severity 3 252.82 438.8
## - age_bin 2 2090.53 2278.5
## - total_claim_amount 1 2162.62 2352.6
## - policy_annual_premium 1 2306.79 2496.8
## - capital.gains 1 2450.97 2641.0
## - bodily_injuries 1 2523.06 2713.1
## - injury_claim 1 2739.32 2929.3
## - property_claim 1 3027.67 3217.7
##
## Call: glm(formula = fraud_reported ~ policy_annual_premium + umbrella_limit +
## insured_sex + insured_occupation + insured_hobbies + capital.gains +
## capital.loss + incident_type + collision_type + incident_severity +
## property_damage + bodily_injuries + witnesses + total_claim_amount +
## injury_claim + property_claim + auto_model + auto_year_bin +
## months_as_customer_bin + age_bin, family = "binomial", data = claim.train)
##
## Coefficients:
## (Intercept)
## -3.545e+03
## policy_annual_premium
## 8.513e-01
## umbrella_limit
## 3.102e-04
## insured_sexMALE                       -1.215e+03
## insured_occupationarmed-forces        -4.544e+02
## insured_occupationcraft-repair         1.039e+03
## insured_occupationexec-managerial      3.263e+03
## insured_occupationfarming-fishing     -1.019e+03
## insured_occupationhandlers-cleaners   -3.145e+03
## insured_occupationmachine-op-inspct   -1.470e+03
## insured_occupationother-service       -1.203e+03
## insured_occupationpriv-house-serv     -2.154e+03
## insured_occupationprof-specialty      -1.027e+03
## insured_occupationprotective-serv     -1.507e+03
## insured_occupationsales               -1.460e+03
## insured_occupationtech-support        -1.229e+03
## insured_occupationtransport-moving    -1.148e+03
## insured_hobbiesbasketball             -1.135e+03
## insured_hobbiesboard-games             1.646e+03
## insured_hobbiesbungie-jumping         -1.809e+03
## insured_hobbiescamping                -3.500e+02
## insured_hobbieschess                   6.751e+03
## insured_hobbiescross-fit               6.475e+03
## insured_hobbiesdancing                -2.480e+03
## insured_hobbiesexercise               -8.326e+02
## insured_hobbiesgolf                   -5.297e+02
## insured_hobbieshiking                  2.054e+03
## insured_hobbieskayaking               -2.697e+03
## insured_hobbiesmovies                  1.580e+03
## insured_hobbiespaintball               1.039e+03
## insured_hobbiespolo                   -2.691e+02
## insured_hobbiesreading                 2.373e+03
## insured_hobbiesskydiving              -3.091e+03
## insured_hobbiessleeping               -1.835e+03
## insured_hobbiesvideo-games             1.813e+03
## insured_hobbiesyachting               -5.212e+00
## capital.gains                          2.509e-03
## capital.loss                           1.293e-02
## incident_typeParked Car                1.635e+03
## incident_typeSingle Vehicle Collision  6.926e+02
## incident_typeVehicle Theft             2.184e+03
## collision_typeFront Collision          1.284e+03
## collision_typeRear Collision           2.529e+03
## collision_typeSide Collision                  NA
## incident_severityMinor Damage         -4.299e+03
## incident_severityTotal Loss           -3.883e+03
## incident_severityTrivial Damage       -9.940e+03
## property_damageNO                     -7.894e+02
## property_damageYES                     6.407e+02
## bodily_injuries                        7.668e+01
## witnesses                              7.268e+02
## total_claim_amount                    -5.099e-03
## injury_claim                          -2.451e-02
## property_claim                         6.792e-02
## auto_model92x                         -1.100e+02
## auto_model93                          -4.014e+01
## auto_model95                           4.162e+02
## auto_modelA3                           3.904e+03
## auto_modelA5                           9.744e+02
## auto_modelAccord                       9.046e+02
## auto_modelC300                        -3.210e+03
## auto_modelCamry                       -5.680e+02
## auto_modelCivic                        2.063e+03
## auto_modelCorolla                     -5.758e+02
## auto_modelCRV                         -2.243e+03
## auto_modelE400                         2.069e+03
## auto_modelEscape                      -1.682e+03
## auto_modelF150                         5.173e+03
## auto_modelForrestor                    9.296e+02
## auto_modelFusion                       3.374e+03
## auto_modelGrand Cherokee              -8.424e+02
## auto_modelHighlander                   9.666e+02
## auto_modelImpreza                      1.081e+03
## auto_modelJetta                        2.488e+02
## auto_modelLegacy                      -9.914e+01
## auto_modelM5                           3.988e+03
## auto_modelMalibu                       2.949e+03
## auto_modelMaxima                       2.170e+03
## auto_modelMDX                          1.536e+03
## auto_modelML350                        1.123e+03
## auto_modelNeon                        -6.094e+01
## auto_modelPassat                       3.198e+03
## auto_modelPathfinder                   3.207e+03
## auto_modelRAM                         -2.583e+02
## auto_modelRSX                          1.953e+03
## auto_modelSilverado                    8.726e+02
## auto_modelTahoe                       -8.151e+02
## auto_modelTL                           1.436e+03
## auto_modelUltima                       1.804e+03
## auto_modelWrangler                    -1.697e+03
## auto_modelX5                          -1.518e+03
## auto_modelX6                           2.816e+03
## auto_year_binold_car                   2.717e+03
## months_as_customer_binloyal_customer  -5.498e+02
## months_as_customer_binnew_customer     4.168e+02
## age_binold_people                      1.970e+02
## age_binyoung_people                   -1.160e+03
##
## Degrees of Freedom: 327 Total (i.e. Null); 232 Residual
## Null Deviance: 454.7
## Residual Deviance: 9.791e-06 AIC: 192
## Generalized Linear Model
##
## 328 samples
## 20 predictor
## 2 classes: 'N', 'Y'
##
## Pre-processing: scaled (96)
## Resampling: Cross-Validated (10 fold, repeated 4 times)
## Summary of sample sizes: 296, 294, 295, 294, 296, 296, ...
## Resampling results:
##
##   Accuracy   Kappa
##   0.7091564  0.4185863
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   N   Y
##          N 197  19
##          Y  80  64
##
## Accuracy : 0.725
## 95% CI : (0.6758, 0.7705)
## No Information Rate : 0.7694
## P-Value [Acc > NIR] : 0.9789
##
## Kappa : 0.3836
##
## Mcnemar's Test P-Value : 1.637e-09
##
## Sensitivity : 0.7711
## Specificity : 0.7112
## Pos Pred Value : 0.4444
## Neg Pred Value : 0.9120
## Prevalence : 0.2306
## Detection Rate : 0.1778
## Detection Prevalence : 0.4000
## Balanced Accuracy : 0.7411
##
## 'Positive' Class : Y
##
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   N   Y
##          N 116   3
##          Y  30  51
##
## Accuracy : 0.835
## 95% CI : (0.7762, 0.8836)
## No Information Rate : 0.73
## P-Value [Acc > NIR] : 0.0003184
##
## Kappa : 0.6384
##
## Mcnemar's Test P-Value : 6.011e-06
##
## Sensitivity : 0.9444
## Specificity : 0.7945
## Pos Pred Value : 0.6296
## Neg Pred Value : 0.9748
## Prevalence : 0.2700
## Detection Rate : 0.2550
## Detection Prevalence : 0.4050
## Balanced Accuracy : 0.8695
##
## 'Positive' Class : Y
##
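The first logistic regression model reaches an accuracy of 83.5% on the test data, with a recall (sensitivity) of 94.4%. These test results are much higher than those on the validation data (accuracy 72.5%, sensitivity 77.1%), so the first logistic regression model tends to overfit. The model summary points the same way: the residual deviance is 9.791e-06 on 232 residual degrees of freedom, which means the model fits the 328 training samples almost perfectly.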
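All of the percentages quoted above can be read straight off the 2x2 confusion matrix. The analysis itself was run in R with caret. As a minimal Python sketch, the function below recomputes caret's per-class statistics from the four cell counts, treating `Y` (fraud) as the positive class as in the output above. `ConfusionCounts` and `classification_stats` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts laid out like caret's table: rows = Prediction, columns = Reference."""
    pred_n_ref_n: int  # true negatives  (predicted N, actually N)
    pred_n_ref_y: int  # false negatives (predicted N, actually Y): missed frauds
    pred_y_ref_n: int  # false positives (predicted Y, actually N): false alarms
    pred_y_ref_y: int  # true positives  (predicted Y, actually Y)


def classification_stats(c: ConfusionCounts) -> dict:
    """Per-class statistics as reported by caret, with 'Y' as the positive class."""
    tn, fn, fp, tp = c.pred_n_ref_n, c.pred_n_ref_y, c.pred_y_ref_n, c.pred_y_ref_y
    total = tn + fn + fp + tp
    sensitivity = tp / (tp + fn)  # recall: share of actual frauds that are caught
    specificity = tn / (tn + fp)  # share of genuine claims correctly cleared
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),  # precision
        "neg_pred_value": tn / (tn + fn),
        "prevalence": (tp + fn) / total,
        "detection_rate": tp / total,
        "detection_prevalence": (tp + fp) / total,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Test-set confusion matrix of the first logistic regression model.
stats = classification_stats(ConfusionCounts(116, 3, 30, 51))
for name, value in stats.items():
    print(f"{name:>20}: {value:.4f}")
```

Rounded to four decimals, these values should match the second caret report above, for example sensitivity 0.9444 and balanced accuracy 0.8695.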
## glm variable importance
##
## only 20 most important variables shown (out of 95)
##
## Overall
## `incident_severityMinor Damage` 100.00
## `incident_severityTotal Loss` 98.74
## `insured_occupationhandlers-cleaners` 97.50
## `collision_typeRear Collision` 92.76
## `insured_occupationtransport-moving` 92.57
## insured_sexMALE 92.07
## auto_year_binold_car 91.82
## witnesses 90.40
## umbrella_limit 89.16
## `insured_occupationexec-managerial` 86.52
## `insured_occupationcraft-repair` 80.37
## `insured_occupationtech-support` 77.15
## `insured_occupationfarming-fishing` 72.28
## `insured_occupationprof-specialty` 71.87
## `collision_typeFront Collision` 71.69
## months_as_customer_binnew_customer 71.50
## capital.loss 70.00
## `incident_severityTrivial Damage` 66.05
## `incident_typeParked Car` 65.78
## property_damageYES 64.47
Logistic Regression 2
The variables selected by Random Forest model 2 are used for the second logistic regression model.
## Generalized Linear Model
##
## 328 samples
## 11 predictor
## 2 classes: 'N', 'Y'
##
## Pre-processing: scaled (31)
## Resampling: Cross-Validated (10 fold, repeated 4 times)
## Summary of sample sizes: 296, 294, 295, 294, 296, 296, ...
## Resampling results:
##
##   Accuracy   Kappa
##   0.8142825  0.6286741
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   N   Y
##          N 234  10
##          Y  43  73
##
## Accuracy : 0.8528
## 95% CI : (0.8119, 0.8877)
## No Information Rate : 0.7694
## P-Value [Acc > NIR] : 5.530e-05
##
## Kappa : 0.6358
##
## Mcnemar's Test P-Value : 1.105e-05
##
## Sensitivity : 0.8795
## Specificity : 0.8448
## Pos Pred Value : 0.6293
## Neg Pred Value : 0.9590
## Prevalence : 0.2306
## Detection Rate : 0.2028
## Detection Prevalence : 0.3222
## Balanced Accuracy : 0.8621
##
## 'Positive' Class : Y
##
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   N   Y
##          N 124   6
##          Y  22  48
##
## Accuracy : 0.86
## 95% CI : (0.8041, 0.9049)
## No Information Rate : 0.73
## P-Value [Acc > NIR] : 7.825e-06
##
## Kappa : 0.6752
##
## Mcnemar's Test P-Value : 0.004586
##
## Sensitivity : 0.8889
## Specificity : 0.8493
## Pos Pred Value : 0.6857
## Neg Pred Value : 0.9538
## Prevalence : 0.2700
## Detection Rate : 0.2400
## Detection Prevalence : 0.3500
## Balanced Accuracy : 0.8691
##
## 'Positive' Class : Y
##
The second logistic regression model reaches an accuracy of 86% on the test data, with a recall (sensitivity) of 88.9%. Its test results are close to its validation results (accuracy 85.3%, sensitivity 88.0%), so the second logistic regression model is a good fit.
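Whether 86% accuracy is meaningful depends on the baseline. Always predicting `N` would already be right 146 times out of 200 on the test set (the No Information Rate, 0.73). The `P-Value [Acc > NIR]` line in caret's output is a one-sided exact binomial test of the observed accuracy against that rate.

A minimal Python sketch of that test follows. `acc_vs_nir_pvalue` is a hypothetical helper name. The NIR is taken as the share of the largest reference class, as in the caret output above.

```python
from math import comb


def acc_vs_nir_pvalue(correct: int, total: int, nir: float) -> float:
    """One-sided exact binomial test: P(X >= correct) for X ~ Binomial(total, nir)."""
    return sum(
        comb(total, k) * nir**k * (1 - nir) ** (total - k)
        for k in range(correct, total + 1)
    )


# Second logistic regression model, test set: 124 + 48 = 172 correct out of 200,
# with the majority class N making up 146 of the 200 reference labels.
p_test = acc_vs_nir_pvalue(correct=124 + 48, total=200, nir=146 / 200)
print(f"P-Value [Acc > NIR] = {p_test:.4g}")
```

Applied to the first model's validation matrix (261 of 360 correct, NIR 277/360), the same function should reproduce the 0.9789 reported above. That validation accuracy is therefore not distinguishable from the baseline.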
## glm variable importance
##
## only 20 most important variables shown (out of 30)
##
## Overall
## `incident_severityMinor Damage` 100.000
## `incident_severityTotal Loss` 85.944
## `incident_severityTrivial Damage` 55.323
## insured_hobbieschess 38.565
## insured_hobbieskayaking 30.657
## insured_hobbiesskydiving 30.103
## `insured_hobbiesbungie-jumping` 27.453
## insured_hobbiescamping 26.955
## witnesses 26.355
## insured_hobbiessleeping 25.763
## insured_hobbiesexercise 25.075
## insured_hobbiesbasketball 23.258
## property_claim 20.316
## insured_hobbiesmovies 18.447
## insured_hobbiesdancing 17.369
## total_claim_amount 17.066
## vehicle_claim 15.109
## insured_hobbiesgolf 14.815
## `insured_hobbiesvideo-games` 9.980
## insured_hobbieshiking 9.789
Logistic Regression 3 (using probability)
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   N   Y
##          N 197  19
##          Y  80  64
##
## Accuracy : 0.725
## 95% CI : (0.6758, 0.7705)
## No Information Rate : 0.7694
## P-Value [Acc > NIR] : 0.9789
##
## Kappa : 0.3836
##
## Mcnemar's Test P-Value : 1.637e-09
##
## Sensitivity : 0.7711
## Specificity : 0.7112
## Pos Pred Value : 0.4444
## Neg Pred Value : 0.9120
## Prevalence : 0.2306
## Detection Rate : 0.1778
## Detection Prevalence : 0.4000
## Balanced Accuracy : 0.7411
##
## 'Positive' Class : Y
##
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   N   Y
##          N 116   3
##          Y  30  51
##
## Accuracy : 0.835
## 95% CI : (0.7762, 0.8836)
## No Information Rate : 0.73
## P-Value [Acc > NIR] : 0.0003184
##
## Kappa : 0.6384
##
## Mcnemar's Test P-Value : 6.011e-06
##
## Sensitivity : 0.9444
## Specificity : 0.7945
## Pos Pred Value : 0.6296
## Neg Pred Value : 0.9748
## Prevalence : 0.2700
## Detection Rate : 0.2550
## Detection Prevalence : 0.4050
## Balanced Accuracy : 0.8695
##
## 'Positive' Class : Y
##
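Both confusion matrices above are identical to those of the first logistic regression model (validation accuracy 72.5%, test accuracy 83.5%). The bias-variance trade-off therefore again points to a model that tends to overfit.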
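"Using probability" here means the model first outputs a predicted probability of fraud, which is then turned into a label with a cutoff. The cutoff used in the analysis is not stated.

The Python sketch below assumes a default of 0.5, and treating a probability exactly equal to the cutoff as `Y` is also an assumption. For a two-class logistic regression, a 0.5 cutoff on P(Y) gives the same labels as the ordinary class prediction (apart from exact ties). That may be why models 1 and 3 produce the same confusion matrices. `classify` is a hypothetical helper, and the probabilities are made up for illustration.

```python
def classify(probabilities, cutoff=0.5, positive="Y", negative="N"):
    """Turn predicted P(fraud) values into class labels using a probability cutoff."""
    if not 0.0 < cutoff < 1.0:
        raise ValueError("cutoff must lie strictly between 0 and 1")
    return [positive if p >= cutoff else negative for p in probabilities]


# Hypothetical predicted fraud probabilities for five claims.
probs = [0.05, 0.32, 0.48, 0.71, 0.93]
print(classify(probs))              # default cutoff of 0.5
print(classify(probs, cutoff=0.3))  # lower cutoff flags more claims as fraud
```

Lowering the cutoff raises sensitivity (fewer missed frauds) at the cost of specificity (more false alarms). This trade-off is the main reason to work with probabilities rather than hard labels.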
Logistic Regression 4 (using probability)
This model uses the variables from Random Forest model 2 and logistic regression model 2.
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   N   Y
##          N 238  14
##          Y  39  69
##
## Accuracy : 0.8528
## 95% CI : (0.8119, 0.8877)
## No Information Rate : 0.7694
## P-Value [Acc > NIR] : 5.53e-05
##
## Kappa : 0.6246
##
## Mcnemar's Test P-Value : 0.0009784
##
## Sensitivity : 0.8313
## Specificity : 0.8592
## Pos Pred Value : 0.6389
## Neg Pred Value : 0.9444
## Prevalence : 0.2306
## Detection Rate : 0.1917
## Detection Prevalence : 0.3000
## Balanced Accuracy : 0.8453
##
## 'Positive' Class : Y
##
## Confusion Matrix and Statistics
##
##           Reference
## Prediction   N   Y
##          N 126   8
##          Y  20  46
##
## Accuracy : 0.86
## 95% CI : (0.8041, 0.9049)
## No Information Rate : 0.73
## P-Value [Acc > NIR] : 7.825e-06
##
## Kappa : 0.6681
##
## Mcnemar's Test P-Value : 0.03764
##
## Sensitivity : 0.8519
## Specificity : 0.8630
## Pos Pred Value : 0.6970
## Neg Pred Value : 0.9403
## Prevalence : 0.2700
## Detection Rate : 0.2300
## Detection Prevalence : 0.3300
## Balanced Accuracy : 0.8574
##
## 'Positive' Class : Y
##
The fourth logistic regression model reaches an accuracy of 86% on the test data, with a recall (sensitivity) of 85.2%. Its test results are close to its validation results (accuracy 85.3%, sensitivity 83.1%), so the fourth logistic regression model is a good fit.
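caret also prints `Mcnemar's Test P-Value`, which compares the two kinds of error in the matrix. These are missed frauds (predicted `N`, actually `Y`) and false alarms (predicted `Y`, actually `N`). A small p-value means the errors are lopsided. In every test-set matrix above, false alarms outnumber missed frauds (30 vs 3, 22 vs 6, 20 vs 8).

A minimal Python sketch follows. It assumes the continuity-corrected chi-squared form with 1 degree of freedom used by R's `mcnemar.test`. `mcnemar_pvalue` is a hypothetical helper name.

```python
from math import erfc, sqrt


def mcnemar_pvalue(missed_frauds: int, false_alarms: int) -> float:
    """McNemar's chi-squared test on the two off-diagonal cells of a 2x2 matrix.

    Uses the continuity correction, skipped when both cells are equal
    (as R's mcnemar.test does for a 2x2 table).
    """
    b, c = missed_frauds, false_alarms
    if b + c == 0:
        raise ValueError("test is undefined when both off-diagonal cells are zero")
    correction = 1 if b != c else 0
    statistic = (abs(b - c) - correction) ** 2 / (b + c)
    # Upper tail of a chi-squared variable with 1 df: P(Z**2 > s) = erfc(sqrt(s / 2)).
    return erfc(sqrt(statistic / 2))


# Fourth logistic regression model, test set:
# 8 missed frauds (predicted N, actually Y) and 20 false alarms (predicted Y, actually N).
print(f"Mcnemar's Test P-Value = {mcnemar_pvalue(8, 20):.4g}")
```

With these counts the statistic is (12 - 1)^2 / 28 ≈ 4.32, which should reproduce the 0.03764 reported for the fourth model's test set.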
The variables best suited to classifying customers suspected of fraud are:
- incident_severity
- insured_hobbies
- vehicle_claim
- property_claim
- total_claim_amount
- policy_annual_premium
- capital.loss
- injury_claim
- capital.gains
- witnesses
- umbrella_limit
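The variable-importance tables rank dummy-coded columns such as `` `incident_severityMinor Damage` ``, while the list above names the original variables. The two can be related by mapping each dummy column back to its base variable.

The Python sketch below keeps the strongest dummy score per variable. This is one possible way to read the `varImp` output at the variable level, not necessarily how the list above was derived. Several listed variables (for example `umbrella_limit`) do not appear in the second model's top-20 table. The `BASE_VARIABLES` subset, the helper names, and the max-rule (summing scores is an alternative) are all assumptions. The scores are copied from the second model's `glm variable importance` output.

```python
from collections import defaultdict

# Assumed subset of base (pre-dummy) variable names, for illustration only.
BASE_VARIABLES = [
    "incident_severity", "insured_hobbies", "vehicle_claim", "property_claim",
    "total_claim_amount", "witnesses",
]


def base_variable(column: str, bases=BASE_VARIABLES) -> str:
    """Strip backticks and map a dummy column such as 'insured_hobbieschess' to its base."""
    name = column.strip("`")
    # Prefer the longest matching prefix so overlapping names resolve correctly.
    matches = [b for b in bases if name.startswith(b)]
    return max(matches, key=len) if matches else name


def importance_by_variable(importances: dict, bases=BASE_VARIABLES) -> dict:
    """Keep the strongest dummy score per base variable, sorted from high to low."""
    best = defaultdict(float)
    for column, score in importances.items():
        key = base_variable(column, bases)
        best[key] = max(best[key], score)
    return dict(sorted(best.items(), key=lambda kv: kv[1], reverse=True))


# A few rows from the second logistic regression model's variable importance.
lr2_importance = {
    "`incident_severityMinor Damage`": 100.000,
    "`incident_severityTotal Loss`": 85.944,
    "`incident_severityTrivial Damage`": 55.323,
    "insured_hobbieschess": 38.565,
    "insured_hobbieskayaking": 30.657,
    "witnesses": 26.355,
    "property_claim": 20.316,
    "total_claim_amount": 17.066,
    "vehicle_claim": 15.109,
}
for variable, score in importance_by_variable(lr2_importance).items():
    print(f"{variable:<20} {score:7.3f}")
```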