A car insurance claim classifier is a business analytics tool designed to analyze and categorize car insurance claims based on various parameters. It utilizes advanced machine learning techniques to automatically process and understand the contents of claim documents, such as accident reports, repair estimates, and customer statements. The primary use case for a car insurance claim classifier is to streamline and optimize the claims management process.
## 'data.frame': 10000 obs. of 19 variables:
## $ ID : int 569520 750365 199901 478866 731664 877557 930134 461006 68366 445911 ...
## $ AGE : Factor w/ 4 levels "16-25","26-39",..: 4 1 1 1 2 3 4 2 3 3 ...
## $ GENDER : Factor w/ 2 levels "female","male": 1 2 1 2 2 1 2 1 1 1 ...
## $ RACE : Factor w/ 2 levels "majority","minority": 1 1 1 1 1 1 1 1 1 1 ...
## $ DRIVING_EXPERIENCE : Factor w/ 4 levels "0-9y","10-19y",..: 1 1 1 1 2 3 4 1 3 1 ...
## $ EDUCATION : Factor w/ 3 levels "high school",..: 1 2 1 3 2 1 1 3 3 1 ...
## $ INCOME : Factor w/ 4 levels "middle class",..: 3 2 4 4 4 3 3 4 4 3 ...
## $ CREDIT_SCORE : num 0.629 0.358 0.493 0.206 0.388 ...
## $ VEHICLE_OWNERSHIP : num 1 0 1 1 1 1 0 0 0 1 ...
## $ VEHICLE_YEAR : Factor w/ 2 levels "after 2015","before 2015": 1 2 2 2 2 1 1 1 2 2 ...
## $ MARRIED : num 0 0 0 0 0 0 1 0 1 0 ...
## $ CHILDREN : num 1 0 0 1 0 1 1 1 0 1 ...
## $ POSTAL_CODE : int 10238 10238 10238 32765 32765 10238 10238 10238 10238 32765 ...
## $ ANNUAL_MILEAGE : num 12000 16000 11000 11000 12000 13000 13000 14000 13000 11000 ...
## $ VEHICLE_TYPE : Factor w/ 2 levels "sedan","sports car": 1 1 1 1 1 1 1 1 1 1 ...
## $ SPEEDING_VIOLATIONS: int 0 0 0 0 2 3 7 0 0 0 ...
## $ DUIS : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PAST_ACCIDENTS : int 0 0 0 0 1 3 3 0 0 0 ...
## $ OUTCOME : num 0 1 0 0 1 0 0 1 0 1 ...
Numeric predictors:
* CREDIT_SCORE, VEHICLE_OWNERSHIP, MARRIED, CHILDREN, POSTAL_CODE, ANNUAL_MILEAGE, SPEEDING_VIOLATIONS, DUIS, PAST_ACCIDENTS

Categorical predictors:
* AGE, GENDER, RACE, DRIVING_EXPERIENCE, EDUCATION, INCOME, VEHICLE_YEAR, VEHICLE_TYPE
Columns with an inappropriate data type:
- OUTCOME should be converted to a factor because its values are 0 and 1 (representing no and yes)
- MARRIED should be converted to a factor because its values are 0 and 1 (representing no and yes)
- POSTAL_CODE should be converted to a factor because its values repeat and it has only 4 unique values
When building the classification models, we do not need the ID column because it has no relation to the target.
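The conversion code itself is not shown here. The following is a minimal base-R sketch of the steps described above, run on a small made-up table; the name `car_claim_raw` and all values are assumptions for illustration only, not the original data or code.

```r
# Toy stand-in for the raw claims table (values are made up for illustration).
car_claim_raw <- data.frame(
  ID          = c(101L, 102L, 103L, 104L),
  MARRIED     = c(0, 1, 0, 1),
  POSTAL_CODE = c(10238L, 32765L, 10238L, 92101L),
  OUTCOME     = c(0, 1, 0, 1)
)

# Drop the identifier column, which carries no predictive information.
car_claim <- car_claim_raw[, setdiff(names(car_claim_raw), "ID")]

# Convert the 0/1 flags and the postal code from numbers to factors.
factor_cols <- c("OUTCOME", "MARRIED", "POSTAL_CODE")
car_claim[factor_cols] <- lapply(car_claim[factor_cols], as.factor)

str(car_claim)
```

With factors in place, `glm()` creates one dummy coefficient per non-reference level, which is why terms such as MARRIED1 and POSTAL_CODE32765 appear in the model summaries later on.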
## AGE GENDER RACE DRIVING_EXPERIENCE
## 0 0 0 0
## EDUCATION INCOME CREDIT_SCORE VEHICLE_OWNERSHIP
## 0 0 982 0
## VEHICLE_YEAR MARRIED CHILDREN POSTAL_CODE
## 0 0 0 0
## ANNUAL_MILEAGE VEHICLE_TYPE SPEEDING_VIOLATIONS DUIS
## 957 0 0 0
## PAST_ACCIDENTS OUTCOME
## 0 0
# Drop all rows that contain missing (NA) values
# (drop_na() comes from the tidyr package)
car_claim_clean <- drop_na(car_claim)
# Check for duplicated rows
sum(duplicated(car_claim_clean))
## [1] 0
💡 Insight:
As the output above shows, this data frame has:
1. 982 missing values in the CREDIT_SCORE column out of 10,000 rows
2. 957 missing values in the ANNUAL_MILEAGE column out of 10,000 rows
3. In this case, we drop all rows that contain N/A values
4. After dropping those rows, we check for duplicate rows and find none
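The per-column counts of missing values are printed above without the call that produced them. A small sketch of how such counts, the row drop and the duplicate check can be done in base R follows; the `toy` table and its values are made up, and `complete.cases()` is used here as a base-R counterpart of `drop_na()`, not as the original code.

```r
# Toy table with gaps in the same two columns (values are made up).
toy <- data.frame(
  CREDIT_SCORE   = c(0.63, NA, 0.49, 0.21, NA),
  ANNUAL_MILEAGE = c(12000, 16000, NA, 11000, 12000),
  OUTCOME        = c(0, 1, 0, 0, 1)
)

# Count missing values per column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only rows with no missing value in any column.
toy_clean <- toy[complete.cases(toy), ]

# Count fully duplicated rows that remain after the drop.
n_dup <- sum(duplicated(toy_clean))
print(n_dup)
```

Dropping rows is simple but costs data: the model summaries and the confusion matrix later in this write-up imply 6,519 training rows and 1,630 test rows, so roughly 8,149 of the 10,000 rows appear to survive this step.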
Our goal is to create a model that can identify customers who are likely to make a claim, based on several parameters.
So, our target is the OUTCOME column and the remaining columns are predictors.
We split the dataset into data_train and data_test in an 80:20 proportion.
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
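The split code is not shown; only its warning is. The sketch below shows one common way to make an 80:20 split that would emit the same warning. The seed value and the `toy` data are assumptions, since the original seed and call are not given.

```r
# "Rounding" reproduces the sampler used before R 3.6.0 and triggers the
# warning shown above; it is suppressed here to keep the output clean.
suppressWarnings(RNGkind(sample.kind = "Rounding"))
set.seed(100)  # hypothetical seed; the original value is not shown

# Toy stand-in for car_claim_clean (made-up data).
toy <- data.frame(x = rnorm(50), OUTCOME = factor(rbinom(50, 1, 0.3)))

# Sample 80% of the row indices for training; the rest form the test set.
train_idx  <- sample(nrow(toy), size = floor(0.8 * nrow(toy)))
data_train <- toy[train_idx, ]
data_test  <- toy[-train_idx, ]

# Restore the default sampler so later code is unaffected.
RNGkind(sample.kind = "Rejection")

c(train = nrow(data_train), test = nrow(data_test))
```

Fixing the seed before sampling is what makes the split, and therefore every number reported afterwards, reproducible.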
For the base model, we create a model that includes all the predictors.
model_claim <- glm(formula = OUTCOME ~ .,
                   data = data_train,
                   family = "binomial")
summary(model_claim)
##
## Call:
## glm(formula = OUTCOME ~ ., family = "binomial", data = data_train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.13948519 0.39422826 -5.427 0.000000057302257
## AGE26-39 -0.03632486 0.13305306 -0.273 0.784845
## AGE40-64 -0.17728324 0.15748537 -1.126 0.260287
## AGE65+ -0.09566942 0.19535964 -0.490 0.624340
## GENDERmale 1.01180583 0.08562021 11.817 < 0.0000000000000002
## RACEminority -0.06882417 0.12410914 -0.555 0.579206
## DRIVING_EXPERIENCE10-19y -2.13157396 0.13302536 -16.024 < 0.0000000000000002
## DRIVING_EXPERIENCE20-29y -4.13218544 0.24116869 -17.134 < 0.0000000000000002
## DRIVING_EXPERIENCE30y+ -5.45895788 0.51745200 -10.550 < 0.0000000000000002
## EDUCATIONnone 0.07812510 0.10885028 0.718 0.472924
## EDUCATIONuniversity -0.00028646 0.09919979 -0.003 0.997696
## INCOMEpoverty -0.08123366 0.15664331 -0.519 0.604047
## INCOMEupper class 0.02611490 0.13276479 0.197 0.844062
## INCOMEworking class -0.03926957 0.12748790 -0.308 0.758063
## CREDIT_SCORE 0.35892098 0.43391080 0.827 0.408137
## VEHICLE_OWNERSHIP -2.07017946 0.09178648 -22.554 < 0.0000000000000002
## VEHICLE_YEARbefore 2015 2.08473345 0.11250188 18.531 < 0.0000000000000002
## MARRIED1 -0.31365822 0.09386823 -3.341 0.000833
## CHILDREN -0.09340746 0.09403745 -0.993 0.320563
## POSTAL_CODE21217 21.37087977 177.46346479 0.120 0.904147
## POSTAL_CODE32765 1.30336770 0.10774385 12.097 < 0.0000000000000002
## POSTAL_CODE92101 1.38611233 0.18829123 7.362 0.000000000000182
## ANNUAL_MILEAGE 0.00013849 0.00001891 7.323 0.000000000000242
## VEHICLE_TYPEsports car -0.02118414 0.18218206 -0.116 0.907431
## SPEEDING_VIOLATIONS 0.04225682 0.03458996 1.222 0.221840
## DUIS 0.14809080 0.10128250 1.462 0.143699
## PAST_ACCIDENTS -0.06591079 0.04921493 -1.339 0.180491
##
## (Intercept) ***
## AGE26-39
## AGE40-64
## AGE65+
## GENDERmale ***
## RACEminority
## DRIVING_EXPERIENCE10-19y ***
## DRIVING_EXPERIENCE20-29y ***
## DRIVING_EXPERIENCE30y+ ***
## EDUCATIONnone
## EDUCATIONuniversity
## INCOMEpoverty
## INCOMEupper class
## INCOMEworking class
## CREDIT_SCORE
## VEHICLE_OWNERSHIP ***
## VEHICLE_YEARbefore 2015 ***
## MARRIED1 ***
## CHILDREN
## POSTAL_CODE21217
## POSTAL_CODE32765 ***
## POSTAL_CODE92101 ***
## ANNUAL_MILEAGE ***
## VEHICLE_TYPEsports car
## SPEEDING_VIOLATIONS
## DUIS
## PAST_ACCIDENTS
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8147.2 on 6518 degrees of freedom
## Residual deviance: 4148.2 on 6492 degrees of freedom
## AIC: 4202.2
##
## Number of Fisher Scoring iterations: 15
💡 Insight:
GENDERmale, DRIVING_EXPERIENCE10-19y, DRIVING_EXPERIENCE20-29y, DRIVING_EXPERIENCE30y+, VEHICLE_OWNERSHIP, VEHICLE_YEARbefore 2015, MARRIED1, POSTAL_CODE32765, POSTAL_CODE92101 and ANNUAL_MILEAGE are the significant predictors (all with p < 0.001).
model_claim_null <- glm(formula = OUTCOME ~ 1,
                        data = data_train,
                        family = "binomial")
summary(model_claim_null)
##
## Call:
## glm(formula = OUTCOME ~ 1, family = "binomial", data = data_train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.76584 0.02661 -28.78 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8147.2 on 6518 degrees of freedom
## Residual deviance: 8147.2 on 6518 degrees of freedom
## AIC: 8149.2
##
## Number of Fisher Scoring iterations: 4
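Two numbers in the summaries above can be checked by hand. The short calculation below is a worked check, not part of the original analysis. It turns the null-model intercept into a baseline claim probability and rebuilds the reported AIC values from the residual deviances, using the fact that for a 0/1 response the AIC equals the residual deviance plus twice the number of estimated coefficients.

```r
# Intercept of the null model, copied from the summary above (log-odds scale).
b0 <- -0.76584

# Inverse logit: baseline probability of a claim in data_train.
p_baseline <- plogis(b0)  # same as 1 / (1 + exp(-b0))

# AIC = residual deviance + 2 * (number of coefficients) for a 0/1 response.
aic_null <- 8147.2 + 2 * 1   # intercept only
aic_full <- 4148.2 + 2 * 27  # 27 coefficients in model_claim

round(c(p_baseline = p_baseline, aic_null = aic_null, aic_full = aic_full), 4)
```

The baseline probability comes out at roughly 32%, and the two AIC values match the 8149.2 and 4202.2 reported by `summary()`. The stepwise searches that follow try to lower this AIC by adding or removing predictors.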
Model Backward
## Start: AIC=4202.23
## OUTCOME ~ AGE + GENDER + RACE + DRIVING_EXPERIENCE + EDUCATION +
## INCOME + CREDIT_SCORE + VEHICLE_OWNERSHIP + VEHICLE_YEAR +
## MARRIED + CHILDREN + POSTAL_CODE + ANNUAL_MILEAGE + VEHICLE_TYPE +
## SPEEDING_VIOLATIONS + DUIS + PAST_ACCIDENTS
##
## Df Deviance AIC
## - INCOME 3 4148.5 4196.5
## - AGE 3 4149.8 4197.8
## - EDUCATION 2 4148.8 4198.8
## - VEHICLE_TYPE 1 4148.2 4200.2
## - RACE 1 4148.5 4200.5
## - CREDIT_SCORE 1 4148.9 4200.9
## - CHILDREN 1 4149.2 4201.2
## - SPEEDING_VIOLATIONS 1 4149.7 4201.7
## - PAST_ACCIDENTS 1 4150.0 4202.0
## <none> 4148.2 4202.2
## - DUIS 1 4150.3 4202.3
## - MARRIED 1 4159.4 4211.4
## - ANNUAL_MILEAGE 1 4203.0 4255.0
## - GENDER 1 4294.4 4346.4
## - VEHICLE_YEAR 1 4568.2 4620.2
## - DRIVING_EXPERIENCE 3 4618.3 4666.3
## - VEHICLE_OWNERSHIP 1 4731.9 4783.9
## - POSTAL_CODE 3 4791.9 4839.9
##
## Step: AIC=4196.54
## OUTCOME ~ AGE + GENDER + RACE + DRIVING_EXPERIENCE + EDUCATION +
## CREDIT_SCORE + VEHICLE_OWNERSHIP + VEHICLE_YEAR + MARRIED +
## CHILDREN + POSTAL_CODE + ANNUAL_MILEAGE + VEHICLE_TYPE +
## SPEEDING_VIOLATIONS + DUIS + PAST_ACCIDENTS
##
## Df Deviance AIC
## - AGE 3 4149.9 4191.9
## - EDUCATION 2 4148.9 4192.9
## - VEHICLE_TYPE 1 4148.6 4194.6
## - RACE 1 4148.8 4194.8
## - CHILDREN 1 4149.5 4195.5
## - SPEEDING_VIOLATIONS 1 4150.0 4196.0
## - PAST_ACCIDENTS 1 4150.3 4196.3
## - CREDIT_SCORE 1 4150.5 4196.5
## <none> 4148.5 4196.5
## - DUIS 1 4150.7 4196.7
## - MARRIED 1 4159.4 4205.4
## - ANNUAL_MILEAGE 1 4203.2 4249.2
## - GENDER 1 4296.8 4342.8
## - VEHICLE_YEAR 1 4580.1 4626.1
## - DRIVING_EXPERIENCE 3 4620.7 4662.7
## - VEHICLE_OWNERSHIP 1 4757.4 4803.4
## - POSTAL_CODE 3 4792.6 4834.6
##
## Step: AIC=4191.88
## OUTCOME ~ GENDER + RACE + DRIVING_EXPERIENCE + EDUCATION + CREDIT_SCORE +
## VEHICLE_OWNERSHIP + VEHICLE_YEAR + MARRIED + CHILDREN + POSTAL_CODE +
## ANNUAL_MILEAGE + VEHICLE_TYPE + SPEEDING_VIOLATIONS + DUIS +
## PAST_ACCIDENTS
##
## Df Deviance AIC
## - EDUCATION 2 4150.2 4188.2
## - VEHICLE_TYPE 1 4149.9 4189.9
## - RACE 1 4150.2 4190.2
## - CHILDREN 1 4151.3 4191.3
## - CREDIT_SCORE 1 4151.3 4191.3
## - SPEEDING_VIOLATIONS 1 4151.5 4191.5
## - PAST_ACCIDENTS 1 4151.6 4191.6
## <none> 4149.9 4191.9
## - DUIS 1 4152.0 4192.0
## - MARRIED 1 4161.8 4201.8
## - ANNUAL_MILEAGE 1 4205.1 4245.1
## - GENDER 1 4297.7 4337.7
## - VEHICLE_YEAR 1 4588.2 4628.2
## - VEHICLE_OWNERSHIP 1 4762.8 4802.8
## - DRIVING_EXPERIENCE 3 4769.8 4805.8
## - POSTAL_CODE 3 4795.4 4831.4
##
## Step: AIC=4188.18
## OUTCOME ~ GENDER + RACE + DRIVING_EXPERIENCE + CREDIT_SCORE +
## VEHICLE_OWNERSHIP + VEHICLE_YEAR + MARRIED + CHILDREN + POSTAL_CODE +
## ANNUAL_MILEAGE + VEHICLE_TYPE + SPEEDING_VIOLATIONS + DUIS +
## PAST_ACCIDENTS
##
## Df Deviance AIC
## - VEHICLE_TYPE 1 4150.2 4186.2
## - RACE 1 4150.5 4186.5
## - CREDIT_SCORE 1 4151.5 4187.5
## - CHILDREN 1 4151.5 4187.5
## - SPEEDING_VIOLATIONS 1 4151.8 4187.8
## - PAST_ACCIDENTS 1 4151.9 4187.9
## <none> 4150.2 4188.2
## - DUIS 1 4152.4 4188.4
## - MARRIED 1 4162.2 4198.2
## - ANNUAL_MILEAGE 1 4205.4 4241.4
## - GENDER 1 4300.4 4336.4
## - VEHICLE_YEAR 1 4593.5 4629.5
## - DRIVING_EXPERIENCE 3 4770.1 4802.1
## - VEHICLE_OWNERSHIP 1 4772.1 4808.1
## - POSTAL_CODE 3 4795.6 4827.6
##
## Step: AIC=4186.21
## OUTCOME ~ GENDER + RACE + DRIVING_EXPERIENCE + CREDIT_SCORE +
## VEHICLE_OWNERSHIP + VEHICLE_YEAR + MARRIED + CHILDREN + POSTAL_CODE +
## ANNUAL_MILEAGE + SPEEDING_VIOLATIONS + DUIS + PAST_ACCIDENTS
##
## Df Deviance AIC
## - RACE 1 4150.5 4184.5
## - CREDIT_SCORE 1 4151.5 4185.5
## - CHILDREN 1 4151.6 4185.6
## - SPEEDING_VIOLATIONS 1 4151.8 4185.8
## - PAST_ACCIDENTS 1 4151.9 4185.9
## <none> 4150.2 4186.2
## - DUIS 1 4152.4 4186.4
## - MARRIED 1 4162.3 4196.3
## - ANNUAL_MILEAGE 1 4205.4 4239.4
## - GENDER 1 4300.4 4334.4
## - VEHICLE_YEAR 1 4593.7 4627.7
## - DRIVING_EXPERIENCE 3 4770.3 4800.3
## - VEHICLE_OWNERSHIP 1 4772.4 4806.4
## - POSTAL_CODE 3 4795.7 4825.7
##
## Step: AIC=4184.51
## OUTCOME ~ GENDER + DRIVING_EXPERIENCE + CREDIT_SCORE + VEHICLE_OWNERSHIP +
## VEHICLE_YEAR + MARRIED + CHILDREN + POSTAL_CODE + ANNUAL_MILEAGE +
## SPEEDING_VIOLATIONS + DUIS + PAST_ACCIDENTS
##
## Df Deviance AIC
## - CREDIT_SCORE 1 4151.9 4183.9
## - CHILDREN 1 4151.9 4183.9
## - SPEEDING_VIOLATIONS 1 4152.1 4184.1
## - PAST_ACCIDENTS 1 4152.3 4184.3
## <none> 4150.5 4184.5
## - DUIS 1 4152.7 4184.7
## - MARRIED 1 4162.5 4194.5
## - ANNUAL_MILEAGE 1 4205.7 4237.7
## - GENDER 1 4301.4 4333.4
## - VEHICLE_YEAR 1 4593.8 4625.8
## - DRIVING_EXPERIENCE 3 4770.3 4798.3
## - VEHICLE_OWNERSHIP 1 4772.4 4804.4
## - POSTAL_CODE 3 4796.7 4824.7
##
## Step: AIC=4183.86
## OUTCOME ~ GENDER + DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + VEHICLE_YEAR +
## MARRIED + CHILDREN + POSTAL_CODE + ANNUAL_MILEAGE + SPEEDING_VIOLATIONS +
## DUIS + PAST_ACCIDENTS
##
## Df Deviance AIC
## - CHILDREN 1 4153.0 4183.0
## - SPEEDING_VIOLATIONS 1 4153.4 4183.4
## - PAST_ACCIDENTS 1 4153.6 4183.6
## <none> 4151.9 4183.9
## - DUIS 1 4154.0 4184.0
## - MARRIED 1 4162.8 4192.8
## - ANNUAL_MILEAGE 1 4206.8 4236.8
## - GENDER 1 4301.5 4331.5
## - VEHICLE_YEAR 1 4608.1 4638.1
## - DRIVING_EXPERIENCE 3 4776.8 4802.8
## - POSTAL_CODE 3 4797.7 4823.7
## - VEHICLE_OWNERSHIP 1 4802.0 4832.0
##
## Step: AIC=4183.03
## OUTCOME ~ GENDER + DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + VEHICLE_YEAR +
## MARRIED + POSTAL_CODE + ANNUAL_MILEAGE + SPEEDING_VIOLATIONS +
## DUIS + PAST_ACCIDENTS
##
## Df Deviance AIC
## - SPEEDING_VIOLATIONS 1 4154.6 4182.6
## - PAST_ACCIDENTS 1 4154.7 4182.7
## <none> 4153.0 4183.0
## - DUIS 1 4155.2 4183.2
## - MARRIED 1 4164.4 4192.4
## - ANNUAL_MILEAGE 1 4227.8 4255.8
## - GENDER 1 4302.6 4330.6
## - VEHICLE_YEAR 1 4611.1 4639.1
## - DRIVING_EXPERIENCE 3 4795.7 4819.7
## - POSTAL_CODE 3 4803.3 4827.3
## - VEHICLE_OWNERSHIP 1 4804.0 4832.0
##
## Step: AIC=4182.62
## OUTCOME ~ GENDER + DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + VEHICLE_YEAR +
## MARRIED + POSTAL_CODE + ANNUAL_MILEAGE + DUIS + PAST_ACCIDENTS
##
## Df Deviance AIC
## - PAST_ACCIDENTS 1 4156.2 4182.2
## <none> 4154.6 4182.6
## - DUIS 1 4157.2 4183.2
## - MARRIED 1 4166.1 4192.1
## - ANNUAL_MILEAGE 1 4227.8 4253.8
## - GENDER 1 4314.5 4340.5
## - VEHICLE_YEAR 1 4612.3 4638.3
## - VEHICLE_OWNERSHIP 1 4806.2 4832.2
## - POSTAL_CODE 3 4813.6 4835.6
## - DRIVING_EXPERIENCE 3 5010.4 5032.4
##
## Step: AIC=4182.18
## OUTCOME ~ GENDER + DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + VEHICLE_YEAR +
## MARRIED + POSTAL_CODE + ANNUAL_MILEAGE + DUIS
##
## Df Deviance AIC
## <none> 4156.2 4182.2
## - DUIS 1 4158.9 4182.9
## - MARRIED 1 4167.6 4191.6
## - ANNUAL_MILEAGE 1 4233.7 4257.7
## - GENDER 1 4316.8 4340.8
## - VEHICLE_YEAR 1 4613.7 4637.7
## - VEHICLE_OWNERSHIP 1 4806.4 4830.4
## - POSTAL_CODE 3 4843.0 4863.0
## - DRIVING_EXPERIENCE 3 5592.6 5612.6
##
## Call:
## glm(formula = OUTCOME ~ GENDER + DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP +
## VEHICLE_YEAR + MARRIED + POSTAL_CODE + ANNUAL_MILEAGE + DUIS,
## family = "binomial", data = data_train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.20441920 0.26226432 -8.405 < 0.0000000000000002
## GENDERmale 0.99940299 0.08101164 12.337 < 0.0000000000000002
## DRIVING_EXPERIENCE10-19y -2.14752235 0.09077430 -23.658 < 0.0000000000000002
## DRIVING_EXPERIENCE20-29y -4.22403836 0.17692269 -23.875 < 0.0000000000000002
## DRIVING_EXPERIENCE30y+ -5.39223245 0.43910017 -12.280 < 0.0000000000000002
## VEHICLE_OWNERSHIP -2.03681243 0.08606314 -23.666 < 0.0000000000000002
## VEHICLE_YEARbefore 2015 2.06009553 0.10758292 19.149 < 0.0000000000000002
## MARRIED1 -0.30577182 0.09037675 -3.383 0.000716
## POSTAL_CODE21217 21.37003938 178.13671900 0.120 0.904511
## POSTAL_CODE32765 1.37773592 0.10126934 13.605 < 0.0000000000000002
## POSTAL_CODE92101 1.41774008 0.18702383 7.581 0.0000000000000344
## ANNUAL_MILEAGE 0.00014677 0.00001691 8.679 < 0.0000000000000002
## DUIS 0.16724977 0.10021863 1.669 0.095147
##
## (Intercept) ***
## GENDERmale ***
## DRIVING_EXPERIENCE10-19y ***
## DRIVING_EXPERIENCE20-29y ***
## DRIVING_EXPERIENCE30y+ ***
## VEHICLE_OWNERSHIP ***
## VEHICLE_YEARbefore 2015 ***
## MARRIED1 ***
## POSTAL_CODE21217
## POSTAL_CODE32765 ***
## POSTAL_CODE92101 ***
## ANNUAL_MILEAGE ***
## DUIS .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8147.2 on 6518 degrees of freedom
## Residual deviance: 4156.2 on 6506 degrees of freedom
## AIC: 4182.2
##
## Number of Fisher Scoring iterations: 15
## (Intercept) GENDERmale DRIVING_EXPERIENCE10-19y
## 0.11031458 2.71665946 0.11677312
## DRIVING_EXPERIENCE20-29y DRIVING_EXPERIENCE30y+ VEHICLE_OWNERSHIP
## 0.01463941 0.00455180 0.13044385
## VEHICLE_YEARbefore 2015 MARRIED1 POSTAL_CODE21217
## 7.84671936 0.73655467 1909370388.08310127
## POSTAL_CODE32765 POSTAL_CODE92101 ANNUAL_MILEAGE
## 3.96591233 4.12778145 1.00014678
## DUIS
## 1.18204946
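The backward-elimination trace, the summary and the exponentiated coefficients above are printed without the calls that produced them. By analogy with the forward call shown later, they were presumably produced with `step()` started from the full model, followed by `exp()` of the coefficients. The sketch below shows that pattern on simulated data; the data, seed and object names are all hypothetical.

```r
set.seed(1)  # hypothetical seed for the simulated data
n <- 500
sim <- data.frame(
  x1    = rnorm(n),  # informative predictor
  x2    = rnorm(n),  # informative predictor
  noise = rnorm(n)   # unrelated to the outcome
)
sim$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * sim$x1 - 0.8 * sim$x2))

# Full model with every predictor.
full_fit <- glm(y ~ ., data = sim, family = "binomial")

# Backward elimination by AIC; trace = 0 suppresses the step-by-step printout.
backward_fit <- step(full_fit, direction = "backward", trace = 0)

# Coefficients are on the log-odds scale; exponentiating gives odds ratios.
odds_ratios <- exp(coef(backward_fit))
print(round(odds_ratios, 3))
```

Reading the real output the same way: the GENDERmale coefficient of 0.99940299 becomes an odds ratio of about 2.72, so with the other predictors held fixed the odds of a claim are about 2.7 times higher for male customers. The enormous value for POSTAL_CODE21217, together with its standard error of about 178, is not a usable estimate and likely indicates that this postal code has too few observations or only one outcome class in the training data.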
Model Forward
model_forward <- step(object = model_claim_null,
                      direction = "forward",
                      scope = list(lower = model_claim_null, upper = model_claim))
## Start: AIC=8149.17
## OUTCOME ~ 1
##
## Df Deviance AIC
## + DRIVING_EXPERIENCE 3 6143.3 6151.3
## + AGE 3 6661.3 6669.3
## + INCOME 3 7041.3 7049.3
## + PAST_ACCIDENTS 1 7178.2 7182.2
## + VEHICLE_OWNERSHIP 1 7182.6 7186.6
## + SPEEDING_VIOLATIONS 1 7321.7 7325.7
## + CREDIT_SCORE 1 7475.4 7479.4
## + VEHICLE_YEAR 1 7490.7 7494.7
## + MARRIED 1 7711.1 7715.1
## + CHILDREN 1 7799.2 7803.2
## + DUIS 1 7848.8 7852.8
## + POSTAL_CODE 3 7855.5 7863.5
## + ANNUAL_MILEAGE 1 7904.9 7908.9
## + EDUCATION 2 7928.7 7934.7
## + GENDER 1 8084.9 8088.9
## + RACE 1 8144.8 8148.8
## <none> 8147.2 8149.2
## + VEHICLE_TYPE 1 8147.1 8151.1
##
## Step: AIC=6151.32
## OUTCOME ~ DRIVING_EXPERIENCE
##
## Df Deviance AIC
## + VEHICLE_OWNERSHIP 1 5457.7 5467.7
## + VEHICLE_YEAR 1 5638.8 5648.8
## + POSTAL_CODE 3 5694.3 5708.3
## + INCOME 3 5830.4 5844.4
## + CREDIT_SCORE 1 5950.2 5960.2
## + MARRIED 1 5997.4 6007.4
## + AGE 3 5993.5 6007.5
## + GENDER 1 6046.3 6056.3
## + ANNUAL_MILEAGE 1 6053.4 6063.4
## + EDUCATION 2 6057.2 6069.2
## + CHILDREN 1 6076.0 6086.0
## + SPEEDING_VIOLATIONS 1 6134.7 6144.7
## + PAST_ACCIDENTS 1 6135.2 6145.2
## <none> 6143.3 6151.3
## + DUIS 1 6141.8 6151.8
## + VEHICLE_TYPE 1 6143.2 6153.2
## + RACE 1 6143.3 6153.3
##
## Step: AIC=5467.74
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP
##
## Df Deviance AIC
## + POSTAL_CODE 3 4938.3 4954.3
## + VEHICLE_YEAR 1 5046.9 5058.9
## + GENDER 1 5343.0 5355.0
## + MARRIED 1 5376.3 5388.3
## + INCOME 3 5382.1 5398.1
## + ANNUAL_MILEAGE 1 5394.7 5406.7
## + AGE 3 5397.4 5413.4
## + CREDIT_SCORE 1 5410.4 5422.4
## + CHILDREN 1 5411.4 5423.4
## + PAST_ACCIDENTS 1 5446.1 5458.1
## + SPEEDING_VIOLATIONS 1 5446.9 5458.9
## + EDUCATION 2 5445.0 5459.0
## + DUIS 1 5454.0 5466.0
## <none> 5457.7 5467.7
## + RACE 1 5457.3 5469.3
## + VEHICLE_TYPE 1 5457.7 5469.7
##
## Step: AIC=4954.26
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE
##
## Df Deviance AIC
## + VEHICLE_YEAR 1 4463.7 4481.7
## + ANNUAL_MILEAGE 1 4789.0 4807.0
## + GENDER 1 4799.9 4817.9
## + MARRIED 1 4843.0 4861.0
## + INCOME 3 4851.0 4873.0
## + AGE 3 4873.9 4895.9
## + CHILDREN 1 4880.9 4898.9
## + CREDIT_SCORE 1 4883.1 4901.1
## + EDUCATION 2 4922.8 4942.8
## + DUIS 1 4931.4 4949.4
## + SPEEDING_VIOLATIONS 1 4935.3 4953.3
## <none> 4938.3 4954.3
## + PAST_ACCIDENTS 1 4938.0 4956.0
## + RACE 1 4938.2 4956.2
## + VEHICLE_TYPE 1 4938.2 4956.2
##
## Step: AIC=4481.68
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE +
## VEHICLE_YEAR
##
## Df Deviance AIC
## + GENDER 1 4307.9 4327.9
## + ANNUAL_MILEAGE 1 4332.0 4352.0
## + MARRIED 1 4400.5 4420.5
## + CHILDREN 1 4420.1 4440.1
## + AGE 3 4447.9 4471.9
## + DUIS 1 4457.7 4477.7
## + CREDIT_SCORE 1 4458.5 4478.5
## + SPEEDING_VIOLATIONS 1 4459.0 4479.0
## <none> 4463.7 4481.7
## + INCOME 3 4457.7 4481.7
## + RACE 1 4463.0 4483.0
## + VEHICLE_TYPE 1 4463.6 4483.6
## + PAST_ACCIDENTS 1 4463.6 4483.6
## + EDUCATION 2 4463.2 4485.2
##
## Step: AIC=4327.89
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE +
## VEHICLE_YEAR + GENDER
##
## Df Deviance AIC
## + ANNUAL_MILEAGE 1 4170.1 4192.1
## + MARRIED 1 4236.5 4258.5
## + CHILDREN 1 4263.2 4285.2
## + AGE 3 4290.5 4316.5
## + PAST_ACCIDENTS 1 4298.6 4320.6
## + INCOME 3 4299.0 4325.0
## + DUIS 1 4305.7 4327.7
## <none> 4307.9 4327.9
## + CREDIT_SCORE 1 4306.3 4328.3
## + RACE 1 4307.6 4329.6
## + VEHICLE_TYPE 1 4307.7 4329.7
## + SPEEDING_VIOLATIONS 1 4307.7 4329.7
## + EDUCATION 2 4306.2 4330.2
##
## Step: AIC=4192.06
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE +
## VEHICLE_YEAR + GENDER + ANNUAL_MILEAGE
##
## Df Deviance AIC
## + MARRIED 1 4158.9 4182.9
## + DUIS 1 4167.6 4191.6
## <none> 4170.1 4192.1
## + SPEEDING_VIOLATIONS 1 4168.2 4192.2
## + PAST_ACCIDENTS 1 4168.4 4192.4
## + CHILDREN 1 4168.5 4192.5
## + RACE 1 4169.8 4193.8
## + CREDIT_SCORE 1 4169.8 4193.8
## + VEHICLE_TYPE 1 4170.0 4194.0
## + AGE 3 4167.8 4195.8
## + EDUCATION 2 4169.9 4195.9
## + INCOME 3 4170.0 4198.0
##
## Step: AIC=4182.88
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE +
## VEHICLE_YEAR + GENDER + ANNUAL_MILEAGE + MARRIED
##
## Df Deviance AIC
## + DUIS 1 4156.2 4182.2
## <none> 4158.9 4182.9
## + SPEEDING_VIOLATIONS 1 4157.0 4183.0
## + PAST_ACCIDENTS 1 4157.2 4183.2
## + CREDIT_SCORE 1 4157.7 4183.7
## + CHILDREN 1 4157.7 4183.7
## + RACE 1 4158.5 4184.5
## + VEHICLE_TYPE 1 4158.9 4184.9
## + EDUCATION 2 4158.7 4186.7
## + AGE 3 4157.8 4187.8
## + INCOME 3 4158.3 4188.3
##
## Step: AIC=4182.18
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE +
## VEHICLE_YEAR + GENDER + ANNUAL_MILEAGE + MARRIED + DUIS
##
## Df Deviance AIC
## <none> 4156.2 4182.2
## + PAST_ACCIDENTS 1 4154.6 4182.6
## + SPEEDING_VIOLATIONS 1 4154.7 4182.7
## + CREDIT_SCORE 1 4155.0 4183.0
## + CHILDREN 1 4155.1 4183.1
## + RACE 1 4155.8 4183.8
## + VEHICLE_TYPE 1 4156.2 4184.2
## + EDUCATION 2 4156.0 4186.0
## + AGE 3 4155.2 4187.2
## + INCOME 3 4155.6 4187.6
##
## Call:
## glm(formula = OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP +
## POSTAL_CODE + VEHICLE_YEAR + GENDER + ANNUAL_MILEAGE + MARRIED +
## DUIS, family = "binomial", data = data_train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.20441920 0.26226432 -8.405 < 0.0000000000000002
## DRIVING_EXPERIENCE10-19y -2.14752235 0.09077430 -23.658 < 0.0000000000000002
## DRIVING_EXPERIENCE20-29y -4.22403836 0.17692269 -23.875 < 0.0000000000000002
## DRIVING_EXPERIENCE30y+ -5.39223245 0.43910017 -12.280 < 0.0000000000000002
## VEHICLE_OWNERSHIP -2.03681243 0.08606314 -23.666 < 0.0000000000000002
## POSTAL_CODE21217 21.37003938 178.13671900 0.120 0.904511
## POSTAL_CODE32765 1.37773592 0.10126934 13.605 < 0.0000000000000002
## POSTAL_CODE92101 1.41774008 0.18702383 7.581 0.0000000000000344
## VEHICLE_YEARbefore 2015 2.06009553 0.10758292 19.149 < 0.0000000000000002
## GENDERmale 0.99940299 0.08101164 12.337 < 0.0000000000000002
## ANNUAL_MILEAGE 0.00014677 0.00001691 8.679 < 0.0000000000000002
## MARRIED1 -0.30577182 0.09037675 -3.383 0.000716
## DUIS 0.16724977 0.10021863 1.669 0.095147
##
## (Intercept) ***
## DRIVING_EXPERIENCE10-19y ***
## DRIVING_EXPERIENCE20-29y ***
## DRIVING_EXPERIENCE30y+ ***
## VEHICLE_OWNERSHIP ***
## POSTAL_CODE21217
## POSTAL_CODE32765 ***
## POSTAL_CODE92101 ***
## VEHICLE_YEARbefore 2015 ***
## GENDERmale ***
## ANNUAL_MILEAGE ***
## MARRIED1 ***
## DUIS .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8147.2 on 6518 degrees of freedom
## Residual deviance: 4156.2 on 6506 degrees of freedom
## AIC: 4182.2
##
## Number of Fisher Scoring iterations: 15
## (Intercept) DRIVING_EXPERIENCE10-19y DRIVING_EXPERIENCE20-29y
## 0.11031458 0.11677312 0.01463941
## DRIVING_EXPERIENCE30y+ VEHICLE_OWNERSHIP POSTAL_CODE21217
## 0.00455180 0.13044385 1909370388.08812094
## POSTAL_CODE32765 POSTAL_CODE92101 VEHICLE_YEARbefore 2015
## 3.96591233 4.12778145 7.84671936
## GENDERmale ANNUAL_MILEAGE MARRIED1
## 2.71665946 1.00014678 0.73655467
## DUIS
## 1.18204946
Model Both
model_both <- step(object = model_claim_null,
                   direction = "both",
                   scope = list(upper = model_claim))
## Start: AIC=8149.17
## OUTCOME ~ 1
##
## Df Deviance AIC
## + DRIVING_EXPERIENCE 3 6143.3 6151.3
## + AGE 3 6661.3 6669.3
## + INCOME 3 7041.3 7049.3
## + PAST_ACCIDENTS 1 7178.2 7182.2
## + VEHICLE_OWNERSHIP 1 7182.6 7186.6
## + SPEEDING_VIOLATIONS 1 7321.7 7325.7
## + CREDIT_SCORE 1 7475.4 7479.4
## + VEHICLE_YEAR 1 7490.7 7494.7
## + MARRIED 1 7711.1 7715.1
## + CHILDREN 1 7799.2 7803.2
## + DUIS 1 7848.8 7852.8
## + POSTAL_CODE 3 7855.5 7863.5
## + ANNUAL_MILEAGE 1 7904.9 7908.9
## + EDUCATION 2 7928.7 7934.7
## + GENDER 1 8084.9 8088.9
## + RACE 1 8144.8 8148.8
## <none> 8147.2 8149.2
## + VEHICLE_TYPE 1 8147.1 8151.1
##
## Step: AIC=6151.32
## OUTCOME ~ DRIVING_EXPERIENCE
##
## Df Deviance AIC
## + VEHICLE_OWNERSHIP 1 5457.7 5467.7
## + VEHICLE_YEAR 1 5638.8 5648.8
## + POSTAL_CODE 3 5694.3 5708.3
## + INCOME 3 5830.4 5844.4
## + CREDIT_SCORE 1 5950.2 5960.2
## + MARRIED 1 5997.4 6007.4
## + AGE 3 5993.5 6007.5
## + GENDER 1 6046.3 6056.3
## + ANNUAL_MILEAGE 1 6053.4 6063.4
## + EDUCATION 2 6057.2 6069.2
## + CHILDREN 1 6076.0 6086.0
## + SPEEDING_VIOLATIONS 1 6134.7 6144.7
## + PAST_ACCIDENTS 1 6135.2 6145.2
## <none> 6143.3 6151.3
## + DUIS 1 6141.8 6151.8
## + VEHICLE_TYPE 1 6143.2 6153.2
## + RACE 1 6143.3 6153.3
## - DRIVING_EXPERIENCE 3 8147.2 8149.2
##
## Step: AIC=5467.74
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP
##
## Df Deviance AIC
## + POSTAL_CODE 3 4938.3 4954.3
## + VEHICLE_YEAR 1 5046.9 5058.9
## + GENDER 1 5343.0 5355.0
## + MARRIED 1 5376.3 5388.3
## + INCOME 3 5382.1 5398.1
## + ANNUAL_MILEAGE 1 5394.7 5406.7
## + AGE 3 5397.4 5413.4
## + CREDIT_SCORE 1 5410.4 5422.4
## + CHILDREN 1 5411.4 5423.4
## + PAST_ACCIDENTS 1 5446.1 5458.1
## + SPEEDING_VIOLATIONS 1 5446.9 5458.9
## + EDUCATION 2 5445.0 5459.0
## + DUIS 1 5454.0 5466.0
## <none> 5457.7 5467.7
## + RACE 1 5457.3 5469.3
## + VEHICLE_TYPE 1 5457.7 5469.7
## - VEHICLE_OWNERSHIP 1 6143.3 6151.3
## - DRIVING_EXPERIENCE 3 7182.6 7186.6
##
## Step: AIC=4954.26
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE
##
## Df Deviance AIC
## + VEHICLE_YEAR 1 4463.7 4481.7
## + ANNUAL_MILEAGE 1 4789.0 4807.0
## + GENDER 1 4799.9 4817.9
## + MARRIED 1 4843.0 4861.0
## + INCOME 3 4851.0 4873.0
## + AGE 3 4873.9 4895.9
## + CHILDREN 1 4880.9 4898.9
## + CREDIT_SCORE 1 4883.1 4901.1
## + EDUCATION 2 4922.8 4942.8
## + DUIS 1 4931.4 4949.4
## + SPEEDING_VIOLATIONS 1 4935.3 4953.3
## <none> 4938.3 4954.3
## + PAST_ACCIDENTS 1 4938.0 4956.0
## + RACE 1 4938.2 4956.2
## + VEHICLE_TYPE 1 4938.2 4956.2
## - POSTAL_CODE 3 5457.7 5467.7
## - VEHICLE_OWNERSHIP 1 5694.3 5708.3
## - DRIVING_EXPERIENCE 3 6839.0 6849.0
##
## Step: AIC=4481.68
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE +
## VEHICLE_YEAR
##
## Df Deviance AIC
## + GENDER 1 4307.9 4327.9
## + ANNUAL_MILEAGE 1 4332.0 4352.0
## + MARRIED 1 4400.5 4420.5
## + CHILDREN 1 4420.1 4440.1
## + AGE 3 4447.9 4471.9
## + DUIS 1 4457.7 4477.7
## + CREDIT_SCORE 1 4458.5 4478.5
## + SPEEDING_VIOLATIONS 1 4459.0 4479.0
## <none> 4463.7 4481.7
## + INCOME 3 4457.7 4481.7
## + RACE 1 4463.0 4483.0
## + VEHICLE_TYPE 1 4463.6 4483.6
## + PAST_ACCIDENTS 1 4463.6 4483.6
## + EDUCATION 2 4463.2 4485.2
## - VEHICLE_YEAR 1 4938.3 4954.3
## - POSTAL_CODE 3 5046.9 5058.9
## - VEHICLE_OWNERSHIP 1 5127.6 5143.6
## - DRIVING_EXPERIENCE 3 6287.6 6299.6
##
## Step: AIC=4327.89
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE +
## VEHICLE_YEAR + GENDER
##
## Df Deviance AIC
## + ANNUAL_MILEAGE 1 4170.1 4192.1
## + MARRIED 1 4236.5 4258.5
## + CHILDREN 1 4263.2 4285.2
## + AGE 3 4290.5 4316.5
## + PAST_ACCIDENTS 1 4298.6 4320.6
## + INCOME 3 4299.0 4325.0
## + DUIS 1 4305.7 4327.7
## <none> 4307.9 4327.9
## + CREDIT_SCORE 1 4306.3 4328.3
## + RACE 1 4307.6 4329.6
## + VEHICLE_TYPE 1 4307.7 4329.7
## + SPEEDING_VIOLATIONS 1 4307.7 4329.7
## + EDUCATION 2 4306.2 4330.2
## - GENDER 1 4463.7 4481.7
## - VEHICLE_YEAR 1 4799.9 4817.9
## - POSTAL_CODE 3 4920.9 4934.9
## - VEHICLE_OWNERSHIP 1 4999.0 5017.0
## - DRIVING_EXPERIENCE 3 6183.3 6197.3
##
## Step: AIC=4192.06
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE +
## VEHICLE_YEAR + GENDER + ANNUAL_MILEAGE
##
## Df Deviance AIC
## + MARRIED 1 4158.9 4182.9
## + DUIS 1 4167.6 4191.6
## <none> 4170.1 4192.1
## + SPEEDING_VIOLATIONS 1 4168.2 4192.2
## + PAST_ACCIDENTS 1 4168.4 4192.4
## + CHILDREN 1 4168.5 4192.5
## + RACE 1 4169.8 4193.8
## + CREDIT_SCORE 1 4169.8 4193.8
## + VEHICLE_TYPE 1 4170.0 4194.0
## + AGE 3 4167.8 4195.8
## + EDUCATION 2 4169.9 4195.9
## + INCOME 3 4170.0 4198.0
## - ANNUAL_MILEAGE 1 4307.9 4327.9
## - GENDER 1 4332.0 4352.0
## - VEHICLE_YEAR 1 4643.2 4663.2
## - VEHICLE_OWNERSHIP 1 4841.0 4861.0
## - POSTAL_CODE 3 4869.2 4885.2
## - DRIVING_EXPERIENCE 3 5890.5 5906.5
##
## Step: AIC=4182.88
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE +
## VEHICLE_YEAR + GENDER + ANNUAL_MILEAGE + MARRIED
##
## Df Deviance AIC
## + DUIS 1 4156.2 4182.2
## <none> 4158.9 4182.9
## + SPEEDING_VIOLATIONS 1 4157.0 4183.0
## + PAST_ACCIDENTS 1 4157.2 4183.2
## + CREDIT_SCORE 1 4157.7 4183.7
## + CHILDREN 1 4157.7 4183.7
## + RACE 1 4158.5 4184.5
## + VEHICLE_TYPE 1 4158.9 4184.9
## + EDUCATION 2 4158.7 4186.7
## + AGE 3 4157.8 4187.8
## + INCOME 3 4158.3 4188.3
## - MARRIED 1 4170.1 4192.1
## - ANNUAL_MILEAGE 1 4236.5 4258.5
## - GENDER 1 4323.5 4345.5
## - VEHICLE_YEAR 1 4617.3 4639.3
## - VEHICLE_OWNERSHIP 1 4806.9 4828.9
## - POSTAL_CODE 3 4844.4 4862.4
## - DRIVING_EXPERIENCE 3 5812.5 5830.5
##
## Step: AIC=4182.18
## OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP + POSTAL_CODE +
## VEHICLE_YEAR + GENDER + ANNUAL_MILEAGE + MARRIED + DUIS
##
## Df Deviance AIC
## <none> 4156.2 4182.2
## + PAST_ACCIDENTS 1 4154.6 4182.6
## + SPEEDING_VIOLATIONS 1 4154.7 4182.7
## - DUIS 1 4158.9 4182.9
## + CREDIT_SCORE 1 4155.0 4183.0
## + CHILDREN 1 4155.1 4183.1
## + RACE 1 4155.8 4183.8
## + VEHICLE_TYPE 1 4156.2 4184.2
## + EDUCATION 2 4156.0 4186.0
## + AGE 3 4155.2 4187.2
## + INCOME 3 4155.6 4187.6
## - MARRIED 1 4167.6 4191.6
## - ANNUAL_MILEAGE 1 4233.7 4257.7
## - GENDER 1 4316.8 4340.8
## - VEHICLE_YEAR 1 4613.7 4637.7
## - VEHICLE_OWNERSHIP 1 4806.4 4830.4
## - POSTAL_CODE 3 4843.0 4863.0
## - DRIVING_EXPERIENCE 3 5592.6 5612.6
##
## Call:
## glm(formula = OUTCOME ~ DRIVING_EXPERIENCE + VEHICLE_OWNERSHIP +
## POSTAL_CODE + VEHICLE_YEAR + GENDER + ANNUAL_MILEAGE + MARRIED +
## DUIS, family = "binomial", data = data_train)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.20441920 0.26226432 -8.405 < 0.0000000000000002
## DRIVING_EXPERIENCE10-19y -2.14752235 0.09077430 -23.658 < 0.0000000000000002
## DRIVING_EXPERIENCE20-29y -4.22403836 0.17692269 -23.875 < 0.0000000000000002
## DRIVING_EXPERIENCE30y+ -5.39223245 0.43910017 -12.280 < 0.0000000000000002
## VEHICLE_OWNERSHIP -2.03681243 0.08606314 -23.666 < 0.0000000000000002
## POSTAL_CODE21217 21.37003938 178.13671900 0.120 0.904511
## POSTAL_CODE32765 1.37773592 0.10126934 13.605 < 0.0000000000000002
## POSTAL_CODE92101 1.41774008 0.18702383 7.581 0.0000000000000344
## VEHICLE_YEARbefore 2015 2.06009553 0.10758292 19.149 < 0.0000000000000002
## GENDERmale 0.99940299 0.08101164 12.337 < 0.0000000000000002
## ANNUAL_MILEAGE 0.00014677 0.00001691 8.679 < 0.0000000000000002
## MARRIED1 -0.30577182 0.09037675 -3.383 0.000716
## DUIS 0.16724977 0.10021863 1.669 0.095147
##
## (Intercept) ***
## DRIVING_EXPERIENCE10-19y ***
## DRIVING_EXPERIENCE20-29y ***
## DRIVING_EXPERIENCE30y+ ***
## VEHICLE_OWNERSHIP ***
## POSTAL_CODE21217
## POSTAL_CODE32765 ***
## POSTAL_CODE92101 ***
## VEHICLE_YEARbefore 2015 ***
## GENDERmale ***
## ANNUAL_MILEAGE ***
## MARRIED1 ***
## DUIS .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8147.2 on 6518 degrees of freedom
## Residual deviance: 4156.2 on 6506 degrees of freedom
## AIC: 4182.2
##
## Number of Fisher Scoring iterations: 15
## (Intercept) DRIVING_EXPERIENCE10-19y DRIVING_EXPERIENCE20-29y
## 0.11031458 0.11677312 0.01463941
## DRIVING_EXPERIENCE30y+ VEHICLE_OWNERSHIP POSTAL_CODE21217
## 0.00455180 0.13044385 1909370388.08812094
## POSTAL_CODE32765 POSTAL_CODE92101 VEHICLE_YEARbefore 2015
## 3.96591233 4.12778145 7.84671936
## GENDERmale ANNUAL_MILEAGE MARRIED1
## 2.71665946 1.00014678 0.73655467
## DUIS
## 1.18204946
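The values printed just above match `exp()` applied to the coefficient estimates, which makes them odds ratios. The sketch below reproduces a few of them. The vector `coefs` is an assumption for illustration: it is typed by hand from the printed summary rather than extracted from the fitted model with `coef()`.

```r
# Coefficient estimates copied by hand from the summary output above
# (in the original session they would come from coef() on the fitted model).
coefs <- c(
  VEHICLE_OWNERSHIP = -2.03681243,
  GENDERmale        =  0.99940299,
  MARRIED1          = -0.30577182,
  DUIS              =  0.16724977
)

# exp() turns a log-odds coefficient into an odds ratio
odds_ratio <- exp(coefs)

# Percentage change in the odds of OUTCOME = 1 for a one-unit increase
pct_change <- (odds_ratio - 1) * 100

print(round(odds_ratio, 4))
print(round(pct_change, 1))
```

Read this way, the GENDERmale odds ratio of about 2.72 says that the odds of a claim for male drivers are roughly 2.7 times those for female drivers, with the other predictors held fixed. The very large value for POSTAL_CODE21217, together with its standard error of about 178, is typically a sign of (quasi-)complete separation in that level, so that odds ratio should not be interpreted literally.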
comparison <- compare_performance(model_claim,
model_claim_null,
model_backward,
model_forward,
model_both)
comparison
💡 Insight:
Good Model Criteria:
* a higher adjusted R-squared value
* a lower AIC score
* a lower RMSE score
* based on the three points above, model_backward, model_forward and model_both fulfill these criteria
* we choose model_backward to predict our data_test
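The AIC criterion from the list above can also be checked with base R alone. This is a minimal sketch on simulated data, not the insurance dataset: the data frame `toy`, its columns, and the model names `model_null`, `model_full` and `model_back` are all made up for illustration.

```r
set.seed(1)

# Simulated data: y depends on x1 and x2, while `noise` is unrelated to y
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
noise <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x1 - 0.8 * x2))
toy <- data.frame(y, x1, x2, noise)

# Null model, full model, and backward stepwise selection from the full model
model_null <- glm(y ~ 1, family = binomial, data = toy)
model_full <- glm(y ~ ., family = binomial, data = toy)
model_back <- step(model_full, direction = "backward", trace = 0)

# AIC() accepts several fitted models and returns one row per model
aic_table <- AIC(model_null, model_full, model_back)
print(aic_table[order(aic_table$AIC), ])
```

Backward selection only drops a term when doing so lowers the AIC, so the stepwise model can never score worse than the full model it started from on this criterion.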
##
## Attaching package: 'gtools'
## The following object is masked from 'package:car':
##
## logit
log_claim <- predict(object = model_backward,
newdata = data_test,
type = "link")
p_claim <- inv.logit(log_claim)
claim_test_pred <- ifelse(test = p_claim >= 0.5,
yes = 1,
no = 0)
data_test$OUTCOME_PRED <- claim_test_pred
data_test %>%
select(OUTCOME, OUTCOME_PRED)
confusionMatrix(data = as.factor(data_test$OUTCOME_PRED),
reference = as.factor(data_test$OUTCOME),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1053 127
## 1 110 340
##
## Accuracy : 0.8546
## 95% CI : (0.8365, 0.8714)
## No Information Rate : 0.7135
## P-Value [Acc > NIR] : <0.0000000000000002
##
## Kappa : 0.6404
##
## Mcnemar's Test P-Value : 0.2987
##
## Sensitivity : 0.7281
## Specificity : 0.9054
## Pos Pred Value : 0.7556
## Neg Pred Value : 0.8924
## Prevalence : 0.2865
## Detection Rate : 0.2086
## Detection Prevalence : 0.2761
## Balanced Accuracy : 0.8167
##
## 'Positive' Class : 1
##
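The headline statistics in this output can be recomputed by hand from the four cell counts, which makes clear what each one measures. A minimal sketch, with the counts copied from the confusion matrix above and the variable names `tp`, `tn`, `fp` and `fn` chosen for illustration:

```r
# Counts copied from the confusion matrix above
# (rows = prediction, columns = reference, positive class = 1)
tn <- 1053  # predicted 0, actual 0
fn <- 127   # predicted 0, actual 1
fp <- 110   # predicted 1, actual 0
tp <- 340   # predicted 1, actual 1

accuracy    <- (tp + tn) / (tp + tn + fp + fn)
recall      <- tp / (tp + fn)  # sensitivity: share of actual claims that were caught
specificity <- tn / (tn + fp)  # share of non-claims correctly identified
precision   <- tp / (tp + fp)  # positive predictive value

print(round(c(accuracy = accuracy, recall = recall,
              specificity = specificity, precision = precision), 4))
```

Recall depends only on the actual-positive column (340 and 127), which is why it is the metric to watch when missed claims are the costly error.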
💡 Insight:
* The model predicts the OUTCOME class with an accuracy of 85.46%
* We focus on the RECALL value because the company would like to reduce the number of claims it pays
* The RECALL or SENSITIVITY value of this model is 72.81%

knn_set <- dataset %>%
select_if(is.numeric) %>%
select(-ID) %>%
mutate(OUTCOME = as.factor(OUTCOME)) %>%
drop_na()
table(knn_set$OUTCOME) %>%
prop.table()
##
## 0 1
## 0.6887962 0.3112038
## Warning in RNGkind(sample.kind = "Rounding"): non-uniform 'Rounding' sampler
## used
set.seed(100)
# index sampling
index <- sample(x = nrow(knn_set),
size = 0.8*nrow(knn_set) ) # take 0.8 (80%) for the training data, the rest for the test data
# splitting
knn_train <- knn_set[index,] # keep the rows whose position is in index
knn_test <- knn_set[-index,] # keep the rows whose position is not in index
# separating predictor and target
knn_train_x <- knn_train %>% select_if(is.numeric)
knn_test_x <- knn_test %>% select_if(is.numeric)
knn_train_y <- knn_train[,"OUTCOME"]
knn_test_y <- knn_test[, "OUTCOME"]
# scaling data train and test
knn_train_xs <- scale(x = knn_train_x)
knn_test_xs <- scale(x = knn_test_x,
center = attr(knn_train_xs,"scaled:center"),
scale = attr(knn_train_xs,"scaled:scale"))
# finding k-optimum
k_opt <- sqrt(nrow(knn_set))
# prediction
knn_pred <- knn(train = knn_train_xs,
test = knn_test_xs,
cl = knn_train_y,
k = k_opt)
confusionMatrix(data = knn_pred,
reference = knn_test_y,
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 992 209
## 1 148 281
##
## Accuracy : 0.781
## 95% CI : (0.7601, 0.8008)
## No Information Rate : 0.6994
## P-Value [Acc > NIR] : 0.00000000000008357
##
## Kappa : 0.46
##
## Mcnemar's Test P-Value : 0.001496
##
## Sensitivity : 0.5735
## Specificity : 0.8702
## Pos Pred Value : 0.6550
## Neg Pred Value : 0.8260
## Prevalence : 0.3006
## Detection Rate : 0.1724
## Detection Prevalence : 0.2632
## Balanced Accuracy : 0.7218
##
## 'Positive' Class : 1
##
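Two details of the KNN pipeline above are easy to miss: the test set is scaled with the centre and spread of the training set, and k comes from a square-root rule of thumb. The sketch below illustrates both on the built-in `iris` data rather than the insurance dataset. The names `train_x`, `test_x`, `center`, `spread` and `k_guess` are assumptions for illustration, and taking the square root of the training row count (instead of the full dataset) is one common variant, not the choice made in this report.

```r
set.seed(100)

# 80/20 split on the four numeric iris columns
idx <- sample(nrow(iris), size = floor(0.8 * nrow(iris)))
train_x <- iris[idx, 1:4]
test_x  <- iris[-idx, 1:4]

# Estimate the scaling parameters on the training set only
train_xs <- scale(train_x)
center <- attr(train_xs, "scaled:center")
spread <- attr(train_xs, "scaled:scale")

# Reuse the training mean and standard deviation for the test set,
# so no information from the test rows leaks into preprocessing
test_xs <- scale(test_x, center = center, scale = spread)

# Training columns have mean zero; test columns are only close to zero
print(round(colMeans(train_xs), 10))
print(round(colMeans(test_xs), 3))

# Rule-of-thumb k: square root of the number of rows, rounded to a whole number
k_guess <- round(sqrt(nrow(train_x)))
print(k_guess)
```

Rounding makes the neighbour count explicit rather than relying on how a fractional k is handled downstream.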
💡 Insight:
* The model predicts the OUTCOME class with an accuracy of 78.10%
* We focus on the RECALL value because the company would like to reduce the number of claims it pays
* The RECALL or SENSITIVITY value of this model is 57.35%

| Performance Metric | Log Reg | KNN |
|---|---|---|
| Accuracy | 0.8546 | 0.7810 |
| Recall (Sensitivity) | 0.7281 | 0.5735 |
| Precision (Pos Pred Value) | 0.7556 | 0.6550 |
| Specificity | 0.9054 | 0.8702 |
Based on these results, the Logistic Regression model outperforms the KNN model on every metric in the table. A likely reason is that the Logistic Regression model has access to more predictors: the KNN model uses only the numeric columns and no categorical predictors.
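One way to test that explanation would be to give KNN the categorical predictors as 0/1 indicator columns. The sketch below shows only the encoding step with base R's `model.matrix()`. The data frame `toy` and its values are made up for illustration, the AGE level labels are placeholders, and this step is not part of the original analysis.

```r
# Made-up rows mixing two factors and one numeric column
toy <- data.frame(
  AGE          = factor(c("16-25", "26-39", "40-64", "65+", "26-39")),
  GENDER       = factor(c("female", "male", "female", "male", "male")),
  CREDIT_SCORE = c(0.63, 0.36, 0.49, 0.21, 0.39)
)

# model.matrix() expands each factor into 0/1 indicator columns;
# "- 1" drops the intercept column
toy_x <- model.matrix(~ . - 1, data = toy)

print(colnames(toy_x))
print(dim(toy_x))
```

The result is an all-numeric matrix that could be scaled and passed to a distance-based method in the same way as the numeric predictors. Whether this closes the gap to logistic regression would have to be checked on the actual data.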
A work by Taufan Anggoro Adhi
tf.anggoro@gmail.com